CN118135607A - VSLAM dynamic object eliminating method and detection system for indoor environment - Google Patents
- Publication number
- CN118135607A (application CN202410169671.4A)
- Authority
- CN
- China
- Prior art keywords
- dynamic
- boundary frame
- human body
- object boundary
- static
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a VSLAM dynamic object eliminating method and a detection system for an indoor environment. The eliminating method comprises the following steps: confirming the boundary frames and categories of detection target objects based on a YOLOX target detection model; acquiring the human body object boundary frames and object boundary frames among the detected targets; analyzing the posture of each human body object inside its boundary frame with a posture detection algorithm; performing dynamic consistency abnormal point detection separately on human body object boundary frames in different postures and on object boundary frames; distinguishing dynamic from static feature points inside each object boundary frame with an adaptive threshold algorithm; and extracting and saving the static feature points inside the target object boundary frames for subsequent tracking. The invention efficiently and accurately distinguishes dynamic and static feature points in the operating environment and retains static feature point information to the greatest extent, thereby improving the accuracy and stability of the VSLAM system.
Description
Technical Field
The invention belongs to the technical field of computer vision detection, and particularly relates to a VSLAM dynamic object eliminating method and a detection system for an indoor environment.
Background
VSLAM (Visual Simultaneous Localization and Mapping) is a robotics technology in which a camera or depth camera captures images of the environment and algorithms process these images to determine the robot's current position and pose relative to the environment, while the robot uses its vision sensors and computer vision algorithms to build an environment map for autonomous navigation in subsequent tasks. Visual sensors are cheaper than many other sensors, and VSLAM offers high-precision positioning, map construction, and high flexibility, so it is widely applied and researched in the robotics field.
However, in the prior art, visual detection in a dynamic environment suffers from unstable data association caused by object movement and variation, so information such as depth is difficult to acquire reliably. The current mainstream solution is to track only stable static feature points: semantic segmentation or object detection yields a mask or bounding box for objects defined as movable, the feature points of those objects are removed using geometric information, and dynamic objects are handled by combining deep learning with traditional geometric methods. However, the bounding box of a dynamic object usually contains many static feature points; if all of them are erased, pose estimation lacks sufficient data association, and the VSLAM system fails to localize or becomes unstable.
Disclosure of Invention
The invention aims to provide a VSLAM dynamic object eliminating method and a VSLAM dynamic object detecting system for an indoor environment, so as to solve the technical problem that the existing VSLAM technology cannot efficiently and accurately distinguish dynamic and static feature points.
In order to solve the above problems, the technical scheme of the invention is as follows: a VSLAM dynamic object eliminating method for an indoor environment, comprising the following steps: confirming different target object categories and framing target object boundary frames on input image frame data based on a YOLOX target detection model, wherein the target object categories comprise human body objects and object objects, and identification data of detection target objects are input into the YOLOX target detection model in advance;
Judging the posture of a human body object in a human body object boundary frame through a posture detection algorithm; performing dynamic consistency abnormal point detection directly when the human body object is in a standing posture, or dividing a high dynamic area and a low dynamic area before the detection when the human body object is in a sitting posture; and marking each human body object boundary frame as a dynamic object boundary frame or a static object boundary frame according to the detection result;
Performing dynamic consistency abnormal point detection on each object boundary frame separately, marking the object boundary frame as a dynamic object boundary frame when the number of dynamic feature points inside it is larger than its erroneous dynamic feature point number threshold, and otherwise marking it as a static object boundary frame, wherein the erroneous dynamic feature point number threshold of each object boundary frame is obtained with an adaptive threshold algorithm;
Deleting the dynamic feature points lying simultaneously outside every static object boundary frame and inside a dynamic object boundary frame, and extracting and saving the static feature points inside the target object boundary frames for subsequent tracking.
Preferably, determining the posture of the human body object in the human body object boundary frame through a posture detection algorithm further includes: acquiring the length and width data of the human body object boundary frame and calculating its aspect ratio, where the aspect ratio of the human body object boundary frame is computed as:
R_i = H_i / W_i
where R_i is the aspect ratio of the human body object boundary frame, H_i is its length (height), and W_i is its width;
and comparing the aspect ratio value of the human body object boundary frame with a preset gesture judgment threshold value to determine gesture actions of the human body object in the human body object boundary frame.
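The comparison above can be sketched in a few lines of Python (the function name, default threshold of 3, and string labels are illustrative, not from the patent):

```python
def classify_posture(height: float, width: float, threshold: float = 3.0) -> str:
    """Classify a human-object boundary frame as standing or sitting
    by comparing its aspect ratio against the posture judgment threshold.

    height, width: boundary frame dimensions in pixels (H_i, W_i).
    threshold: posture judgment threshold (the text suggests 3).
    """
    aspect_ratio = height / width  # R_i = H_i / W_i
    return "standing" if aspect_ratio > threshold else "sitting"
```

A tall narrow box (e.g. 360x80, ratio 4.5) is classified as standing; a squarer box (e.g. 250x100, ratio 2.5) as sitting.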
Preferably, dividing the high dynamic area and the low dynamic area when the human body object is in the sitting posture state further includes: acquiring the length data of the human body object boundary frame in the sitting posture state, and dividing the length of that boundary frame in proportion according to a preset sitting posture proportion value, forming a high dynamic area and a low dynamic area bounded by the bottom of the hand movable joints of the human body object.
Preferably, the posture judgment threshold is set to 3 according to the head-to-body proportion of the human body object in different postures, and the sitting posture proportion value of the human body object boundary frame is set to 3:1 according to the position of the hand movable joints when the human body object is in the sitting posture state.
Preferably, performing dynamic consistency abnormal point detection on the different object boundary frames further includes: tracking the number of feature points in each object boundary frame through an optical flow pyramid algorithm, and multiplying that number by a preset dynamic judgment threshold proportion to obtain the erroneous dynamic feature point number threshold for that object boundary frame.
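The adaptive threshold itself is just a count times a proportion; a minimal sketch (helper names and the 20% default are illustrative — the tracked points would come from e.g. a pyramidal Lucas-Kanade tracker):

```python
def count_points_in_box(points, box):
    """Count (x, y) feature points falling inside box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return sum(1 for (x, y) in points if x0 <= x <= x1 and y0 <= y <= y1)

def error_dynamic_threshold(n_points_in_box: int, ratio: float = 0.2) -> float:
    """Erroneous-dynamic-feature-point number threshold for one object box:
    the tracked point count times the dynamic judgment threshold proportion."""
    return n_points_in_box * ratio
```

A box tracked with 40 surviving feature points and a 20% proportion thus tolerates up to 8 "dynamic" points before being marked a dynamic object.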
Preferably, performing dynamic consistency abnormal point detection on the different object boundary frames further includes: calculating the distance between a feature point in the object boundary frame and its epipolar line through the fundamental matrix, where the distance is computed as:
d = |P_{i+1}^T · F · P_i| / sqrt(X^2 + Y^2)
where d is the distance between the feature point and the epipolar line, F is the fundamental matrix, P_i and P_{i+1} are the matched points in the previous and current frames (in homogeneous coordinates), and X and Y are the first two components of the epipolar line F·P_i;
The distance between the feature point and the epipolar line is then compared with a preset dynamic feature point judgment threshold; if the distance is larger than the threshold, the feature point is recorded as a dynamic feature point.
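A minimal sketch of the distance test, assuming the fundamental matrix F has already been estimated (e.g. with RANSAC over the matched points); the function name and point convention are illustrative:

```python
import numpy as np

def epipolar_distance(F: np.ndarray, p_prev, p_curr) -> float:
    """Distance from the current-frame point to the epipolar line induced
    by its match in the previous frame: d = |p2^T F p1| / sqrt(X^2 + Y^2).

    F: 3x3 fundamental matrix; p_prev, p_curr: (x, y) matched pixel points.
    """
    p1 = np.array([p_prev[0], p_prev[1], 1.0])  # homogeneous coordinates
    p2 = np.array([p_curr[0], p_curr[1], 1.0])
    line = F @ p1                               # epipolar line (X, Y, Z)
    return abs(p2 @ line) / np.hypot(line[0], line[1])

def is_dynamic_point(F, p_prev, p_curr, dist_threshold: float) -> bool:
    """Mark the point dynamic if it strays from its epipolar line."""
    return epipolar_distance(F, p_prev, p_curr) > dist_threshold
```

For a rectified pure-translation F the epipolar line through a point is horizontal, so a match that keeps its y coordinate has distance 0 and a match that drifts vertically is flagged.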
Preferably, the detecting the abnormal points of dynamic consistency for the boundary boxes of different object objects further includes: and comparing the number of the dynamic feature points in the object boundary frame with the threshold value of the number of the error dynamic feature points, and marking the object boundary frame as the dynamic object boundary frame if the number of the dynamic feature points in the object boundary frame is larger than the threshold value of the number of the error dynamic feature points.
Preferably, extracting and saving the static feature points in the detection target object further includes: traversing each static object boundary frame, recording the static object pixel coordinates corresponding to it, and marking the static flag bit of the feature points in the static object boundary frame that lie outside any dynamic human body boundary frame.
Preferably, extracting and saving the static feature points in the detection target object further includes: traversing each dynamic human body object boundary frame, first marking the dynamic flag bit of all feature points inside it, then judging whether any recorded static object pixel coordinates lie inside the dynamic human body object boundary frame, and re-marking the static flag bit of the feature points corresponding to those static object pixel coordinates.
Based on the same conception, the invention also provides a VSLAM dynamic object detection system for indoor environment, comprising:
The image recognition module is used for recognizing the category of the target object in the image frame data and selecting a boundary frame of the target object in a frame mode;
The human body object gesture analysis module is used for analyzing the gesture of the human body object in the human body object boundary frame and dividing a high-dynamic area and a low-dynamic area in the human body object boundary frame;
the dynamic consistency abnormal point detection module is used for acquiring dynamic and static characteristic points in the human body object boundary frame and the object boundary frame;
the object threshold judging module is used for further judging the dynamic and static properties of the object boundary frame;
and the zone bit marking module is used for respectively marking the dynamic and static zone bit on the dynamic and static feature points in the dynamic and static object boundary frame.
By adopting the technical scheme, the invention has the following advantages and positive effects compared with the prior art:
(1) In the VSLAM dynamic object eliminating method and detection system for an indoor environment provided by the invention, human body objects and different types of object objects are efficiently distinguished and identified based on the YOLOX target detection model. For human body objects, a posture detection algorithm accurately analyzes the posture, and motion consistency detection is applied separately to different body regions in the standing and sitting postures; the human body object boundary frame is thereby subdivided, static feature points of the human body object are retained to the maximum extent, dynamic feature point screening precision is improved, and the stability and efficiency of VSLAM positioning and tracking are guaranteed.
(2) For object objects, a motion consistency detection algorithm distinguishes dynamic from static feature points, and an adaptive threshold algorithm judges object dynamics from the product of the number of feature points in the object area and the dynamic judgment threshold proportion. This solves the problem of an object being mistaken for a dynamic object when it is occluded or the camera moves to a certain extent, effectively eliminates the influence of noise and camera motion on visual detection results, gives the VSLAM system better robustness for different object types, and guarantees the accuracy of the robot's positioning and tracking of object objects.
(3) For feature points inside dynamic human body object boundary frames, static feature points are detected a second time through the static object pixel coordinates, so that static feature points in the detection target are preserved as far as possible, sufficient data association is provided for pose estimation, and the positioning accuracy and stability of the VSLAM system are improved.
Drawings
FIG. 1 is a flow chart of a method for eliminating VSLAM dynamic objects in indoor environment;
FIG. 2 is a flow chart of an attitude detection algorithm and an adaptive threshold algorithm provided by the invention;
FIG. 3 is a schematic diagram of the posture proportion of a human body object provided by the invention;
FIG. 4 shows visual detection effect (a) under the TUM dataset provided by the present invention;
FIG. 5 shows visual detection effect (b) under the TUM dataset provided by the present invention;
FIG. 6 is a graph comparing trace error results under a w/half dataset provided by the invention;
FIG. 7 is a graph comparing trace error results under w/rpy datasets provided by the present invention;
FIG. 8 is a graph comparing trace error results under a w/static dataset provided by the present invention;
FIG. 9 is a graph comparing trace error results under w/xyz data set provided by the present invention;
FIG. 10 is a graph comparing trace error results under an s/half dataset provided by the invention;
FIG. 11 is a graph comparing trace error results under w/xyz data set provided by the present invention.
Detailed Description
The invention provides a VSLAM dynamic object eliminating method and a detection system for an indoor environment, which are further described in detail below with reference to the accompanying drawings and the specific embodiments. Advantages and features of the invention will become more apparent from the following description and from the claims.
First embodiment
Referring to fig. 1 to 5, the VSLAM dynamic object rejection method for an indoor environment provided in this embodiment includes the following steps:
and capturing and identifying target objects in the input image frame data by using YOLOX target detection models, selecting target object boundary boxes, and confirming and detecting target object categories to which different target object boundary boxes belong, wherein the target objects are divided into human body objects and object objects, and the object objects comprise a plurality of different types of objects.
And acquiring a human body object boundary frame in the detection target object, and analyzing the human body posture in the human body object boundary frame by utilizing a posture detection algorithm, wherein the human body posture analysis result comprises two conditions of standing posture and sitting posture.
If the human body gesture analysis result is the standing gesture, acquiring a human body object boundary box in the standing gesture state, and directly detecting dynamic characteristic points by a dynamic consistency abnormal point detection method. If the human body gesture analysis result is a sitting gesture, acquiring a human body object boundary box in the sitting gesture state, dividing the human body object boundary box into a high dynamic region and a low dynamic region, and then detecting dynamic characteristic points of the high dynamic region and the low dynamic region respectively through a dynamic consistency abnormal point detection method.
And aiming at object boundary frames of different types, respectively carrying out dynamic feature point detection by a dynamic consistency abnormal point detection method, obtaining an error dynamic feature point number threshold value of each object boundary frame based on a self-adaptive threshold algorithm, comparing the number of dynamic feature points in each object boundary frame with the error dynamic feature point number threshold value, and marking the object boundary frame as a dynamic object boundary frame if the number of the dynamic feature points is larger than the error dynamic feature point number threshold value, otherwise marking the object boundary frame as a static object boundary frame.
And finally, performing secondary detection on the human body object boundary frame and the object boundary frame, deleting dynamic characteristic points which are simultaneously positioned outside the static object boundary frame and inside the dynamic object boundary frame, and extracting and storing the static characteristic points in the target object boundary frame for subsequent tracking.
In the VSLAM dynamic object eliminating method for an indoor environment provided by this embodiment, after target objects are identified based on the YOLOX target detection model, dynamic feature points are confirmed through motion consistency abnormal point detection. For human body objects, a posture detection algorithm analyzes the posture and subdivides the regions inside the boundary frames under different postures; for object objects, an adaptive threshold algorithm evaluates the feature point counts of different object types individually. Finally, secondary detection of static object feature points inside dynamic human body object boundary frames achieves accurate elimination of dynamic feature points while retaining as many static object feature points as possible, effectively improving the accuracy and stability of VSLAM positioning and tracking.
Preferably, for a human body object, the motion pattern and motion amplitude differ obviously between the sitting and standing states. To improve recognition precision and efficiency, this embodiment provides a human body posture detection algorithm that processes the human body object boundary frame and distinguishes sitting from standing. First, the length and width data of the human body object boundary frame are acquired and its aspect ratio is calculated as:
R_i = H_i / W_i
where R_i is the aspect ratio of the human body object boundary frame, H_i is its length (height), and W_i is its width.
To set the posture judgment threshold for the standing and sitting cases, this embodiment borrows the head-to-body proportion concept from figure drawing: a person's head length and width bear a roughly fixed relation to the body contour. In the sitting posture, the body width is typically about two head widths and the body height about five head lengths, giving an aspect ratio of about 5/2. In the standing posture the width is unchanged, but the height is usually seven, eight, or nine head lengths, giving aspect ratios of about 7/2, 8/2, and 9/2. Statistics over a large amount of data show that the aspect ratio of the human body object boundary frame is generally less than 3 in the sitting state and 4 or more in the standing state. Therefore, in this embodiment a boundary frame with an aspect ratio no greater than 3 is classified as sitting and one with an aspect ratio greater than 3 as standing; that is, the posture judgment threshold is set to 3 in advance.
The aspect ratio of the human body object boundary frame is then compared with the preset posture judgment threshold to confirm the posture of the human body object in the region.
Further, the posture of the human body object is obtained by comparing the aspect ratio of its boundary frame with the preset posture judgment threshold. For the sitting posture, since the motion trend of the human structure is concentrated in the upper body, especially the movable upper-body joints, after the posture detection algorithm confirms a sitting posture the boundary frame is divided vertically into a high dynamic area and a low dynamic area, bounded by the bottom of the hand movable joints. A large amount of test data shows that the dividing line lies at 3/4 of the boundary frame length: the high dynamic area is the upper 3/4 and the low dynamic area the lower 1/4 of the sitting-posture boundary frame, so the whole arm lies within the high dynamic area. The boundary frame heights of the two regions are:
H_high = (3/4) · H_i,  H_low = (1/4) · H_i
where H_i is the length (height) of the human body object boundary frame in the sitting posture state.
After the boundary frame information of the high dynamic area and the low dynamic area in the sitting posture state is obtained, dynamic consistency abnormal point detection is performed on the two areas separately and their dynamic abnormal points are extracted. Refining the human body object boundary frame with posture analysis in this way allows the boundary frame to be classified and evaluated at a finer granularity when the motion amplitudes of different body parts differ greatly, retaining static feature point information to the greatest extent and improving dynamic feature point rejection precision.
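The 3/4 : 1/4 split above can be sketched as follows (box convention and function name are illustrative; image y grows downward, so the "upper" region has the smaller y values):

```python
def split_sitting_box(box, high_fraction: float = 0.75):
    """Split a sitting-posture human boundary frame into high- and low-dynamic
    regions at 3/4 of its height (the bottom of the hand joints per the text).

    box: (x0, y0, x1, y1) in image coordinates (y increases downward).
    Returns (high_dynamic_box, low_dynamic_box).
    """
    x0, y0, x1, y1 = box
    y_split = y0 + high_fraction * (y1 - y0)   # H_high = 3/4 * H_i
    return (x0, y0, x1, y_split), (x0, y_split, x1, y1)
```

For a 100-pixel-tall sitting box the split line sits 75 pixels from the top, leaving the lower quarter as the low dynamic area.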
Preferably, because of differences in volume, movement mode, and so on, the number of feature points differs greatly between object objects; moreover, movement of human body objects can occlude object feature points and cause misjudgment, so the feature points and dynamic judgment threshold of different object types cannot be treated with a single rough value. This embodiment therefore provides an adaptive threshold algorithm: the number of feature points in a single object boundary frame is tracked through an optical flow pyramid algorithm, and this number is multiplied by a preset dynamic judgment threshold proportion to obtain the maximum number of erroneous dynamic feature points allowed in that object boundary frame.
Further, the distance between a feature point and its epipolar line is calculated through the fundamental matrix as:
d = |P_{i+1}^T · F · P_i| / sqrt(X^2 + Y^2)
where d is the distance between the feature point and the epipolar line, F is the fundamental matrix, P_i and P_{i+1} are the matched points in the previous and current frames (in homogeneous coordinates), and X and Y are the first two components of the epipolar line F·P_i.
The distance between the feature point and the epipolar line is compared with the abnormal point threshold preset for that object type; if the distance is larger than the threshold, the feature point is judged to be a dynamic feature point.
Further, the number of dynamic feature points in the object boundary frame is compared with the erroneous dynamic feature point number threshold of that object type; if the former is larger, the boundary frame is judged to be a dynamic object boundary frame. Through the adaptive threshold algorithm, the VSLAM system of this embodiment is more robust to different object types and avoids mistaking an object for a dynamic object when it is occluded or the camera moves to a certain extent. The dynamic judgment threshold proportion is generally set to 20% of the feature points in the object area; keeping this proportion relatively low further improves detection precision, reduces the serious impact on robot positioning and tracking caused by false negatives (a dynamic object mismarked as static), and guarantees the stability and efficiency of the system.
The above-described human body object bounding box posture analysis, the high dynamic region and low dynamic region division, and the application of the adaptive threshold algorithm are further illustrated by the following pseudo-code, where σ is the dynamic decision threshold ratio and the remaining symbols denote the human body object bounding box height, the human body object bounding box width, the object bounding box height, and the object bounding box width, respectively.
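The posture analysis and sitting-posture region division can be sketched in Python as follows (an illustration under stated assumptions: the aspect-ratio threshold of 3 and the 3:1 split follow the text, while treating the upper sub-region containing the arms as high-dynamic is an assumption):

```python
def analyze_human_bbox(x, y, h, w, pose_thresh=3.0, sit_ratio=3 / 4):
    """Classify posture from the bounding-box aspect ratio, then split a
    sitting-posture box into high/low dynamic regions at the bottom of
    the hand movable joints (a 3:1 height split, per the text).
    (x, y) is the top-left corner, h the height, w the width."""
    R = h / w                      # aspect ratio R_i = H_i / W_i
    if R > pose_thresh:
        return "standing", None    # standing: whole box checked directly
    boundary = y + h * sit_ratio   # assumed bottom of hand joints
    high = (x, y, boundary - y, w)            # upper region: moving arms
    low = (x, boundary, y + h - boundary, w)  # lower region: seated legs
    return "sitting", (high, low)
```

A usage example: a box 8 units tall and 2 wide (ratio 4) is classified as standing, while a 4-by-2 box (ratio 2) is classified as sitting and split at three quarters of its height.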
Preferably, after the posture analysis of human body object bounding boxes and the adaptive threshold judgment of object bounding boxes, two classes of data are obtained for both human body objects and object objects: dynamic object bounding boxes and static object bounding boxes. Because bounding boxes are ambiguous, only redundant data located both outside all static object bounding boxes and inside a dynamic object bounding box are deleted, while key data where a static object bounding box intersects a dynamic object bounding box are retained for further screening. First, all feature points in the current frame except the redundant data are traversed. For each dynamic human body object bounding box, a flag bit is set for every feature point inside it, where 0 denotes a static feature point and 1 denotes a dynamic feature point; that is, all feature points inside a dynamic human body object bounding box are first marked as dynamic feature points. The static object bounding boxes are then traversed, the static object pixel coordinates corresponding to the feature points inside them are recorded, and it is judged whether these pixel coordinates lie inside a dynamic human body object bounding box; if so, the feature points corresponding to those static object pixel coordinates are re-marked with the static flag bit.
The static object pixel coordinates are judged to lie inside a dynamic human body object bounding box when:

x_min ≤ x ≤ x_max and y_min ≤ y ≤ y_max

Wherein (x, y) are the static object pixel coordinates and (x_min, y_min) and (x_max, y_max) are the top-left and bottom-right corners of the dynamic human body object bounding box.
Similarly, for object bounding boxes, the adaptive threshold algorithm yields dynamic and static object bounding boxes; feature points in dynamic object bounding boxes outside any dynamic human body object bounding box are marked as dynamic feature points, and feature points in static object bounding boxes outside any dynamic human body object bounding box are marked as static feature points.
And finally, adding the static human body object boundary frames and the static characteristic points in the static object boundary frames into the set S n to be used as source data for tracking the follow-up robot.
The logic of static flag bit marking in the dynamic human body object bounding boxes and in the static object bounding boxes is further illustrated by the following pseudo-code.
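The two-pass flag-bit marking described above can be sketched as follows (an illustrative sketch, not the patent's pseudo-code; representing boxes as (x1, y1, x2, y2) corner coordinates is an assumption):

```python
def mark_flag_bits(features, dyn_human_boxes, static_obj_boxes):
    """Flag bits: 1 = dynamic, 0 = static.  Pass 1: every feature inside
    a dynamic human body bounding box is provisionally flagged dynamic.
    Pass 2: features that also fall inside a static object bounding box
    are re-flagged static, preserving key static data in the overlap.
    Boxes are (x1, y1, x2, y2); features are (x, y) pixel coordinates."""
    def inside(pt, box):
        x, y = pt
        x1, y1, x2, y2 = box
        return x1 <= x <= x2 and y1 <= y <= y2

    flags = {}
    for pt in features:  # pass 1: mark dynamic human box contents
        flags[pt] = 1 if any(inside(pt, b) for b in dyn_human_boxes) else 0
    for box in static_obj_boxes:  # pass 2: rescue static-object points
        for pt in features:
            if inside(pt, box) and flags[pt] == 1:
                flags[pt] = 0     # re-mark with the static flag bit
    return flags
```

For example, a point inside both a dynamic human box and an intersecting static object box ends up static, while a point only inside the human box stays dynamic.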
Preferably, in the VSLAM dynamic object removing method for an indoor environment provided in this embodiment, all target objects are assumed to be potentially mobile in a complex detection scene. To identify as many object classes as possible and improve target object identification efficiency, identification data for 80 target object classes are pre-recorded in the YOLOX target detection model: the model is pre-trained on the COCO data set, which contains 80 different object classes, mainly including common objects such as persons, chairs, keyboards, display screens, mice, and automobiles. This target object identification training improves the detection accuracy and efficiency of this embodiment and facilitates subsequent semantic map construction. Meanwhile, compared with newer models of the YOLO series, the YOLOX target detection model adopted in this embodiment achieves a higher computation speed while maintaining excellent accuracy; to further improve processing speed, the network structure of the YOLOX model is optimized with the GPU acceleration tool TensorRT.
According to the VSLAM dynamic object eliminating method for the indoor environment provided by this embodiment, target object identification efficiency and accuracy are greatly optimized while the VSLAM system constructs the environment map. Meanwhile, static object feature points inside dynamic human body object bounding boxes are detected a second time through the posture detection algorithm and the adaptive threshold algorithm, so that static feature points are retained to the maximum extent and the detection accuracy and stability of the VSLAM system are greatly improved.
Referring to fig. 6 to 11, in the experimental test stage, the VSLAM dynamic object rejection method for an indoor environment provided in this embodiment was tested on the TUM RGB-D data set, which includes image sequences captured by an RGB-D camera in dynamic environments together with accurate ground-truth trajectories and camera parameters. In the experimental setup, four high-dynamic walking sequences were selected, including walking and changing object positions, and two low-dynamic sitting-posture sequences were added to test performance in low-dynamic environments; the low-dynamic sitting-posture sequences involve small-amplitude movements of human subjects while seated. The six image sequences are identified as w/half, w/rpy, w/static, w/xyz, s/half, and s/xyz, where w and s denote walking and sitting sequences respectively, and half, rpy, static, and xyz denote different camera motion modes. The experimental results measure the overall robustness and stability of the method of this embodiment through the RMSE (root mean square error) of the ATE (absolute trajectory error).
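The ATE RMSE metric used in this evaluation can be computed as follows (a minimal sketch; trajectory alignment, normally done with Horn's method on the TUM benchmark, is assumed to have been performed already):

```python
import numpy as np

def ate_rmse(gt_positions, est_positions):
    """RMSE of the absolute trajectory error between aligned ground-truth
    and estimated camera positions (both N x 3 arrays), as used on the
    TUM RGB-D benchmark."""
    gt = np.asarray(gt_positions, dtype=float)
    est = np.asarray(est_positions, dtype=float)
    err = np.linalg.norm(gt - est, axis=1)  # per-frame position error
    return float(np.sqrt(np.mean(err ** 2)))
```

Lower values indicate that the estimated camera trajectory stays closer to the ground truth over the whole sequence.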
The test results of this embodiment are compared with the original ORB-SLAM3 and DS-SLAM results as follows:
As can be seen from the comparison of the test results of this embodiment with the original ORB-SLAM3 and DS-SLAM results: in the test of the w/half image sequence data, the error of the method of this embodiment is lower than ORB-SLAM3 by 0.175 and lower than DS-SLAM by 0.007; in the test of the w/rpy image sequence data, the error is lower than ORB-SLAM3 by 0.083 and lower than DS-SLAM by 0.092; in the test of the w/static image sequence data, the error is lower than ORB-SLAM3 by 0.016 and lower than DS-SLAM by 0.078; in the test of the w/xyz image sequence data, the error is lower than ORB-SLAM3 by 0.22 and lower than DS-SLAM by 0.006; in the test of the s/half image sequence data, the error is lower than ORB-SLAM3 by 0.006 (the DS-SLAM original paper has no experimental result); in the test of the s/xyz image sequence data, the error is lower than ORB-SLAM3 by 0.0061 (the DS-SLAM original paper has no experimental result).
Compared with traditional VSLAM and semantic VSLAM methods, the VSLAM dynamic object eliminating method for an indoor environment provided by this embodiment performs excellently: visual detection is stable and efficient, errors are small, real dynamic feature points are effectively eliminated in different detection environments, static feature points are preserved, and the pose estimation and positioning-tracking performance of the robot is improved.
Second embodiment
Based on the same conception, this embodiment provides a VSLAM dynamic object detection system for an indoor environment, which comprises an image recognition module, a human body object posture analysis module, a dynamic consistency abnormal point detection module, an object threshold judgment module and a flag bit marking module.
The image recognition module is used for recognizing the category of the target object in the image frame data and framing the boundary box of the target object.
The human body object gesture analysis module is used for analyzing the gesture of the human body object in the human body object boundary frame and dividing a high-dynamic area and a low-dynamic area in the human body object boundary frame in a sitting posture state based on a gesture detection algorithm.
The dynamic consistency abnormal point detection module is used for acquiring the human body object boundary frame and dynamic and static characteristic points in the object boundary frame.
The object threshold judgment module is used for judging the dynamic and static properties of the object boundary frame based on an adaptive threshold algorithm.
The flag bit marking module is used for respectively marking the dynamic and static flag bits on the dynamic and static feature points in the dynamic and static object bounding boxes.
According to the VSLAM dynamic object detection system for the indoor environment, high-efficiency and accurate detection of dynamic and static feature points in a human body object boundary frame and an object boundary frame is achieved through cooperation operation among modules and assistance of a gesture detection algorithm and a self-adaptive threshold algorithm.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments. Various changes made to the present invention that fall within the scope of the appended claims and their equivalents remain within the scope of the invention.
Claims (10)
1. A VSLAM dynamic object eliminating method for indoor environment is characterized by comprising the following steps:
different target object categories are confirmed and target object boundary boxes are framed on input image frame data based on YOLOX target detection models, the target object categories comprise human body objects and object objects, and identification data of detection target objects are input in the YOLOX target detection models in advance;
Judging the posture of a human body object in a human body object boundary frame through a posture detection algorithm, directly detecting dynamic consistency abnormal points when the human body object is in a standing posture state, dividing a high dynamic area and a low dynamic area when the human body object is in a sitting posture state, then detecting the dynamic consistency abnormal points, and marking each human body object boundary frame as a dynamic object boundary frame or a static object boundary frame according to a detection result;
Respectively carrying out dynamic consistency abnormal point detection on each object boundary frame, marking the object boundary frame as the dynamic object boundary frame when the number of dynamic feature points in each object boundary frame is larger than the number threshold value of error dynamic feature points, otherwise marking the object boundary frame as the static object boundary frame, wherein the number threshold value of error dynamic feature points of each object boundary frame is obtained based on a self-adaptive threshold algorithm;
And deleting the dynamic feature points which are simultaneously positioned outside the static object boundary box and inside the dynamic object boundary box, and extracting and storing the static feature points in the target object boundary box to follow-up tracking.
2. The VSLAM dynamic object elimination method for an indoor environment of claim 1, wherein the determining of the human object pose in the human object bounding box by the pose detection algorithm further comprises:
Acquiring length and width data of the human body object boundary frame, and calculating an aspect ratio value of the human body object boundary frame, wherein the aspect ratio of the human body object boundary frame is calculated as:

R_i = H_i / W_i

Wherein, R_i is the aspect ratio value of the human body object boundary frame, H_i is the length of the human body object boundary frame, and W_i is the width of the human body object boundary frame;
and comparing the aspect ratio value of the human body object boundary frame with a preset gesture judgment threshold value to determine gesture actions of the human body object in the human body object boundary frame.
3. The VSLAM dynamic object elimination method for an indoor environment of claim 1, wherein the high dynamic area and low dynamic area division when the human object is in a sitting posture state further comprises:
And acquiring length data of the human body object boundary frame in the sitting posture state, and dividing the length of the human body object boundary frame in the sitting posture state in equal proportion according to a preset sitting posture proportion value to form a high dynamic area and a low dynamic area which take the bottom of a hand movable joint of the human body object as a boundary.
4. The VSLAM dynamic object elimination method for an indoor environment of claim 2 or claim 3, wherein the posture determination threshold is set to 3 according to the head-body ratio of the human body object in different postures; setting the sitting posture proportion value of the human body object boundary frame to be 3:1 according to the positions of the hand movable joints in the sitting posture state of the human body object.
5. The VSLAM dynamic object elimination method for indoor environment of claim 1, wherein respectively performing dynamic consistency outlier detection on different object bounding boxes further comprises:
And tracking and obtaining the number of the feature points in the object boundary boxes through an optical flow pyramid algorithm, and multiplying the number of the feature points by a preset dynamic judgment threshold proportion to obtain an error dynamic feature point number threshold in the object boundary boxes.
6. The VSLAM dynamic object elimination method for indoor environment of claim 5, wherein respectively performing dynamic consistency outlier detection on different object bounding boxes further comprises:
calculating the distance between the feature point and the epipolar line in the object boundary frame through the fundamental matrix, wherein the distance between the feature point and the epipolar line is calculated as:

d = |P_{i+1}^T · F · P_i| / sqrt(X^2 + Y^2)

Wherein d is the distance value between the feature point and the epipolar line, F is the fundamental matrix, P_i and P_{i+1} are the matched feature points in the previous and current frames, and X and Y are the first two components of the epipolar line vector;
And comparing the distance value between the characteristic point and the polar line with a preset dynamic characteristic point judgment threshold value, and recording the characteristic point as a dynamic characteristic point if the distance value between the characteristic point and the polar line is larger than the dynamic characteristic point judgment threshold value.
7. The VSLAM dynamic object elimination method for indoor environment of claim 6, wherein respectively performing dynamic consistency outlier detection on different object bounding boxes further comprises:
And comparing the number of the dynamic feature points in the object boundary frame with the threshold value of the number of the error dynamic feature points, and marking the object boundary frame as the dynamic object boundary frame if the number of the dynamic feature points in the object boundary frame is larger than the threshold value of the number of the error dynamic feature points.
8. The VSLAM dynamic object elimination method for an indoor environment of claim 1, wherein extracting and saving the static feature points in the detection target object further comprises:
Traversing the static object boundary frames, recording the static object pixel coordinates corresponding to the feature points in the static object boundary frames, and marking the static flag bit on feature points in the static object boundary frames that lie outside the dynamic human body object boundary frame.
9. The VSLAM dynamic object elimination method for indoor environment of claim 8, wherein extracting and saving the static feature points in the detection target object further comprises:
and traversing the dynamic human body object boundary frame, firstly marking the dynamic flag bit on all feature points in the dynamic human body object boundary frame, judging whether the static object pixel coordinates are located in the dynamic human body object boundary frame, and re-marking the static flag bit on the feature points corresponding to static object pixel coordinates located in the dynamic human body object boundary frame.
10. A VSLAM dynamic object detection system for an indoor environment, comprising:
The image recognition module is used for recognizing the category of the target object in the image frame data and selecting a boundary frame of the target object in a frame mode;
The human body object gesture analysis module is used for analyzing the gesture of the human body object in the human body object boundary frame and dividing a high-dynamic area and a low-dynamic area in the human body object boundary frame;
The dynamic consistency abnormal point detection module is used for acquiring dynamic and static characteristic points in the boundary frame of the human body object and the boundary frame of the object;
the object threshold judging module is used for further judging the dynamic and static properties of the object boundary frame;
and the flag bit marking module is used for respectively marking the dynamic and static flag bits on the dynamic and static feature points in the dynamic and static object bounding boxes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410169671.4A CN118135607A (en) | 2024-02-06 | 2024-02-06 | VSLAM dynamic object eliminating method and detection system for indoor environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118135607A true CN118135607A (en) | 2024-06-04 |
Family
ID=91235294
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410169671.4A Pending CN118135607A (en) | 2024-02-06 | 2024-02-06 | VSLAM dynamic object eliminating method and detection system for indoor environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118135607A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |