CN111724439A - Visual positioning method and device in dynamic scene - Google Patents


Info

Publication number
CN111724439A
Authority
CN
China
Prior art keywords
frame image
current frame
motion mask
motion
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911200881.0A
Other languages
Chinese (zh)
Other versions
CN111724439B (en
Inventor
姜昊辰
张晓林
李嘉茂
刘衍青
朱冬晨
彭镜铨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN201911200881.0A priority Critical patent/CN111724439B/en
Publication of CN111724439A publication Critical patent/CN111724439A/en
Application granted granted Critical
Publication of CN111724439B publication Critical patent/CN111724439B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of robot navigation and positioning, and in particular to a visual positioning method and device in a dynamic scene. The method comprises the following steps: acquiring a current frame image, and extracting feature points of the current frame image; inputting the current frame image into a preset deep learning network for semantic segmentation to obtain a target semantic image; determining a motion mask area of the current frame image according to the target semantic image; acquiring depth information of the current frame image; performing motion consistency detection based on the target semantic image and the depth information, and determining a static feature point set of the current frame image; and determining the current state pose information according to the static feature point set. By performing motion consistency detection with the semantic segmentation result and the depth information to determine the static feature point set of the image, the method and device can effectively improve the accuracy and robustness of pose estimation in a dynamic environment.

Description

Visual positioning method and device in dynamic scene
Technical Field
The invention relates to the technical field of robot navigation positioning, in particular to a visual positioning method and device in a dynamic scene.
Background
With the development of artificial intelligence technology, more and more intelligent mobile robots are appearing in production and daily life. From industrial robots to housekeeping service robots, and from unmanned aerial vehicles to underwater detection robots, an important condition for intelligence is that the robot can move autonomously, that is, realize autonomous navigation. To move autonomously in various environments, two basic problems must be solved, namely localization and mapping; the core technology of an intelligent mobile robot is therefore Simultaneous Localization and Mapping (SLAM).
Depending on the type of sensor, SLAM technology can be broadly classified into laser SLAM and visual SLAM. Visual SLAM has been studied extensively in recent years because images carry rich information and can serve higher-level tasks such as semantic segmentation and object detection. Existing visual SLAM systems are usually complete frameworks comprising feature extraction, loop closure detection and other parts, and have achieved good test results in certain environments. However, existing point-feature-based visual SLAM relies on the static environment assumption, whereas an absolutely static scene does not exist in most real settings; in a dynamic environment the accuracy of pose estimation drops sharply, or the system fails altogether. Moreover, because moving objects are not identified, artifacts appear when a dense point cloud map is reconstructed, so that the environment is perceived incorrectly.
Disclosure of Invention
In view of the foregoing problems in the prior art, an object of the present invention is to provide a visual positioning method and apparatus in a dynamic scene, which can improve accuracy and robustness of pose estimation in a dynamic environment.
In order to solve the above problem, the present invention provides a visual positioning method in a dynamic scene, including:
acquiring a current frame image, and extracting feature points of the current frame image;
inputting the current frame image into a preset deep learning network for semantic segmentation to obtain a target semantic image;
determining a motion mask area of the current frame image according to the target semantic image;
acquiring depth information of the current frame image;
performing motion consistency detection based on the target semantic image and the depth information, and determining a static feature point set of the current frame image;
and determining the current state pose information according to the static feature point set.
Further, the performing motion consistency detection based on the target semantic image and the depth information, and determining the set of static feature points of the current frame image includes:
determining a background area of the current frame image according to the target semantic image;
determining first pose information according to the feature points of the background area of the current frame image;
determining types of feature points of the motion mask region based on the first pose information and the depth information, the types including dynamic feature points and static feature points;
removing the dynamic feature points among the feature points of the motion mask area and retaining the static feature points;
and generating a static feature point set according to the static feature points and the feature points of the background area.
Further, the determining the type of feature points of the motion mask region based on the first pose information and the depth information comprises:
calculating a motion score of a feature point of the motion mask region according to the first pose information and the depth information;
when the motion score is smaller than a preset threshold value, judging that the feature point is a static feature point;
and when the motion score is greater than or equal to the preset threshold value, judging that the feature point is a dynamic feature point.
Specifically, the calculating a motion score of a feature point of the motion mask region according to the first pose information and the depth information includes:
acquiring feature points of a motion mask area of a first reference frame image, and matching the feature points of the motion mask area of the first reference frame image with the feature points of the motion mask area of the current frame image to obtain matching point pairs; wherein, the first reference frame image is a previous frame image of the current frame image;
screening the matching point pairs, and removing the matching point pairs which are mismatched;
and calculating the distance between the screened matching point pairs according to the first pose information and the depth information, and taking the distance as the motion score of the feature points of the motion mask area.
Further, after determining the motion mask region of the current frame image according to the target semantic image, the method further includes:
acquiring a first motion mask area of a first reference frame image and a second motion mask area of a second reference frame image; the first reference frame image is a frame image before the current frame image, and the second reference frame image is a frame image before the first reference frame image;
judging whether the current frame image has a missed detection according to the first motion mask area and the motion mask area of the current frame image;
if the current frame image has a missed detection, determining a first target motion mask area according to the first motion mask area and the second motion mask area;
and replacing the motion mask area of the current frame image with the first target motion mask area.
Preferably, the method further comprises:
if the current frame image has no missed detection, determining a second target motion mask area according to the first motion mask area and the motion mask area of the current frame image;
and replacing the motion mask area of the current frame image with the second target motion mask area.
Further, after the obtaining the depth information of the current frame image, the method further includes:
and repairing the depth information by using a preset morphological method.
Further, the determining the current state pose information according to the set of static feature points includes:
when a new key frame is generated, establishing data association among the feature points in the static feature point set, the key frame and the map point;
determining second pose information according to the feature points in the static feature point set;
and performing pose optimization according to the second pose information and the data association to determine the pose information of the current state.
Further, the method further comprises:
generating static object point cloud data based on the target semantic image, the current state pose information and the static feature point set;
and performing dense reconstruction of the static object point cloud map according to the static object point cloud data.
Another aspect of the present invention provides a visual positioning apparatus in a dynamic scene, including:
the first acquisition module is used for acquiring a current frame image and extracting the feature points of the current frame image;
the semantic segmentation module is used for inputting the current frame image into a preset deep learning network for semantic segmentation to obtain a target semantic image;
the determining module is used for determining a motion mask area of the current frame image according to the target semantic image;
the second acquisition module is used for acquiring the depth information of the current frame image;
the detection module is used for performing motion consistency detection based on the target semantic image and the depth information and determining a static feature point set of the current frame image;
and the positioning module is used for determining the current state pose information according to the static feature point set.
Due to the technical scheme, the invention has the following beneficial effects:
according to the visual positioning method under the dynamic scene, the image is subjected to semantic segmentation, the motion consistency is detected according to the semantic segmentation result and the depth information, the dynamic feature points of a motion mask area are removed, the static feature points in the motion mask are added into a static feature point set, the pose optimization is realized by utilizing the static feature point set, and a dense static object point cloud map is reconstructed. The accuracy and robustness of pose estimation in a dynamic environment can be improved, and the accuracy of the static object point cloud map is improved, so that the accuracy of environment perception is improved.
In addition, the visual positioning method under the dynamic scene is based on the assumption of motion continuity, the semantic segmentation results of the adjacent frame images are fused, the condition of missing segmentation is compensated, and the accuracy of the method can be further improved.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description of the embodiment or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram of a visual positioning system in a dynamic scene according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for visual localization in a dynamic scene according to an embodiment of the present invention;
FIG. 3 is a flow chart of a visual positioning method in a dynamic scene according to another embodiment of the present invention;
FIG. 4 is a flow chart of a visual positioning method in a dynamic scene according to another embodiment of the present invention;
FIG. 5 is a flow chart of a visual positioning method in a dynamic scene according to another embodiment of the present invention;
FIG. 6A is a diagram illustrating the test results of the ORB-SLAM2 system according to one embodiment of the present invention;
FIG. 6B is a diagram illustrating test results of a DS-SLAM system according to an embodiment of the present invention;
FIG. 6C is a diagram illustrating test results of a visual positioning method in a dynamic scene according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a visual positioning apparatus in a dynamic scene according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In order to make the objects, technical solutions and advantages disclosed in the embodiments of the present invention more clearly apparent, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the embodiments of the invention and are not intended to limit the embodiments of the invention. First, the embodiments of the present invention explain the following concepts:
ORB-SLAM2 system: an open-source SLAM system for monocular, binocular (stereo) and RGBD cameras. The ORB-SLAM system is a real-time, feature-point-based monocular SLAM system that can operate in large-scale, small-scale, indoor and outdoor environments. The system is robust to vigorous motion, supports wide-baseline loop closure detection and relocalization, and includes fully automatic initialization. The ORB-SLAM2 system extends the ORB-SLAM system with support for calibrated binocular cameras and RGBD cameras.
RGBD: Red Green Blue Depth, a three-channel color image plus depth information.
DS-SLAM system: a semantic visual SLAM system for dynamic scenes. The DS-SLAM system is built on the ORB-SLAM2 system and combines a semantic segmentation network with a moving consistency check method, which reduces the influence of dynamic objects and greatly improves positioning accuracy in dynamic environments.
Referring to the specification and fig. 1, the embodiment provides a visual positioning system in a dynamic scene, which may include a semantic segmentation and compensation thread 110, a tracking estimation thread 120, a local map thread 130, a loop detection thread 140, and a dense point cloud mapping thread 150.
The semantic segmentation and compensation thread 110 outputs pixel-by-pixel semantic classification results for the input picture by a neural network method such as semantic segmentation, and based on a motion continuity assumption, semantic segmentation results for adjacent frames are fused to compensate for missing segmentation.
The tracking estimation thread 120 extracts feature points from the current frame image, collects the semantic information, and calculates the relative motion of the static background. It then calculates the feature point matching relationship of the motion mask region, removes the feature points that do not satisfy the matching relationship, judges whether each remaining feature point is dynamic or static by combining the relative motion with motion consistency detection and depth map weighting, eliminates the dynamic feature points, and updates the static feature point set.
The local map thread 130 performs local pose adjustment based on the covisibility graph by applying a local sliding window to a plurality of key frames and feature points; it is a secondary optimization of the pose on top of the tracking estimation thread 120.
The loop detection thread 140 compares each frame with all the key frames and, when a similar key frame is found, performs a global optimization and screening over all the key frames.
The dense point cloud mapping thread 150 removes the dynamic object regions using the results of the preceding threads and, based on the robustly estimated relative poses, stitches the point clouds using methods such as ICP to obtain a dense point cloud map of static objects.
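As an illustration of the dense mapping step, the following minimal Python sketch back-projects the pixels that lie outside the motion mask into a world-frame point cloud. The pinhole intrinsics `K`, the camera-to-world pose `T_wc` and the function name are assumptions made for this example and do not appear in the original text; ICP-based refinement is not included.

```python
import numpy as np

def static_point_cloud(depth, motion_mask, K, T_wc):
    """Back-project static pixels with valid depth into world coordinates.

    depth:       (H, W) depth map, 0 means "no measurement".
    motion_mask: (H, W) bool, True where a dynamic object was detected.
    K:           (3, 3) pinhole intrinsics (assumed camera model).
    T_wc:        (4, 4) camera-to-world pose of the frame.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    keep = (depth > 0) & ~motion_mask          # static pixels with valid depth
    v, u = np.nonzero(keep)                    # row (v) and column (u) indices
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # (N, 4) homogeneous
    return (T_wc @ pts_cam.T).T[:, :3]
```

Point clouds produced this way for successive key frames could then be concatenated, or aligned further with ICP as the text suggests.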
Referring to the specification and fig. 2, the present embodiment provides a visual positioning method in a dynamic scene, which may include the following steps:
S210: acquiring a current frame image, and extracting feature points of the current frame image.
In the embodiment of the invention, the feature points of the current frame image can be extracted by the tracking estimation thread using the ORB feature extraction method, with ORB descriptors used for feature description. In some possible embodiments, other feature description methods may be used, and the invention is not limited in this respect.
S220: inputting the current frame image into a preset deep learning network for semantic segmentation to obtain a target semantic image.
In the embodiment of the invention, semantic segmentation of the current frame image can be performed by the semantic segmentation and compensation thread. The preset deep learning network may include an ENet semantic segmentation network, a commonly used segmentation network with a simple structure, short running time and few parameters, which makes it suitable for real-time image segmentation and for mobile terminal devices.
In practical application, label mapping may first be performed on the three-channel color image sequence to form a training set with corresponding training labels; an asymmetric network structure is adopted, so that the decoder can upsample and fine-tune the output of the encoder; the rectified linear unit (ReLU) layers in the network are replaced with parametric rectified linear units (PReLU), which add an extra parameter per feature map; the convolution layers in the bottleneck structure are replaced with dilated convolutions connected in series to enlarge the receptive field; and spatial dropout is used to prevent overfitting.
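Two of the network modifications listed above can be made concrete with a small Python sketch: a PReLU activation with one slope per feature map, and the receptive field of stride-1 dilated convolutions connected in series. The function names and the example dilation rates are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def prelu(x, alpha):
    """PReLU on a (C, H, W) feature map with one slope per channel."""
    alpha = np.asarray(alpha, dtype=x.dtype).reshape(-1, 1, 1)
    return np.where(x > 0, x, alpha * x)

def receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 convolutions stacked in series."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For 3x3 kernels, three ordinary convolutions give `receptive_field(3, [1, 1, 1]) == 7`, while dilation rates 1, 2 and 4 give 15 with the same number of weights, which is why dilation enlarges the receptive field cheaply.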
In some possible embodiments, other semantic segmentation networks may be used, which is not limited by the present invention.
S230: determining a motion mask area of the current frame image according to the target semantic image.
In the embodiment of the invention, the motion mask area can be determined through a semantic segmentation and compensation thread. The motion mask region may include a region of a potentially moving object, which may include a person, an animal, etc.
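A minimal Python sketch of this step, assuming the target semantic image is an integer label map: pixels whose class belongs to a set of potentially moving classes form the motion mask, and feature points are split by whether they fall inside it. The class ids in `DYNAMIC_CLASS_IDS` are hypothetical placeholders, since the actual label values depend on the trained network.

```python
import numpy as np

# Hypothetical label ids for potentially moving classes (e.g. person, animal).
DYNAMIC_CLASS_IDS = (11, 12)

def motion_mask(label_image, dynamic_ids=DYNAMIC_CLASS_IDS):
    """Boolean mask that is True on pixels of potentially moving objects."""
    return np.isin(label_image, dynamic_ids)

def split_keypoints(keypoints, mask):
    """Split (N, 2) keypoints given as (u, v) pixels into mask / background."""
    kp = np.asarray(keypoints)
    u = np.round(kp[:, 0]).astype(int)
    v = np.round(kp[:, 1]).astype(int)
    in_mask = mask[v, u]                 # image arrays are indexed [row, col]
    return kp[in_mask], kp[~in_mask]
```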
In one possible embodiment, as shown in fig. 3, after determining the motion mask region of the current frame image according to the target semantic image, the method may further include:
S310: acquiring a first motion mask area of a first reference frame image and a second motion mask area of a second reference frame image; the first reference frame image is the frame image before the current frame image, and the second reference frame image is the frame image before the first reference frame image.
In the embodiment of the present invention, the first motion mask region of the first reference frame image and the second motion mask region of the second reference frame image may be determined by performing semantic segmentation through a preset deep learning network.
S320: judging whether the current frame image has a missed detection according to the first motion mask area and the motion mask area of the current frame image.
In the embodiment of the invention, the intersection-over-union (IoU) of the motion mask of the current frame image and the motion mask of the first reference frame image can be calculated, and whether the current frame image has a missed detection can be determined from this IoU.
S330: if the current frame image has a missed detection, determining a first target motion mask area according to the first motion mask area and the second motion mask area.
In the embodiment of the present invention, if the current frame image has a missed detection, the first motion mask region of the first reference frame image and the second motion mask region of the second reference frame image may be projected pixel by pixel into the current frame to obtain the first target motion mask region.
Specifically, let $M_{i-1}$ be the motion mask of the first reference frame image, $M_{i-2}$ the motion mask of the second reference frame image, and $M_i$ the motion mask of the current frame image. Denoting the intersection-over-union of the motion mask of the current frame image and the motion mask of the first reference frame image as $D_{iou}$, $D_{iou}$ can be defined as:
$$D_{iou} = \frac{|M_i \cap M_{i-1}|}{|M_i \cup M_{i-1}|}$$
Let $\tau_{iou}$ be the threshold of $D_{iou}$. If the calculated value of $D_{iou}$ is less than $\tau_{iou}$, the semantic segmentation of the current frame image has a missed detection. In this case, the motion masks $M_{i-1}$ and $M_{i-2}$ are projected pixel by pixel into the current frame, and a new semantic detection result, denoted $S_{iou}$, is calculated. $S_{iou}$ can be defined as:
Figure BDA0002295843050000074
The intersection is taken because the subsequent motion consistency detection can put the feature points that satisfy the condition back into the static point set, so the expansion of the mask does not affect the accuracy of the system.
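The missed-detection test based on the intersection ratio D_iou can be sketched in Python as follows. The function names, the default threshold value and the handling of two empty masks are assumptions for illustration only.

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two boolean masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0   # assumption: two empty masks count as full agreement
    return np.logical_and(a, b).sum() / union

def has_missed_detection(cur_mask, prev_mask, iou_threshold=0.5):
    """True when the IoU with the previous frame's mask is below the threshold."""
    return mask_iou(cur_mask, prev_mask) < iou_threshold
```

A sudden drop of the IoU between consecutive frames is read as a segmentation failure rather than real motion, which follows from the motion continuity assumption stated earlier.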
S340: replacing the motion mask area of the current frame image with the first target motion mask area.
In the embodiment of the invention, after the motion mask area of the current frame image is determined, the semantic segmentation and compensation thread can detect whether the segmentation of the current frame image has a missed detection; when a missed detection is found, the semantic segmentation results of the two frames preceding the current frame image are fused for compensation.
In another possible embodiment, as shown in fig. 3, the method may further include:
S350: if the current frame image has no missed detection, determining a second target motion mask area according to the first motion mask area and the motion mask area of the current frame image.
In the embodiment of the present invention, if the current frame image has no missed detection, the first motion mask region of the first reference frame image may be projected into the current frame pixel by pixel and combined with the motion mask region of the current frame image to obtain the second target motion mask region.
S360: replacing the motion mask area of the current frame image with the second target motion mask area.
In the embodiment of the invention, when no missed detection is found, the semantic segmentation result of the frame image preceding the current frame image can still be used for compensation, which further reduces the possibility of a missed detection.
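Steps S330 to S360 can be sketched in Python as below. Two simplifications are assumed and are not taken from the patent: the pixel-by-pixel projection is treated as the identity (no warping between frames), and the fusion operator for the missed-detection case is left as a parameter, defaulting to a pixel-wise union, because the exact formula is not recoverable from this text.

```python
import numpy as np

def compensate_mask(cur_mask, prev_mask, prev2_mask, missed,
                    fuse=np.logical_or):
    """Return the target motion mask that replaces the current frame's mask.

    missed=True : fuse the masks of the two preceding frames (S330/S340).
    missed=False: combine the previous mask with the current one (S350/S360).
    `fuse` is an assumed operator; pass np.logical_and for an intersection.
    """
    cur = np.asarray(cur_mask, dtype=bool)
    prev = np.asarray(prev_mask, dtype=bool)
    prev2 = np.asarray(prev2_mask, dtype=bool)
    if missed:
        return fuse(prev, prev2)
    return np.logical_or(prev, cur)
```

Enlarging the mask is tolerable in this pipeline because feature points inside the mask are not discarded outright; they are still tested by the later motion consistency detection.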
S240: acquiring the depth information of the current frame image.
In a possible embodiment, after obtaining the depth information of the current frame image, the method may further include:
and repairing the depth information by using a preset morphological method.
In the embodiment of the invention, the depth information of the current frame image can be repaired by the tracking estimation thread using methods such as dilation and interpolation.
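One possible morphological repair is sketched below in Python: pixels with no depth measurement (value 0) are filled with the result of a 3x3 grey-scale dilation of the depth map. The kernel size, the use of 0 as the invalid value and the function names are assumptions for this example; the patent only states that dilation and interpolation may be used.

```python
import numpy as np

def dilate3x3(img):
    """Grey-scale dilation with a 3x3 square structuring element."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant", constant_values=0)
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3)]
    return np.max(shifted, axis=0)

def repair_depth(depth, iterations=1):
    """Fill zero-valued holes from neighbouring depths; valid pixels are kept."""
    out = depth.copy()
    for _ in range(iterations):
        holes = out == 0
        if not holes.any():
            break
        out[holes] = dilate3x3(out)[holes]
    return out
```

Dilation takes the largest neighbouring depth, so a hole on an object boundary is filled with the farther surface; an interpolation-based fill would behave differently there.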
S250: performing motion consistency detection based on the target semantic image and the depth information, and determining a static feature point set of the current frame image.
In the embodiment of the invention, the motion consistency detection can be performed by the tracking estimation thread, which removes the dynamic feature points of moving objects in the current frame image to obtain the static feature point set of the current frame image.
In one possible embodiment, as shown in fig. 4, the performing motion consistency detection based on the target semantic image and the depth information, and determining the set of static feature points of the current frame image may include:
S410: determining the background area of the current frame image according to the target semantic image.
In the embodiment of the present invention, the background area may include a static object area, and the static object may include a road, a tree, a building, and the like.
S420: determining first pose information according to the feature points of the background area of the current frame image.
In this embodiment of the present invention, the determining the first pose information according to the feature points of the background region of the current frame image may include: acquiring feature points of a background area of a first reference frame image, and matching the feature points of the background area of the first reference frame image with the feature points of the background area of the current frame image to obtain matching point pairs, wherein the first reference frame image is the previous frame image of the current frame image; screening the matching point pairs through a distance constraint and the random sample consensus (RANSAC) algorithm to remove mismatched point pairs; and calculating the first pose information from the screened matching point pairs using the normalized eight-point algorithm.
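The matching and distance-constraint part of this step can be sketched in Python for binary descriptors such as ORB, which are compared with the Hamming distance. The brute-force nearest-neighbour strategy, the `max_distance` value and the function names are assumptions for illustration.

```python
import numpy as np

def hamming_matrix(desc_a, desc_b):
    """Pairwise Hamming distances between (Na, 32) and (Nb, 32) uint8 descriptors."""
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)

def match_descriptors(desc_prev, desc_cur, max_distance=50):
    """Nearest-neighbour matches (index_prev, index_cur) under a distance constraint."""
    dist = hamming_matrix(desc_prev, desc_cur)
    nearest = dist.argmin(axis=1)
    best = dist[np.arange(len(desc_prev)), nearest]
    return [(i, int(j)) for i, (j, d) in enumerate(zip(nearest, best))
            if d <= max_distance]
```

The surviving pairs are only candidates; geometric outliers among them still have to be removed by the RANSAC stage.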
Specifically, assume that the feature point set of the background region of the first reference frame image is $B_{i-1}$ and the feature point set of the background region of the current frame image is $B_i$. The feature points in $B_{i-1}$ and $B_i$ are matched to obtain a matching point pair set, the distances between descriptors are calculated, and the RANSAC algorithm is used to perform secondary screening on the matching point pairs in the set, yielding a stable point set correspondence. The calculation method of the first pose information may include:
determining a matching point pair set from the background-region feature point sets $B_i$ and $B_{i-1}$, selecting 8 matching point pairs from the set, and estimating a fundamental matrix $F_i$ using the normalized eight-point algorithm;
calculating the distance $d_n$ from each of the remaining matching point pairs in the set to its corresponding epipolar line; if $d_n < d$, the point is an inlier, otherwise it is an outlier; the number of inliers satisfying the condition is recorded as $m_i$, where $d$ is a preset distance threshold;
iterating S times, or stopping the iteration once the proportion of the number of inliers $m_i$ in the whole set is greater than or equal to a preset proportion (for example, 95%); and selecting the fundamental matrix $F_i$ with the largest $m_i$ as the first pose.
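The procedure above can be sketched in Python with NumPy as follows. This is a minimal illustration under assumptions, not the patent's implementation: the function names, the default threshold `d`, the iteration count `max_iters` (playing the role of S) and the random seed are all chosen for the example.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalisation: zero mean, mean distance sqrt(2) from the origin."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

def eight_point(p1, p2):
    """Fundamental matrix F with p2^T F p1 = 0 from at least 8 point pairs."""
    x1, T1 = _normalize(p1)
    x2, T2 = _normalize(p2)
    # Each row encodes x2^T F x1 = 0 for F flattened row by row.
    A = np.column_stack([x2[:, 0:1] * x1, x2[:, 1:2] * x1, x1])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt    # enforce rank 2
    return T2.T @ F @ T1                       # undo the normalisation

def epipolar_distance(F, p1, p2):
    """Distance from each point in p2 to the epipolar line F p1."""
    h1 = np.column_stack([p1, np.ones(len(p1))])
    h2 = np.column_stack([p2, np.ones(len(p2))])
    lines = h1 @ F.T                           # row n is F @ h1[n]
    num = np.abs(np.sum(lines * h2, axis=1))
    return num / np.linalg.norm(lines[:, :2], axis=1)

def ransac_fundamental(p1, p2, d=1.0, max_iters=200, stop_ratio=0.95, seed=0):
    """Keep the F with the most inliers (d_n < d); stop early at stop_ratio."""
    rng = np.random.default_rng(seed)
    best_F, best_inliers = None, np.zeros(len(p1), dtype=bool)
    for _ in range(max_iters):
        idx = rng.choice(len(p1), size=8, replace=False)
        F = eight_point(p1[idx], p2[idx])
        inliers = epipolar_distance(F, p1, p2) < d
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
        if best_inliers.mean() >= stop_ratio:
            break
    return best_F, best_inliers
```

Here `p1` and `p2` are `(N, 2)` arrays of matched pixel coordinates from the previous and current frames. The returned inlier mask also identifies the stable correspondences mentioned in the text.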
S430: determining types of feature points of the motion mask region based on the first pose information and the depth information, the types including dynamic feature points and static feature points.
In one possible embodiment, as shown in fig. 5, the determining the type of the feature point of the motion mask region based on the first pose information and the depth information may include:
S431: calculating the motion score of the feature points of the motion mask region according to the first pose information and the depth information.
S432: when the motion score is smaller than a preset threshold value, judging that the feature point is a static feature point.
S433: when the motion score is greater than or equal to the preset threshold value, judging that the feature point is a dynamic feature point.
In another possible embodiment, the calculating the motion score of the feature point of the motion mask region according to the first pose information and the depth information may include:
acquiring feature points of a motion mask area of a first reference frame image, and matching the feature points of the motion mask area of the first reference frame image with the feature points of the motion mask area of the current frame image to obtain matching point pairs; wherein, the first reference frame image is a previous frame image of the current frame image;
screening the matching point pairs, and removing the matching point pairs which are mismatched;
and calculating the distance between the screened matching point pairs according to the first pose information and the depth information, and taking the distance as the motion score of the feature points of the motion mask region.
In the embodiment of the invention, the motion score of the feature points of the motion mask region can be calculated using constraints such as the epipolar constraint and a depth constraint. In some possible embodiments, other constraints may be adopted, and the present invention is not limited in this respect.
Specifically, after the matching point pair is obtained, the matching point pair may be screened by using a distance constraint and a RANSAC algorithm, so as to obtain a screened matching point pair.
Let p_1 and p_2 be a matching point pair of feature points of the motion mask region, where p_1 is a feature point of the motion mask region of the first reference frame image, p_2 is a feature point of the motion mask region of the current frame image, and d_1 and d_2 are the depth values corresponding to p_1 and p_2 on the depth map. From multi-view geometry, p_1 and p_2 should satisfy the epipolar constraint, expressed as follows:
p_2^T F p_1 = 0
where F is the fundamental matrix calculated from the feature points of the background region. Because of the motion of the object, the actual value is not 0, which yields an error distance. Meanwhile, the expected current depth value can be obtained by applying the perspective transformation and pose transformation to the feature point of the first reference frame image and reprojecting it, and the difference between the depth values is taken as a distance error. The specific formula is as follows:
Figure BDA0002295843050000102
where D denotes the distance, which is compared with a corresponding threshold: if the calculated D is greater than or equal to the threshold, the feature point p_2 is judged to be a dynamic feature point; if D is less than the threshold, the feature point p_2 is judged to be a static feature point.
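As an illustration of how an epipolar error and a depth error could be combined into a single score, the following Python sketch may help. It is not the formula of the invention, which is given only as an image in the source. The additive combination, the weight `w_depth`, the intrinsic matrix `K`, and the rotation `R` and translation `t` (taken here as already recovered from the first pose information) are all assumptions made for the example.

```python
import numpy as np

def motion_score(p1, p2, d1, d2, F, K, R, t, w_depth=1.0):
    """Hypothetical motion score: epipolar distance plus weighted depth error.

    p1, p2 : (u, v) pixel coordinates in the reference and current frame
    d1, d2 : depth values of p1 and p2 read from the depth maps
    F      : fundamental matrix estimated from background feature points
    K      : 3x3 camera intrinsic matrix (assumed known)
    R, t   : relative pose mapping reference-camera points to the current camera
    """
    p1_h = np.array([p1[0], p1[1], 1.0])
    p2_h = np.array([p2[0], p2[1], 1.0])

    # Epipolar error: distance from p2 to the epipolar line F p1 (pixels).
    line = F @ p1_h
    d_epi = abs(p2_h @ line) / np.hypot(line[0], line[1])

    # Depth error: back-project p1 with d1, move it into the current camera
    # frame, and compare the predicted depth with the measured depth d2.
    X1 = d1 * (np.linalg.inv(K) @ p1_h)
    X2 = R @ X1 + t
    d_depth = abs(d2 - X2[2])

    return d_epi + w_depth * d_depth
```

Because the two terms have different units (pixels and depth units), the weight would need tuning in practice; a feature point whose score reaches the threshold would then be treated as dynamic.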
S440: eliminating the dynamic feature points among the feature points of the motion mask region, and retaining the static feature points.
In this embodiment of the present invention, the feature points of the motion mask region may be collected into a set; when a feature point is determined to be a dynamic feature point, it is deleted from the set, and when it is determined to be a static feature point, it is retained.
S450: generating a static feature point set according to the static feature points and the feature points of the background region.
In the embodiment of the invention, the feature points of all the motion mask regions can be traversed, and the static feature point set is generated according to the reserved static feature points and the feature points of the background region.
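Steps S440 and S450 amount to a filter followed by a merge. A minimal Python sketch, assuming the motion scores have already been computed and using the hypothetical function name `build_static_set`:

```python
def build_static_set(background_points, mask_points, motion_scores, threshold):
    """Drop dynamic points from the motion mask region and merge the rest
    with the background feature points.

    A mask-region point is kept (static) when its score is below the
    threshold, and removed (dynamic) when the score reaches the threshold.
    """
    kept = [p for p, s in zip(mask_points, motion_scores) if s < threshold]
    return list(background_points) + kept
```

For example, with one background point, two mask-region points scored 0.2 and 5.0, and a threshold of 1.0, only the first mask-region point joins the background point in the static feature point set.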
S260: determining the current state pose information according to the static feature point set.
In one possible embodiment, the determining current state pose information from the set of static feature points may include:
when a new key frame is generated, establishing data association among the feature points in the static feature point set, the key frame and the map point;
determining second pose information according to the feature points in the static feature point set;
and performing pose optimization according to the second pose information and the data association to determine the pose information of the current state.
In the embodiment of the invention, the data association among the feature points in the static feature point set, the key frames and the map points can be established through the tracking estimation thread, and pose optimization can be performed through the local map thread to determine the current state pose information. Specifically, a fundamental matrix may be calculated using the normalized eight-point method from the matching point pairs formed by the feature points in the static feature point set and their matched feature points in the first reference frame image, so as to obtain the second pose information. Pose optimization is then performed using local bundle adjustment, with the second pose information as the initial value, to determine the current state pose information.
In one possible embodiment, the method may further include:
generating static object point cloud data based on the target semantic image, the current state pose information and the static feature point set;
and performing dense reconstruction of the static object point cloud map according to the static object point cloud data.
In the embodiment of the invention, dense reconstruction of the static object point cloud map can be carried out through the dense point cloud map building process.
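One way to picture the generation of static object point cloud data is to back-project every depth pixel outside the motion mask into the world frame using the optimized pose. The Python sketch below assumes a pinhole camera with intrinsic matrix `K`, a boolean `motion_mask` that is True on moving-object pixels, and a 4 x 4 camera-to-world pose `T_wc`; these names are illustrative, and the ICP-based stitching mentioned in the description is not shown.

```python
import numpy as np

def static_point_cloud(depth, motion_mask, K, T_wc):
    """Back-project static pixels of one frame into world coordinates.

    depth       : H x W array of depth values (0 where depth is missing)
    motion_mask : H x W boolean array, True on moving-object pixels
    K           : 3x3 pinhole intrinsic matrix
    T_wc        : 4x4 camera-to-world transform (current state pose)
    Returns an N x 3 array of world-frame points.
    """
    # Keep pixels that have valid depth and lie outside the motion mask.
    v, u = np.nonzero((depth > 0) & ~motion_mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.column_stack([x, y, z, np.ones_like(z)])
    return (T_wc @ pts_cam.T).T[:, :3]
```

Local clouds produced this way for successive key frames could then be merged into the global static map.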
In a possible embodiment, local optimization and key frame screening can be performed on all key frames through the local map thread, including adding and deleting key frames and map points accordingly, so that the estimation of the system is more accurate and stable; the semantic segmentation and compensation thread and the local map thread can exchange information to jointly optimize the back-end pose. The optimized key frames can undergo global adjustment and loop closure detection through the loop detection thread. Through the dense point cloud mapping thread, pose estimation and dynamic region judgment can be carried out, the static background part is generated as a local point cloud, and overall stitching and updating are completed. The tracking estimation thread, the semantic segmentation and compensation thread and the dense point cloud mapping thread can exchange information: the pixels of the detected moving-object parts are removed according to the semantic segmentation result and, combined with the accurate pose obtained from the pose optimization, dense reconstruction of the static object point cloud map is realized through the ICP method.
In a specific embodiment, the practical effect of the embodiment of the invention is verified on a publicly available international data set. The data set is the RGB-D indoor data set released by the Technical University of Munich (TUM); 8 high-speed cameras (100 Hz) are used for capture, and the ground-truth motion trajectory of the data set is provided by a motion capture system. The data set contains the original images and the corresponding depth maps, and provides an alignment script between the original images and the depth maps as well as an evaluation script for assessing the pose estimation accuracy of a SLAM system. The invention is compared with the ORB-SLAM2 system and the DS-SLAM system, using Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) as the comparison metrics. The test results are shown in fig. 6A to 6C, where the left graph compares the trajectories, showing the ground-truth trajectory, the estimated trajectory and the error line segments between them, and the right graph is the curve of the RPE over time, describing the relative stability of the system. The results show that the method of the embodiment of the invention outperforms the ORB-SLAM2 system and the DS-SLAM system, and because the ENet network is adopted, the system can reach 22 FPS, fully meeting the real-time requirement.
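For readers unfamiliar with the ATE metric, the following Python sketch shows one common way to compute it: the estimated trajectory is rigidly aligned to the ground truth by least squares, and the root-mean-square of the remaining translational differences is reported. This is an illustrative re-implementation under the assumption that the two trajectories are already associated by timestamp; it is not the evaluation script distributed with the data set, and the function names are hypothetical.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rigid alignment (rotation R, translation t) of est onto gt.

    est, gt : N x 3 arrays of associated camera positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """Root-mean-square absolute trajectory error after rigid alignment."""
    R, t = align_rigid(est, gt)
    err = gt - (est @ R.T + t)
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))
```

A trajectory that differs from the ground truth only by a rigid transform therefore has an ATE close to zero, while drift or jumps caused by dynamic objects raise it.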
In summary, the visual positioning method in a dynamic scene of the present invention has the following beneficial effects:
According to the visual positioning method in a dynamic scene, semantic segmentation is performed on the image, motion consistency is detected according to the semantic segmentation result and the depth information, the dynamic feature points of the motion mask region are removed, the static feature points in the motion mask region are added to the static feature point set, pose optimization is realized using the static feature point set, and a dense static object point cloud map is reconstructed. The accuracy and robustness of pose estimation in a dynamic environment can thereby be improved, and the accuracy of the static object point cloud map is improved, which in turn improves the accuracy of environment perception.
In addition, based on the assumption of motion continuity, the visual positioning method in a dynamic scene fuses the semantic segmentation results of adjacent frame images to compensate for missed segmentation, which can further improve the accuracy of the method.
Referring to fig. 7 in the specification, the present embodiment provides a visual positioning apparatus 700 in a dynamic scene, where the apparatus 700 may include:
a first obtaining module 710, configured to obtain a current frame image and extract feature points of the current frame image;
a semantic segmentation module 720, configured to input the current frame image into a preset deep learning network for semantic segmentation to obtain a target semantic image;
a determining module 730, configured to determine a motion mask region of the current frame image according to the target semantic image;
a second obtaining module 740, configured to obtain depth information of the current frame image;
a detection module 750, configured to perform motion consistency detection based on the target semantic image and the depth information, and determine a static feature point set of the current frame image;
and a positioning module 760 for determining the current state pose information according to the static feature point set.
It should be noted that, when the apparatus provided in the foregoing embodiment implements its functions, the above division into functional modules is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above.
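The data flow between modules 710 to 760 can be illustrated with a short Python sketch in which each module is a plain callable. The class name, field names and call signatures are hypothetical and serve only to show the order in which the modules exchange data; as noted above, an actual apparatus may divide the functions differently.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class VisualPositioningPipeline:
    extract_features: Callable[[Any], Any]               # first obtaining module 710
    segment: Callable[[Any], Any]                        # semantic segmentation module 720
    motion_mask: Callable[[Any], Any]                    # determining module 730
    get_depth: Callable[[Any], Any]                      # second obtaining module 740
    detect_static: Callable[[Any, Any, Any, Any], Any]   # detection module 750
    locate: Callable[[Any], Any]                         # positioning module 760

    def process(self, frame):
        """Run one frame through the modules in the order described."""
        features = self.extract_features(frame)
        semantic = self.segment(frame)
        mask = self.motion_mask(semantic)
        depth = self.get_depth(frame)
        static_set = self.detect_static(features, semantic, mask, depth)
        return self.locate(static_set)
```

Passing the modules in as callables keeps the sketch independent of any particular feature extractor, segmentation network or depth source.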
The foregoing description has fully disclosed preferred embodiments of the present invention. It should be noted that those skilled in the art can make modifications to the embodiments of the present invention without departing from the scope of the appended claims. Accordingly, the scope of the appended claims is not to be limited to the specific embodiments described above.

Claims (10)

1. A visual positioning method in a dynamic scene is characterized by comprising the following steps:
acquiring a current frame image, and extracting feature points of the current frame image;
inputting the current frame image into a preset deep learning network for semantic segmentation to obtain a target semantic image;
determining a motion mask area of the current frame image according to the target semantic image;
acquiring depth information of the current frame image;
performing motion consistency detection based on the target semantic image and the depth information, and determining a static feature point set of the current frame image;
and determining the current state pose information according to the static feature point set.
2. The method of claim 1, wherein the performing motion consistency detection based on the target semantic image and the depth information, and wherein determining the set of static feature points of the current frame image comprises:
determining a background area of the current frame image according to the target semantic image;
determining first pose information according to the feature points of the background area of the current frame image;
determining types of feature points of the motion mask region based on the first pose information and the depth information, the types including dynamic feature points and static feature points;
removing the dynamic feature points among the feature points of the motion mask region and retaining the static feature points;
and generating a static feature point set according to the static feature points and the feature points of the background area.
3. The method of claim 2, wherein the determining the type of feature points for the motion mask region based on the first pose information and the depth information comprises:
calculating a motion score of a feature point of the motion mask region according to the first pose information and the depth information;
when the motion score is smaller than a preset threshold value, judging that the feature point is a static feature point;
and when the motion score is greater than or equal to the preset threshold value, judging that the feature point is a dynamic feature point.
4. The method of claim 3, wherein the calculating motion scores for feature points of the motion mask region from the first pose information and the depth information comprises:
acquiring feature points of a motion mask area of a first reference frame image, and matching the feature points of the motion mask area of the first reference frame image with the feature points of the motion mask area of the current frame image to obtain matching point pairs; wherein, the first reference frame image is a previous frame image of the current frame image;
screening the matching point pairs, and removing the matching point pairs which are mismatched;
and calculating the distance between the screened matching point pairs according to the first pose information and the depth information, and taking the distance as the motion score of the feature points of the motion mask region.
5. The method according to claim 1 or 2, wherein after determining the motion mask region of the current frame image according to the target semantic image, the method further comprises:
acquiring a first motion mask area of a first reference frame image and a second motion mask area of a second reference frame image; the first reference frame image is a frame image before the current frame image, and the second reference frame image is a frame image before the first reference frame image;
judging whether the current frame image has missing detection according to the first motion mask area and the motion mask area of the current frame image;
if the current frame image has missing detection, determining a first target motion mask area according to the first motion mask area and the second motion mask area;
and replacing the motion mask area of the current frame image with the first target motion mask area.
6. The method of claim 5, further comprising:
if the current frame image has no missing detection, determining a second target motion mask area according to the first motion mask area and the motion mask area of the current frame image;
and replacing the motion mask area of the current frame image with the second target motion mask area.
7. The method according to claim 1 or 2, wherein after obtaining the depth information of the current frame image, the method further comprises:
and repairing the depth information by using a preset morphological method.
8. The method according to claim 1 or 2, wherein the determining current state pose information from the set of static feature points comprises:
when a new key frame is generated, establishing data association among the feature points in the static feature point set, the key frame and the map point;
determining second pose information according to the feature points in the static feature point set;
and performing pose optimization according to the second pose information and the data association to determine the pose information of the current state.
9. The method of claim 8, further comprising:
generating static object point cloud data based on the target semantic image, the current state pose information and the static feature point set;
and performing dense reconstruction of the static object point cloud map according to the static object point cloud data.
10. A visual positioning apparatus for dynamic scenes, comprising:
the first acquisition module is used for acquiring a current frame image and extracting the characteristic points of the current frame image;
the semantic segmentation module is used for inputting the current frame image into a preset deep learning network for semantic segmentation to obtain a target semantic image;
the determining module is used for determining a motion mask area of the current frame image according to the target semantic image;
the second acquisition module is used for acquiring the depth information of the current frame image;
the detection module is used for carrying out motion consistency detection based on the target semantic image and the depth information and determining a static characteristic point set of the current frame image;
and the positioning module is used for determining the current state pose information according to the static characteristic point set.
CN201911200881.0A 2019-11-29 2019-11-29 Visual positioning method and device under dynamic scene Active CN111724439B (en)


Publications (2)

Publication Number Publication Date
CN111724439A true CN111724439A (en) 2020-09-29
CN111724439B CN111724439B (en) 2024-05-17



Similar Documents

Publication Publication Date Title
CN111724439A (en) Visual positioning method and device in dynamic scene
US11763485B1 (en) Deep learning based robot target recognition and motion detection method, storage medium and apparatus
Cheng et al. Noise-aware unsupervised deep lidar-stereo fusion
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN110349250B (en) RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene
Von Stumberg et al. Gn-net: The gauss-newton loss for multi-weather relocalization
US11443454B2 (en) Method for estimating the pose of a camera in the frame of reference of a three-dimensional scene, device, augmented reality system and computer program therefor
Kang et al. Detection and tracking of moving objects from a moving platform in presence of strong parallax
CN109410316B (en) Method for three-dimensional reconstruction of object, tracking method, related device and storage medium
Tang et al. ESTHER: Joint camera self-calibration and automatic radial distortion correction from tracking of walking humans
Correal et al. Automatic expert system for 3D terrain reconstruction based on stereo vision and histogram matching
Jiang et al. Static-map and dynamic object reconstruction in outdoor scenes using 3-d motion segmentation
CN110706269B (en) Binocular vision SLAM-based dynamic scene dense modeling method
CN110188835A (en) Data based on production confrontation network model enhance pedestrian's recognition methods again
CN112446882A (en) Robust visual SLAM method based on deep learning in dynamic scene
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN111797688A (en) Visual SLAM method based on optical flow and semantic segmentation
CN114677323A (en) Semantic vision SLAM positioning method based on target detection in indoor dynamic scene
CN106530407A (en) Three-dimensional panoramic splicing method, device and system for virtual reality
CN114170304A (en) Camera positioning method based on multi-head self-attention and replacement attention
US20230281862A1 (en) Sampling based self-supervised depth and pose estimation
EP4174770B1 (en) Monocular-vision-based detection of moving objects
CN113592947B (en) Method for realizing visual odometer by semi-direct method
Gonzalez-Huitron et al. Jaccard distance as similarity measure for disparity map estimation
CN110930519B (en) Semantic ORB-SLAM sensing method and device based on environment understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant