Research Article

Depth-map completion for large indoor scene reconstruction

Published: 01 March 2020

Highlights

Propose a new depth-completion algorithm for MVS depth-maps.
Use occlusion boundaries to solve the depth discontinuity problem (see the sketch after this list).
Propose an iterative filtering and completion method for large indoor scene reconstruction.
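
A plausible reading of the occlusion-boundary highlight is that hole filling should never propagate depth across a boundary, so completed surfaces stay crisp at depth discontinuities. The following is a minimal sketch of that idea, assuming a precomputed boolean boundary mask (in the paper's setting this would come from a learned occlusion-boundary detector); the nearest-neighbor propagation rule below is an illustrative assumption, not the authors' exact scheme.

import numpy as np

def boundary_aware_fill(depth, boundary, max_iters=100):
    """Propagate valid depths into holes (zeros), never across boundary pixels.

    depth:    2D float array, 0 marks missing depth.
    boundary: 2D bool array, True marks occlusion-boundary pixels.
    Image-border wrap-around from np.roll is ignored for brevity.
    """
    d = depth.copy()
    for _ in range(max_iters):
        holes = (d == 0) & ~boundary  # unfilled pixels that may be completed
        if not holes.any():
            break
        changed = False
        # 4-neighbor propagation: copy depth from a valid, non-boundary neighbor.
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            shifted = np.roll(d, (dy, dx), axis=(0, 1))
            src_ok = np.roll((d > 0) & ~boundary, (dy, dx), axis=(0, 1))
            fill = holes & (d == 0) & src_ok
            if fill.any():
                d[fill] = shifted[fill]
                changed = True
        if not changed:  # remaining holes are sealed off by boundaries
            break
    return d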

Abstract

Traditional Multi-View Stereo (MVS) algorithms often struggle with large-scale indoor scene reconstruction because photo-consistency measurements are unreliable in weakly textured regions, which are common in indoor scenes. To address this limitation, in this paper we propose a point cloud completion strategy that combines learning-based depth-map completion with geometry-based consistency filtering to fill large missing areas in depth-maps. The proposed method takes nonuniform, noisy MVS depth-maps as input and completes each depth-map individually. In the completion process, we first complete each depth-map using a learning-based method, and then filter it by validating its depth consistency against neighboring depth-maps. The depth-map completion and geometric filtering steps are performed iteratively until the number of depth points converges. Experiments on large-scale indoor scenes and benchmark MVS datasets demonstrate the effectiveness of the proposed method.
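The abstract describes an alternating loop: complete each depth-map with a learned model, then discard points that fail cross-view depth consistency, and repeat until the total point count stabilizes. Below is a minimal sketch of that loop, assuming hypothetical stand-ins: learned_complete replaces the trained completion network (here it median-fills holes so the sketch runs end to end), and reproject_depth replaces camera-calibrated warping of a neighbor's depth-map into the reference view (here an identity mapping). The relative-error threshold, vote count, and convergence tolerance are illustrative assumptions, not the paper's parameters.

import numpy as np

def learned_complete(depth):
    """Stand-in for a learning-based depth completion network.

    Fills holes (zeros) with the median of valid depths so the sketch runs;
    a real system would run a trained CNN here.
    """
    filled = depth.copy()
    valid = depth > 0
    if valid.any():
        filled[~valid] = np.median(depth[valid])
    return filled

def reproject_depth(depth_src, depth_ref):
    """Stand-in for warping a neighboring depth-map into the reference view
    with the calibrated cameras; the sketch assumes already-aligned views."""
    return depth_src

def consistency_filter(depth_ref, neighbor_depths, rel_thresh=0.01, min_views=2):
    """Keep a pixel only if enough neighboring views agree on its depth."""
    votes = np.zeros(depth_ref.shape, dtype=int)
    for d_nb in neighbor_depths:
        warped = reproject_depth(d_nb, depth_ref)
        ok = (warped > 0) & (depth_ref > 0)
        agree = ok & (np.abs(warped - depth_ref) <= rel_thresh * depth_ref)
        votes += agree.astype(int)
    out = depth_ref.copy()
    out[votes < min_views] = 0.0  # drop unsupported or inconsistent points
    return out

def iterative_completion(depth_maps, max_iters=10, tol=0.001):
    """Alternate completion and filtering until the point count converges."""
    prev_count = sum(int((d > 0).sum()) for d in depth_maps)
    for _ in range(max_iters):
        completed = [learned_complete(d) for d in depth_maps]
        depth_maps = [
            consistency_filter(d, completed[:i] + completed[i + 1:])
            for i, d in enumerate(completed)
        ]
        count = sum(int((d > 0).sum()) for d in depth_maps)
        if abs(count - prev_count) <= tol * max(prev_count, 1):
            break  # number of depth points has converged
        prev_count = count
    return depth_maps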



Published In

Pattern Recognition, Volume 99, Issue C, March 2020, 162 pages

Publisher

Elsevier Science Inc., United States


Author Tags

1. Depth completion
2. MVS
3. 3D reconstruction
4. Point cloud
