Hierarchical Object Relationship Constrained Monocular Depth Estimation

Published: 01 December 2021

Abstract

Monocular depth estimation has gained considerable momentum in recent years. Despite significant advances, reliably capturing contextual cues from RGB images remains inherently difficult, so accurately predicting depth in scenes with complicated, cluttered spatial arrangements of objects is still challenging. Rather than naively relying on the primary features of a single RGB image, in this paper we propose a hierarchical object relationship constrained network for monocular depth estimation, which enables accurate and smooth depth prediction from a monocular RGB image. The key idea of our method is to exploit object-centric hierarchical relationships as contextual constraints that regularize spatial depth variation. In particular, we design a semantics-guided CNN that simultaneously encodes the input image into a global context feature map and the relationships among objects into a local relationship feature map, so that this consolidated coding scheme can guide depth prediction more accurately across scene samples. Benefiting from these local-to-global context constraints, our method respects global depth variation while preserving local depth details. In addition, our approach makes full use of the hierarchical semantic relationships among intra-object components and neighboring objects to define depth-variation constraints. We conduct extensive experiments and comprehensive evaluations on widely used public datasets, which confirm that our method outperforms most state-of-the-art depth estimation methods in preserving local depth details.
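To make the abstract's encode-and-fuse scheme concrete, the sketch below shows one plausible shape of such a dual-branch network in PyTorch. It is a minimal illustration under stated assumptions, not the authors' architecture: the module names (DualContextDepthNet, global_proj, local_proj), channel widths, the global-average-pooling context branch, and the concatenation-based fusion are all hypothetical stand-ins for the semantics-guided encoding the paper describes.

import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch, k=3, s=1):
    # Basic convolution block reused throughout the sketch.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class DualContextDepthNet(nn.Module):
    # Hypothetical dual-branch depth regressor: a global context feature map
    # and a semantics-conditioned local relationship feature map jointly
    # constrain the decoded depth.
    def __init__(self, num_semantic_classes=40):
        super().__init__()
        # Shared encoder (stand-in for the semantics-guided CNN backbone).
        self.stem = nn.Sequential(
            conv_bn_relu(3, 32, s=2),
            conv_bn_relu(32, 64, s=2),
            conv_bn_relu(64, 128, s=2),
        )
        # Global-context branch: image-level pooling broadcast over the map
        # (no BatchNorm here, since the pooled tensor is 1x1 per channel).
        self.global_proj = nn.Sequential(
            nn.Conv2d(128, 128, 1), nn.ReLU(inplace=True))
        # Local-relationship branch: backbone features fused with a semantic
        # map (one channel per class) so object layout can constrain depth.
        self.local_proj = conv_bn_relu(128 + num_semantic_classes, 128, k=1)
        # Decoder fuses both feature maps and regresses a single depth channel.
        self.decoder = nn.Sequential(
            conv_bn_relu(256, 128),
            conv_bn_relu(128, 64),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, rgb, semantic_logits):
        feat = self.stem(rgb)  # B x 128 x H/8 x W/8
        # Global context: pool to 1x1, project, broadcast spatially.
        g = self.global_proj(F.adaptive_avg_pool2d(feat, 1)).expand_as(feat)
        # Local relationships: align semantics to feature resolution and fuse.
        sem = F.interpolate(semantic_logits, size=feat.shape[2:],
                            mode="bilinear", align_corners=False)
        loc = self.local_proj(torch.cat([feat, sem], dim=1))
        # Local-to-global fusion, then decode and upsample to input size.
        depth = self.decoder(torch.cat([g, loc], dim=1))
        return F.interpolate(depth, size=rgb.shape[2:],
                             mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = DualContextDepthNet(num_semantic_classes=40)
    rgb = torch.randn(1, 3, 240, 320)
    sem = torch.randn(1, 40, 240, 320)  # e.g. logits from a segmentation net
    print(net(rgb, sem).shape)          # torch.Size([1, 1, 240, 320])

One design note on the sketch: broadcasting a pooled global feature encourages scene-level consistency in the regressed depth, while the semantics-conditioned local branch lets object layout sharpen depth discontinuities, mirroring the local-to-global constraint idea in the abstract.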


          Published In

          Pattern Recognition, Volume 120, Issue C
          December 2021
          799 pages

          Publisher

          Elsevier Science Inc.

          United States

          Author Tags

          1. Monocular Depth Estimation
          2. Semantic Constraints
          3. Hierarchical Object Relationship
          4. Global and Local Context

          Qualifiers

          • Research-article
