Hierarchical Object Relationship Constrained Monocular Depth Estimation

Published: 01 December 2021

Abstract

Monocular depth estimation has gained considerable momentum in recent years. Despite significant advances, reliably capturing contextual cues from RGB images remains inherently difficult, so accurately predicting depth in scenes with complicated, cluttered spatial arrangements of objects is still challenging. Rather than naively relying on the primary features of a single RGB image, in this paper we propose a hierarchical object relationship constrained network for monocular depth estimation, which enables accurate and smooth depth prediction from a monocular RGB image. The key idea of our method is to exploit object-centric hierarchical relationships as contextual constraints that regularize spatial depth variation. In particular, we design a semantics-guided CNN that simultaneously encodes the input image into a global context feature map and the relationships among objects into a local relationship feature map, so that this consolidated coding scheme can guide depth prediction more accurately across scene samples. Benefiting from these local-to-global context constraints, our method respects global depth variation while preserving local depth details. In addition, our approach makes full use of the hierarchical semantic relationships among intra-object components and neighboring objects to define depth-variation constraints. We conduct extensive experiments and comprehensive evaluations on widely used public datasets, which confirm that our method outperforms most state-of-the-art depth estimation methods in preserving local depth details.
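To make the abstract's encode-and-fuse scheme concrete, the sketch below shows one plausible shape of such a dual-branch network in PyTorch. It is a minimal illustration under stated assumptions, not the authors' architecture: the module names (DualContextDepthNet, global_proj, local_proj), channel widths, the global-average-pooling context branch, and the concatenation-based fusion are all hypothetical stand-ins for the semantics-guided encoding the paper describes.

import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch, k=3, s=1):
    # Basic convolution block reused throughout the sketch.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class DualContextDepthNet(nn.Module):
    # Hypothetical dual-branch depth regressor: a global context feature map
    # and a semantics-conditioned local relationship feature map jointly
    # constrain the decoded depth.
    def __init__(self, num_semantic_classes=40):
        super().__init__()
        # Shared encoder (stand-in for the semantics-guided CNN backbone).
        self.stem = nn.Sequential(
            conv_bn_relu(3, 32, s=2),
            conv_bn_relu(32, 64, s=2),
            conv_bn_relu(64, 128, s=2),
        )
        # Global-context branch: image-level pooling broadcast over the map
        # (no BatchNorm here, since the pooled tensor is 1x1 per channel).
        self.global_proj = nn.Sequential(
            nn.Conv2d(128, 128, 1), nn.ReLU(inplace=True))
        # Local-relationship branch: backbone features fused with a semantic
        # map (one channel per class) so object layout can constrain depth.
        self.local_proj = conv_bn_relu(128 + num_semantic_classes, 128, k=1)
        # Decoder fuses both feature maps and regresses a single depth channel.
        self.decoder = nn.Sequential(
            conv_bn_relu(256, 128),
            conv_bn_relu(128, 64),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, rgb, semantic_logits):
        feat = self.stem(rgb)  # B x 128 x H/8 x W/8
        # Global context: pool to 1x1, project, broadcast spatially.
        g = self.global_proj(F.adaptive_avg_pool2d(feat, 1)).expand_as(feat)
        # Local relationships: align semantics to feature resolution and fuse.
        sem = F.interpolate(semantic_logits, size=feat.shape[2:],
                            mode="bilinear", align_corners=False)
        loc = self.local_proj(torch.cat([feat, sem], dim=1))
        # Local-to-global fusion, then decode and upsample to input size.
        depth = self.decoder(torch.cat([g, loc], dim=1))
        return F.interpolate(depth, size=rgb.shape[2:],
                             mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = DualContextDepthNet(num_semantic_classes=40)
    rgb = torch.randn(1, 3, 240, 320)
    sem = torch.randn(1, 40, 240, 320)  # e.g. logits from a segmentation net
    print(net(rgb, sem).shape)          # torch.Size([1, 1, 240, 320])

One design note on the sketch: broadcasting a pooled global feature encourages scene-level consistency in the regressed depth, while the semantics-conditioned local branch lets object layout sharpen depth discontinuities, mirroring the local-to-global constraint idea in the abstract.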


          Published In

          Pattern Recognition, Volume 120, Issue C
          December 2021
          799 pages

          Publisher

          Elsevier Science Inc.

          United States

          Author Tags

          1. Monocular Depth Estimation
          2. Semantic Constraints
          3. Hierarchical Object Relationship
          4. Global and Local Context

          Qualifiers

          • Research-article
