IAGC: Interactive Attention Graph Convolution Network for Semantic Segmentation of Point Clouds in Building Indoor Environment
Abstract
1. Introduction
- Existing convolution kernels do not fully exploit the high-dimensional features of regional points, which can distinguish objects that share similar appearance but occupy different positions.
- Deriving contextual information from a fully connected graph over all points not only reduces efficiency but also harms the generalization of global interaction.
- A dual cross-attention Transformer variant, called IAG–MLP, is proposed to operate directly on superpoints, which reorganize the raw point cloud into geometrically and colorimetrically homogeneous segments. It enhances the capture of high-dimensional contextual dependencies in local embeddings by learning both cross-position and cross-channel attention.
- By propagating contextual messages through neighboring superpoints and their connecting super-edges, an end-to-end graph network gradually updates the feature embeddings of superpoints and finally translates superpoint-level semantic inference into point-level fine-grained predictions (a minimal sketch of this update loop follows this list).
- We present theoretical and empirical analyses of the proposed IAGC architecture, together with qualitative and quantitative experiments on three indoor benchmarks that demonstrate its effectiveness and strong performance.
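A minimal PyTorch sketch of the superpoint-level message passing outlined above; the module names, the summation-based aggregation, and all sizes are illustrative assumptions, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class SuperpointGraphRefiner(nn.Module):
    """Hypothetical sketch: refine superpoint embeddings by exchanging
    messages over super-edges, then classify the refined states."""

    def __init__(self, dim: int, edge_dim: int, n_classes: int, steps: int = 3):
        super().__init__()
        # Message from a neighboring superpoint conditioned on the super-edge.
        self.message = nn.Linear(dim + edge_dim, dim)
        # Recurrent update of each superpoint's hidden state.
        self.gru = nn.GRUCell(dim, dim)
        self.classifier = nn.Linear(dim, n_classes)
        self.steps = steps

    def forward(self, x, edge_index, edge_attr):
        # x:          (S, dim)      initial superpoint embeddings
        # edge_index: (2, E)        source/target ids of the super-edges
        # edge_attr:  (E, edge_dim) geometric super-edge descriptors
        h = x
        src, dst = edge_index
        for _ in range(self.steps):
            msg = self.message(torch.cat([h[src], edge_attr], dim=-1))
            agg = torch.zeros_like(h).index_add_(0, dst, msg)  # sum per target
            h = self.gru(agg, h)  # contextual input updates the hidden state
        # Superpoint-level logits; point-level labels follow by broadcasting
        # each superpoint's prediction to its member points.
        return self.classifier(h)
```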
2. Related Work
2.1. Semantic Segmentation for Point Clouds
2.2. Deep Learning Networks for Semantic Segmentation
2.3. Attentive-Based Transformer Networks
2.4. Graph Convolutions
3. Methodology
3.1. Oversegmented Superpoint Generation
3.2. IAG–MLP
- Contraction Operation. To enable cross-channel interaction, a contraction operation over the entire feature dimension is required. The most succinct form is a linear projection coupled with an anterior normalization function and a posterior activation function, which can be formulated as:

  $\mathbf{F}_c = \sigma\left(\mathrm{Norm}(\mathbf{F})\,\mathbf{W}_c\right)$ (2)

  where $\mathbf{F}$ denotes the input local features, $\mathbf{W}_c$ a learnable projection matrix, and $\sigma(\cdot)$ the activation function.
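A minimal PyTorch sketch of the contraction operation in Eq. (2); LayerNorm and GELU are placeholder choices, since the text only names "normalization" and "activation":

```python
import torch.nn as nn

class Contraction(nn.Module):
    """Eq. (2): anterior normalization, linear projection over the
    channel dimension, posterior activation (layer choices assumed)."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_in)      # anterior normalization
        self.proj = nn.Linear(d_in, d_out)  # contraction over feature channels
        self.act = nn.GELU()                # posterior activation

    def forward(self, x):  # x: (N, d_in) point features
        return self.act(self.proj(self.norm(x)))
```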
- Cross-position Attention. Self-attention is normally treated as a linear projection that uses the values of the data samples themselves to refine their own features. However, such a self-attention matrix can only explain the interrelationships among points within the same sample; it cannot exploit correlations that may exist among different data samples in a scene. Moreover, although the self-attention module involves only a small number of parameters, the cost of its pairwise attention computation cannot be ignored. We therefore design a spatial interactive attention unit, inspired by the external attention network [55], to compute cross-position attention between the high-dimensional features and an external memory unit, which is independent of the input features and shares information across the entire training dataset. Specifically, we construct the spatial interaction unit following the paradigmatic structure of the self-attention layer of the vanilla Transformer and normalize it, similarly to PCT [38], with the double normalization method, which empirically improves the stability of local embedding networks:

  $\mathbf{A} = \mathrm{Norm}\left(\mathbf{F}_c\,\mathbf{M}_k^{\top}\right)$ (3)

  where $\mathbf{M}_k$ denotes the learnable external memory unit and $\mathrm{Norm}(\cdot)$ the double normalization (a softmax over the position dimension followed by $l_1$ normalization over the memory dimension).
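The cross-position unit of Eq. (3) might be sketched as follows; the memory size of 64 slots is an assumption, and the double normalization is written in the order used by external attention [55] (softmax over positions, then l1 over memory slots):

```python
import torch
import torch.nn as nn

class CrossPositionAttention(nn.Module):
    """Eq. (3): attention between input features and a learnable external
    memory shared across the whole training set (sketch, sizes assumed)."""
    def __init__(self, dim: int, n_mem: int = 64):
        super().__init__()
        self.memory_k = nn.Parameter(torch.randn(n_mem, dim) * dim ** -0.5)

    def forward(self, x):                  # x: (N, dim) contracted features
        attn = x @ self.memory_k.t()       # (N, n_mem) raw affinities
        attn = torch.softmax(attn, dim=0)  # normalize over positions
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # l1 over slots
        return attn                        # cross-position attention map
```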
- Cross-channel Attention. Unlike the self-attention weights derived from pairwise attention among points, the cross-channel attention unit can be viewed as a cross-attention mechanism that modulates individual point representations using the spatial signal. Specifically, the cross-channel attentional output is inferred from the dot product of the cross-position attentional weight matrix $\mathbf{A}$ and a second learnable memory unit $\mathbf{M}_v$:

  $\mathbf{F}_{cc} = \mathbf{A}\,\mathbf{M}_v$ (4)
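Continuing the sketch, the cross-channel step of Eq. (4) re-weights a second learnable memory with the cross-position map; pairing A with a value memory M_v is our reading of the external-attention design [55] and may differ from the paper's exact operands:

```python
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    """Eq. (4): modulate each point's channels by projecting the spatial
    attention map onto a second learnable memory (hypothetical layout)."""
    def __init__(self, dim: int, n_mem: int = 64):
        super().__init__()
        self.memory_v = nn.Parameter(torch.randn(n_mem, dim) * dim ** -0.5)

    def forward(self, attn):         # attn: (N, n_mem) cross-position weights
        return attn @ self.memory_v  # (N, dim) channel-modulated features
```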
- Residual Connection Block. In theory, a deep network with more parameters should perform better on challenging tasks, but simply deepening the network is known to make it harder to train, which is called the degradation problem. Since stacking more IAG–MLP modules deepens the network, a residual connection block is introduced to create a concise shortcut: the projected input bypasses the layers of the IAG block and is finally integrated with the projected attentional map.
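A sketch of the residual wiring around one IAG–MLP block, reusing the hypothetical modules from the previous sketches; where exactly the two projections sit is an assumption based on the text:

```python
import torch.nn as nn

class IAGBlock(nn.Module):
    """Residual shortcut around one IAG-MLP module (sketch): the projected
    input skips the attention layers and is summed with the projected
    attentional map."""
    def __init__(self, d_in: int, dim: int):
        super().__init__()
        self.shortcut = nn.Linear(d_in, dim)    # projected input, skip path
        self.contract = Contraction(d_in, dim)  # from the earlier sketch
        self.cross_pos = CrossPositionAttention(dim)
        self.cross_ch = CrossChannelAttention(dim)
        self.out_proj = nn.Linear(dim, dim)     # projection of attentional map

    def forward(self, x):  # x: (N, d_in)
        z = self.contract(x)
        z = self.cross_ch(self.cross_pos(z))
        return self.shortcut(x) + self.out_proj(z)  # residual integration
```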
3.3. Interactive Attention Graph Convolution Network (IAGC)
4. Experiments
4.1. Datasets
1. S3DIS [58]
2. ScanNet (V2) [59]
3. SceneNN [60]
4.2. Implementation Details
4.3. Ablation Studies and Analyses
4.3.1. Ablation Test of Local Embedding Function
Method | Params | GFLOPs | OA | mIoU | mAcc | Method | Params | GFLOPs | OA | mIoU | mAcc
---|---|---|---|---|---|---|---|---|---|---|---
PointNet [13] | 189k | 10.7 | 64.13 | 6.68 | 7.87 | PointNet + GRU | 289k | 11.6 | 70.39 | 12.94 | 13.74
Transformer [19] | 170k | 16.6 | 61.88 | 6.71 | 7.29 | Transformer + GRU | 270k | 23.1 | 69.38 | 14.16 | 19.90
1-IAG–MLP | 87k | 8.5 | 62.62 | 6.96 | 8.82 | 1-IAG–MLP + GRU | 186k | 10.2 | 71.44 | 12.77 | 16.73
2-IAG–MLP | 153k | 16.2 | 62.94 | 7.57 | 8.18 | 2-IAG–MLP + GRU | 253k | 17.5 | 73.03 | 16.05 | 17.73
3-IAG–MLP | 219k | 24.8 | 61.94 | 6.94 | 8.18 | 3-IAG–MLP + GRU | 319k | 27.0 | 71.00 | 14.14 | 18.35
4-IAG–MLP | 286k | 33.8 | 62.95 | 6.59 | 7.94 | 4-IAG–MLP + GRU | 385k | 34.1 | 70.83 | 14.46 | 18.60
5-IAG–MLP | 352k | 41.2 | 62.42 | 6.91 | 8.95 | 5-IAG–MLP + GRU | 451k | 43.4 | 71.55 | 14.04 | 16.60
 | | | | | | 2-IAG–MLP + LSTM | 255k | 17.6 | 69.84 | 13.61 | 17.66
 | | | | | | 2-IAG–MLP + GAT | 190k | 18.9 | 50.35 | 7.04 | 7.83
4.3.2. Ablation Test of Global Aggregation Function
4.3.3. Ablation Test of Granularity of Superpoint Graph
Reg. strength | Voxel size (m) | Points per superpoint | OA | mIoU | mAcc
---|---|---|---|---|---
0.02 | 0.03 | 512 | 79.2 | 51.0 | 75.4
0.02 | 0.03 | 1024 | 76.7 | 44.9 | 72.0
0.02 | 0.05 | 512 | 74.8 | 39.3 | 71.7
0.02 | 0.05 | 1024 | 74.8 | 37.8 | 70.1
0.03 | 0.03 | 512 | 77.1 | 46.9 | 74.1
0.03 | 0.03 | 1024 | 69.2 | 30.0 | 62.5
0.03 | 0.05 | 1024 | 73.8 | 35.7 | 70.8
4.4. Segmentation Results
4.4.1. Results on the S3DIS Dataset
Method | OA (Area-4 fold) | mIoU (Area-4 fold) | mAcc (Area-4 fold) | OA (6-fold micro-mean) | mIoU (6-fold micro-mean) | mAcc (6-fold micro-mean)
---|---|---|---|---|---|---
Transformer + GRU | 82.8 | 51.9 | 61.8 | 83.4 | 57.3 | 69.1
PointNet + GRU | 82.5 | 52.4 | 63.3 | 84.6 | 58.6 | 70.2
IAGC (2-IAG–MLP + GRU) | 85.6 | 65.2 | 78.3 | 85.6 | 64.7 | 76.8
IAGC (3-IAG–MLP + GRU) | 83.5 | 54.5 | 65.0 | 84.7 | 58.8 | 70.8

Method | OA | mIoU | Ceiling | Floor | Wall | Column | Bookcase | Beam | Wind | Door | Tab | Chair | Sofa | Board | Clutter
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
PointNet [13] | 74.0 | 44.7 | 87.2 | 92.1 | 61.9 | 12.9 | 39.9 | 28.8 | 41.1 | 48.3 | 47.2 | 44.3 | 14.4 | 28.1 | 35.2
GAC [18] | 79.4 | 46.8 | 90.1 | 86.5 | 67.4 | 12.7 | 34.1 | 17.2 | 44.5 | 51.2 | 58.5 | 54.8 | 16.1 | 25.9 | 49.5
PointNet++ [14] | 81.1 | 56.9 | 91.7 | 92.4 | 71.3 | 15.6 | 51.5 | 29.6 | 56.0 | 57.5 | 62.2 | 65.3 | 45.1 | 51.8 | 50.0
SPG [16] | 84.6 | 58.6 | 90.9 | 95.2 | 74.3 | 35.7 | 59.8 | 41.4 | 48.6 | 60.3 | 66.3 | 74.9 | 49.0 | 12.5 | 52.9
IAGC (ours) | 85.6 | 64.7 | 93.2 | 95.4 | 73.3 | 39.8 | 67.0 | 53.7 | 64.6 | 61.0 | 74.2 | 79.4 | 60.4 | 20.9 | 58.3
4.4.2. Results on the ScanNet Dataset
Category | Method | mIoU | Bath | Bed | Shelf | Cab | Chair | Cntr | Curt | Desk | Door | Floor | Other | Pic | Fridg | Show | Sink | Sofa | Table | Toil | Wall | Wind
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2D + 3D | 3DMV [63] | 48.4 | 48.4 | 53.8 | 64.3 | 42.4 | 60.6 | 31.0 | 57.4 | 43.3 | 37.8 | 79.6 | 30.1 | 21.4 | 53.7 | 20.8 | 47.2 | 50.7 | 41.3 | 69.3 | 60.2 | 53.9
 | PFCNN [64] | 44.2 | 50.5 | 62.2 | 38.0 | 34.2 | 65.4 | 22.7 | 39.7 | 36.7 | 27.6 | 92.4 | 24.0 | 19.8 | 35.9 | 26.2 | 36.6 | 58.1 | 43.5 | 64.0 | 66.8 | 39.8
 | Tangent Conv [65] | 43.8 | 43.7 | 64.6 | 47.4 | 36.9 | 64.5 | 35.3 | 25.8 | 28.2 | 27.9 | 91.8 | 29.8 | 14.7 | 28.3 | 29.4 | 48.7 | 56.2 | 42.7 | 61.9 | 63.3 | 35.2
Point Conv | PointNet++ [14] | 33.9 | 58.4 | 47.8 | 45.8 | 25.6 | 36.0 | 25.0 | 24.7 | 27.8 | 26.1 | 67.7 | 18.3 | 11.7 | 21.2 | 14.5 | 36.4 | 34.6 | 23.2 | 54.8 | 52.3 | 25.2
 | FCPN [66] | 44.7 | 67.9 | 60.4 | 57.8 | 38.0 | 68.2 | 29.1 | 10.6 | 48.3 | 25.8 | 92.0 | 25.8 | 2.5 | 23.1 | 32.5 | 48.0 | 56.0 | 46.3 | 72.5 | 66.6 | 23.1
 | PointCNN [32] | 45.8 | 57.7 | 61.1 | 35.6 | 32.1 | 71.5 | 29.9 | 37.6 | 32.8 | 31.9 | 94.4 | 28.5 | 16.4 | 21.6 | 22.9 | 48.4 | 54.5 | 45.6 | 75.5 | 70.9 | 47.5
 | ScanNet [59] | 30.6 | 20.3 | 36.6 | 50.1 | 31.1 | 52.4 | 21.1 | 0.2 | 34.2 | 18.9 | 78.6 | 14.5 | 10.2 | 24.5 | 15.2 | 31.8 | 34.8 | 30.0 | 46.0 | 43.7 | 18.2
Graph Conv | SPG [16] | 45.5 | 54.1 | 60.2 | 60.1 | 42.3 | 74.8 | 26.9 | 24.7 | 44.8 | 28.3 | 90.7 | 32.2 | 4.1 | 23.6 | 41.5 | 30.9 | 63.5 | 48.1 | 61.0 | 69.7 | 29.4
 | 2-IAG–MLP + GRU | 53.8 | 49.5 | 69.3 | 64.7 | 47.1 | 79.3 | 30.0 | 47.7 | 50.5 | 35.8 | 90.3 | 32.7 | 8.1 | 47.2 | 52.9 | 44.8 | 71.0 | 50.9 | 74.6 | 73.7 | 55.4
 | 3-IAG–MLP + GRU | 50.4 | 55.6 | 63.6 | 61.4 | 47.7 | 75.7 | 23.3 | 41.9 | 44.0 | 36.5 | 91.6 | 31.5 | 0.1 | 33.9 | 50.9 | 44.3 | 64.1 | 49.7 | 72.7 | 71.9 | 46.6
4.4.3. Results Analysis
5. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Tang, P.; Huber, D.; Akinci, B.; Lipman, R.; Lytle, A. Automatic reconstruction of as-built building information models from laser-scanned point clouds: A review of related techniques. Autom. Constr. 2010, 19, 829–843. [Google Scholar] [CrossRef]
- Pintore, G.; Mura, C.; Ganovelli, F.; Fuentes-Perez, L.; Pajarola, R.; Gobbetti, E. State-of-the-art in Automatic 3D Reconstruction of Structured Indoor Environments. Comput. Graph. Forum 2020, 39, 667–699. [Google Scholar] [CrossRef]
- Xia, S.; Chen, D.; Wang, R.; Li, J.; Zhang, X. Geometric primitives in LiDAR point clouds: A review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 685–707. [Google Scholar] [CrossRef]
- Lalonde, J.F.; Vandapel, N.; Huber, D.F.; Hebert, M. Natural terrain classification using three-dimensional ladar data for ground robot mobility. J. Field Robot. 2006, 23, 839–861. [Google Scholar] [CrossRef]
- Golovinskiy, A.; Kim, V.G.; Funkhouser, T. Shape-based recognition of 3D point clouds in urban environments. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 2154–2161. [Google Scholar]
- Guo, Y.; Sohel, F.; Bennamoun, M.; Lu, M.; Wan, J. Rotational projection statistics for 3D local surface description and object recognition. Int. J. Comput. Vis. 2013, 105, 63–86. [Google Scholar] [CrossRef] [Green Version]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Yin, W.; Kann, K.; Yu, M.; Schütze, H. Comparative study of CNN and RNN for natural language processing. arXiv 2017, arXiv:1702.01923. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Bello, S.A.; Yu, S.; Wang, C. Review: Deep learning on 3D point clouds. Remote Sens. 2020, 12, 1729. [Google Scholar] [CrossRef]
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4338–4364. [Google Scholar] [CrossRef]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE Computer Society: Los Alamitos, CA, USA, 2017. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5099–5108. [Google Scholar]
- Wang, C.; Samari, B.; Siddiqi, K. Local spectral graph convolution for point set feature learning. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 52–66. [Google Scholar]
- Landrieu, L.; Simonovsky, M. Large-Scale Point Cloud Semantic Segmentation with Superpoint Graphs. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Yang, J.; Zhang, Q.; Ni, B.; Li, L.; Liu, J.; Zhou, M.; Tian, Q. Modeling point clouds with self-attention and gumbel subset sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3323–3332. [Google Scholar]
- Wang, L.; Huang, Y.; Hou, Y.; Zhang, S.; Shan, J. Graph attention convolution for point cloud semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10296–10305. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. In Proceedings of the NIPS’17: 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6000–6010. [Google Scholar]
- Guinard, S.; Landrieu, L. Weakly supervised segmentation-aided classification of urban scenes from 3D LiDAR point clouds. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, XLII-1/W1, 151–157. [Google Scholar] [CrossRef] [Green Version]
- Liu, H.; Dai, Z.; So, D.R.; Le, Q.V. Pay Attention to MLPs. arXiv 2021, arXiv:2105.08050. [Google Scholar]
- Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
- Xiao, G.; Wang, H.; Lai, T.; Suter, D. Hypergraph modelling for geometric model fitting. Pattern Recognit. 2016, 60, 748–760. [Google Scholar] [CrossRef] [Green Version]
- Truong, Q.H. Knowledge-Based 3D Point Clouds Processing. Ph.D. Thesis, Université de Bourgogne, Dijon, France, 2013. [Google Scholar]
- Ponciano, J.J.; Roetner, M.; Reiterer, A.; Boochs, F. Object Semantic Segmentation in Point Clouds—Comparison of a Deep Learning and a Knowledge-Based Method. ISPRS Int. J. Geo-Inf. 2021, 10, 256. [Google Scholar] [CrossRef]
- Qi, C.R.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and Multi-View CNNs for Object Classification on 3D Data. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Feng, Y.; Zhang, Z.; Zhao, X.; Ji, R.; Gao, Y. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Su, H.; Jampani, V.; Sun, D.; Maji, S.; Kalogerakis, E.; Yang, M.-H.; Kautz, J. SPLATNet: Sparse Lattice Networks for Point Cloud Processing. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Du, J.; Jiang, Z.; Huang, S.; Wang, Z.; Su, J.; Su, S.; Wu, Y.; Cai, G. Point cloud semantic segmentation network based on multi-scale feature fusion. Sensors 2021, 21, 1625. [Google Scholar] [CrossRef]
- Jiang, M.; Wu, Y.; Zhao, T.; Zhao, Z.; Lu, C. PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation. arXiv 2018, arXiv:1807.00652. [Google Scholar]
- Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. PointCNN: Convolution on X-transformed points. Adv. Neural Inf. Process. Syst. 2018, 31, 820–830. [Google Scholar]
- Lin, Y.; Wang, C.; Zhai, D.; Li, W.; Li, J. Toward better boundary preserved supervoxel segmentation for 3D point clouds. ISPRS J. Photogramm. Remote Sens. 2018, 143, 39–47. [Google Scholar] [CrossRef]
- Hui, L.; Yuan, J.; Cheng, M.; Xie, J.; Zhang, X.; Yang, J. Superpoint network for point cloud oversegmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
- Cheng, M.; Hui, L.; Xie, J.; Yang, J.; Kong, H. Cascaded Non-Local Neural Network for Point Cloud Semantic Segmentation. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 8447–8452. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Zhao, H.; Jiang, L.; Jia, J.; Torr, P.; Koltun, V. Point transformer. arXiv 2020, arXiv:2012.09164. [Google Scholar]
- Guo, M.H.; Cai, J.X.; Liu, Z.N.; Mu, T.; Martin, R.; Hu, S. PCT: Point cloud transformer. Comput. Vis. Media 2021, 7, 187–199. [Google Scholar] [CrossRef]
- Pan, X.; Xia, Z.; Song, S.; Li, L.; Huang, G. 3d object detection with pointformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7463–7472. [Google Scholar]
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117. [Google Scholar]
- Wang, X.; He, J.; Ma, L. Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–15 December 2019; pp. 4573–4583. [Google Scholar]
- Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and Applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef] [Green Version]
- Zhang, Z.; Cui, P.; Zhu, W. Deep Learning on Graphs: A Survey. IEEE Trans. Knowl. Data Eng. 2020, 34, 249–270. [Google Scholar] [CrossRef] [Green Version]
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Proc. Mag. 2013, 30, 83–98. [Google Scholar] [CrossRef] [Green Version]
- Zhiheng, K.; Ning, L. PyramNet: Point cloud pyramid attention network and graph embedding module for classification and segmentation. arXiv 2019, arXiv:1906.03299. [Google Scholar]
- Luo, H.; Chen, C.; Fang, L.; Khoshelham, K.; Shen, G. MS-RRFSegNet: Multiscale regional relation feature segmentation network for semantic segmentation of urban scene point clouds. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8301–8315. [Google Scholar] [CrossRef]
- Demantké, J.; Mallet, C.; David, N.; Vallet, B. Dimensionality based scale selection in 3D lidar point clouds. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2011, 38, 97–102. [Google Scholar] [CrossRef] [Green Version]
- Landrieu, L.; Obozinski, G. Cut pursuit: Fast algorithms to learn piecewise constant functions on general weighted graphs. SIAM J. Imaging Sci. 2017, 10, 1724–1766. [Google Scholar] [CrossRef] [Green Version]
- Santurkar, S.; Tsipras, D.; Ilyas, A.; Madry, A. How does batch normalization help optimization? In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 2488–2498. [Google Scholar]
- Agarap, A.F. Deep learning using rectified linear units (relu). arXiv 2018, arXiv:1803.08375. [Google Scholar]
- Shazeer, N. Glu variants improve transformer. arXiv 2020, arXiv:2002.05202. [Google Scholar]
- Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. Int. Conf. Mach. Learn. PMLR 2017, 70, 933–941. [Google Scholar]
- Guo, M.H.; Liu, Z.N.; Mu, T.J.; Hu, S.M. Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks. In Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021. [Google Scholar]
- Simonovsky, M.; Komodakis, N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 3693–3702. [Google Scholar]
- Jia, X.; De Brabandere, B.; Tuytelaars, T.; Van Gool, L. Dynamic filter networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 667–675. [Google Scholar]
- Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1534–1543. [Google Scholar]
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Niessner, M. Scannet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 June 2017; pp. 2432–2443. [Google Scholar]
- Hua, B.S.; Pham, Q.H.; Nguyen, D.T.; Tran, M.K.; Yu, L.F.; Yeung, S.K. Scenenn: A Scene Meshes Dataset with annotations. In Proceedings of the International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016. [Google Scholar]
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor Segmentation and Support Inference from RGBD Images. In Computer Vision—ECCV 2012; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; pp. 746–760. [Google Scholar]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Dai, A.; Nießner, M. 3dmv: Joint 3d-multi-view prediction for 3d semantic scene segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 452–468. [Google Scholar]
- Yang, Y.; Liu, S.; Pan, H.; Liu, Y.; Tong, X. PFCNN: Convolutional neural networks on 3d surfaces using parallel frames. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 13578–13587. [Google Scholar]
- Tatarchenko, M.; Park, J.; Koltun, V.; Zhou, Q.Y. Tangent convolutions for dense prediction in 3d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3887–3896. [Google Scholar]
- Rethage, D.; Wald, J.; Sturm, J.; Navab, N.; Tombari, F. Fully-convolutional point networks for large-scale point clouds. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 596–611. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).