C2S-RoadNet: Road Extraction Model with Depth-Wise Separable Convolution and Self-Attention
Abstract
1. Introduction
- By combining a depth-wise separable convolution residual unit with self-attention, the authors propose a depth-wise separable residual asymmetric self-attention network for road extraction that recovers more complete road information in complex scenes;
- The lightweight asymmetric self-attention reduces the computational cost of standard self-attention while still exploiting global information, improving the model's robustness to image flipping and rotation and its extraction of occluded roads;
- The multi-scale module with adaptive weights fuses features at different scales, using trainable parameters to weight the contribution of each feature level;
- C2S-RoadNet is evaluated on three challenging road datasets and achieves better performance than other state-of-the-art road extraction methods.
2. Materials and Methods
2.1. Lightweight Asymmetric Self-Attention Module
2.1.1. Depth-Wise Separable Convolution
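For reference, a depth-wise separable convolution factorizes a standard convolution into a per-channel depth-wise convolution followed by a 1 × 1 point-wise convolution, as in Xception (Chollet, cited below). The following PyTorch sketch illustrates the operation; the channel sizes are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise separable convolution: a per-channel (depth-wise) 3x3
    convolution followed by a 1x1 point-wise convolution that mixes channels.
    Channel counts here are illustrative, not the paper's configuration."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # groups=in_ch makes each 3x3 filter operate on a single input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # 1x1 convolution then combines information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

x = torch.randn(1, 64, 256, 256)
y = DepthwiseSeparableConv(64, 128)(x)  # -> (1, 128, 256, 256)
```

For a 3 × 3 kernel this reduces the parameter count from 9·C_in·C_out to 9·C_in + C_in·C_out, which is the source of the module's efficiency.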
2.1.2. Asymmetric Convolution Blocks
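The asymmetric convolution block follows ACNet (Ding et al., cited below): a square 3 × 3 kernel is strengthened by parallel 1 × 3 and 3 × 1 branches whose outputs are summed. A simplified sketch (ACNet normalizes each branch before the sum; a single post-sum batch norm is used here for brevity):

```python
import torch
import torch.nn as nn

class AsymmetricConvBlock(nn.Module):
    """ACNet-style asymmetric convolution block: parallel 3x3, 1x3 and 3x1
    convolutions summed elementwise, strengthening the kernel skeleton.
    A sketch after Ding et al.; not the paper's exact block."""

    def __init__(self, channels: int):
        super().__init__()
        self.square = nn.Conv2d(channels, channels, (3, 3), padding=(1, 1), bias=False)
        self.horizontal = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.vertical = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The padded branches share the same output grid, so they add elementwise
        return torch.relu(self.bn(self.square(x) + self.horizontal(x) + self.vertical(x)))
```

Because the three kernels can be folded into a single equivalent 3 × 3 kernel after training, the extra branches cost nothing at inference time — the key property ACNet exploits.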
2.1.3. Self-Attention Mechanism
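As a baseline for the lightweight variant, standard scaled dot-product self-attention over the H × W spatial positions can be sketched as follows. The paper's asymmetric formulation reduces the O((HW)²) attention cost; its exact factorization is not reproduced here, so this sketch shows only the mechanism it starts from.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Plain scaled dot-product self-attention over H*W spatial positions.
    Baseline sketch only; the paper's lightweight asymmetric variant
    reduces the O((HW)^2) cost of the attention map computed here."""

    def __init__(self, channels: int, key_dim: int = 32):
        super().__init__()
        self.query = nn.Conv2d(channels, key_dim, 1)
        self.key = nn.Conv2d(channels, key_dim, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, key_dim)
        k = self.key(x).flatten(2)                     # (B, key_dim, HW)
        v = self.value(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        # Similarity of every position with every other position
        attn = torch.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + x  # residual connection keeps the original features
```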
2.2. Multi-Scale Adaptive Weighting Module
- (1) Early fusion: a classic feature-fusion approach in which existing networks (such as VGG19) fuse selected layers by concatenation or element-wise addition;
- (2) Late fusion: an approach similar in spirit to the feature pyramid network (FPN), which makes predictions after the features have been fused;
- (3) Backbone replacement: using a network with built-in fusion of high- and low-level features, such as DenseNet, in place of an ordinary backbone;
- (4) Prediction adjustment: refining the prediction results based on predictions made from high-level features.

The adaptive-weight fusion adopted in this section is sketched after this list.
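A minimal sketch of the idea behind adaptive-weight fusion: each scale's feature map is resized to a common resolution and combined with softmax-normalized trainable weights. This is an illustration of the principle only; the paper's multi-scale module may differ in its exact branches and normalization, and the sketch assumes all inputs share the same channel count.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightedFusion(nn.Module):
    """Fuses feature maps from several scales with trainable weights.
    Illustrative sketch; assumes all inputs have the same channel count."""

    def __init__(self, num_scales: int):
        super().__init__()
        # One trainable scalar per scale, learned jointly with the network
        self.weights = nn.Parameter(torch.ones(num_scales))

    def forward(self, features: list) -> torch.Tensor:
        # Resize every feature map to the resolution of the first one
        target = features[0].shape[-2:]
        resized = [F.interpolate(f, size=target, mode="bilinear",
                                 align_corners=False) for f in features]
        # Softmax keeps the fusion weights positive and summing to one
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * fi for wi, fi in zip(w, resized))
```

Because the weights are trained end to end, the network itself decides how much each scale contributes, rather than fixing the mixture by hand.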
2.3. Asymmetric Convolutional Neural Network with Adaptive Multiscale Feature Fusion
3. Experimental Data and Preprocessing
3.1. Datasets and Pre-Processing
3.1.1. Datasets
3.1.2. Pre-Processing
- Image cropping: The original images were cropped with a sliding-window technique using a 256 × 256 window and a stride of 256, so adjacent windows did not overlap (see the tiling sketch after this list). When the length or width of a large image was not evenly divisible by 256, the image was zero-padded to the next multiple of 256 before cropping. After cropping, the Massachusetts Roads dataset contained 39,888 training images of size 256 × 256 and 504 validation images. Cropping the DeepGlobe training data yielded 83,168 images, which were divided into training and validation sets at a ratio of 8:2, giving 66,532 training images and 16,634 validation images. The LSRV dataset covers three regions, each a single large image that was cropped with a 1024 × 1024 sliding window (again zero-padded to a multiple of 1024 where necessary): the Boston area yielded 529 tiles, the Birmingham area 484 tiles, and the Shanghai area 289 tiles of size 1024 × 1024. Each region was split into training (70%), validation (20%), and test (10%) sets; the training and validation tiles from the three regions were merged into a total training set and a total validation set, which were then cropped again with a 256 × 256 window, ultimately yielding 14,584 training images and 4167 validation images of size 256 × 256.
- Image augmentation: Augmentation effectively mitigates overfitting, improves the robustness and generalization ability of the model, and alleviates sample imbalance. This paper therefore applies the following augmentation methods to expand the dataset:
- Random flipping: There are three main options, namely horizontal, vertical, and both horizontal and vertical flipping;
- Random rotation by n × 90°, where n ranges from 0 to 4;
- Center rotation: This method rotates the image by a certain angle with the center of the image as the rotation point;
- Translation: The image is shifted a certain distance in the vertical and horizontal directions;
- Random scaling: If the scaling size is larger than the original size, the image is randomly cropped to the original size; if the scaling size is smaller than the original size, the image is extended to the original size by mirror padding;
- Cutout: In addition to mitigating occlusion, cutout is inspired by dropout: dropout randomly deactivates neurons during training, so the final network is effectively an ensemble of multiple sub-models, and cutout applies the same idea in the input space by masking out random image regions.
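A minimal NumPy sketch of the tiling procedure described in the image-cropping item: zero-pad to the next multiple of the window size, then cut non-overlapping tiles. The function name and interface are illustrative.

```python
import numpy as np

def crop_tiles(image: np.ndarray, tile: int = 256) -> list:
    """Zero-pad an H x W x C image to multiples of `tile`, then cut it into
    non-overlapping tile x tile patches (stride == tile, as described above)."""
    h, w = image.shape[:2]
    pad_h = (-h) % tile  # padding needed to reach the next multiple of `tile`
    pad_w = (-w) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    tiles = []
    for top in range(0, padded.shape[0], tile):
        for left in range(0, padded.shape[1], tile):
            tiles.append(padded[top:top + tile, left:left + tile])
    return tiles

# Check against the counts reported above: the 23,104 x 23,552 Boston image
# with tile=1024 pads to 23 x 23 windows, i.e., 529 tiles.
```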
3.2. Model Evaluation Criteria
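The three scores reported in the result tables — pixel accuracy (PA), F1 score, and intersection over union (IoU) — follow their standard confusion-matrix definitions for binary segmentation; a small sketch with an assumed mask interface:

```python
import numpy as np

def road_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Standard binary-segmentation metrics from a pixel confusion matrix:
    PA, F1, and IoU, as reported in the result tables.
    `pred` and `truth` are boolean masks (True = road pixel)."""
    tp = np.sum(pred & truth)    # road predicted as road
    fp = np.sum(pred & ~truth)   # background predicted as road
    fn = np.sum(~pred & truth)   # road predicted as background
    tn = np.sum(~pred & ~truth)  # background predicted as background
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "PA": (tp + tn) / (tp + tn + fp + fn),
        "F1": 2 * precision * recall / (precision + recall),
        "IoU": tp / (tp + fp + fn),
    }
```

In practice the divisions should be guarded against empty masks (no predicted or true road pixels); the sketch omits that for brevity.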
3.3. Experimental Settings
4. Experimental Results
4.1. Experimental Results on the Massachusetts Roads Dataset
4.2. Experimental Results on the DeepGlobe Dataset
4.3. Experimental Results on the LSRV Dataset
5. Discussion
5.1. Comparative Test Discussion
5.1.1. Analysis of the Extraction Results of Massachusetts Road Dataset
5.1.2. Analysis of the Extraction Results of DeepGlobe Road Dataset
5.1.3. Analysis of the Extraction Results of LSRV Road Dataset
5.2. Ablation Test Analysis
5.3. Integrated Summary
- Superior Performance: The C2S-RoadNet model exhibits significant advantages in performance across diverse test scenarios and datasets. This assertion is substantiated by both quantitative metrics and qualitative analyses;
- Module Efficacy: Our ablation studies confirm that the DS2C and ADASPP modules are instrumental in enhancing the model’s performance. Their inclusion endows the model with increased flexibility and accuracy when navigating complex scenarios and varying road conditions;
- Holistic Utility: The newly incorporated modules demonstrate exemplary performance not only in quantitative metrics but also in qualitative evaluations, such as result visualization.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wang, W.; Yang, N.; Zhang, Y.; Wang, F.; Cao, T.; Eklund, P. A review of road extraction from remote sensing images. J. Traffic Transp. Eng. 2016, 3, 271–282.
- Schubert, H.; van de Gronde, J.J.; Roerdink, J.B. Efficient computation of greyscale path openings. Math. Morphol.-Theory Appl. 2016, 1, 189–202.
- Hu, J.; Razdan, A.; Femiani, J.C.; Cui, M.; Wonka, P. Road Network Extraction and Intersection Detection from Aerial Images by Tracking Road Footprints. IEEE Trans. Geosci. Remote Sens. 2007, 45, 4144–4157.
- Jing, R.; Gong, Z.; Zhu, W.; Guan, H.; Zhao, W. Island Road Centerline Extraction Based on a Multiscale United Feature. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3940–3953.
- Das, S.; Mirnalinee, T.T.; Varghese, K. Use of Salient Features for the Design of a Multistage Framework to Extract Roads From High Resolution Multispectral Satellite Images. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3906–3931.
- Li, J.; Hu, Q.; Ai, M. Unsupervised road extraction via a Gaussian mixture model with object-based features. Int. J. Remote Sens. 2018, 39, 2421–2440.
- Gao, L.; Song, W.; Dai, J.; Chen, Y. Road Extraction from High-Resolution Remote Sensing Imagery Using Refined Deep Residual Convolutional Neural Network. Remote Sens. 2019, 11, 552.
- Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Zhang, J.; Chen, L.; Wang, C.; Zhuo, L.; Tian, Q.; Liang, X. Road Recognition from Remote Sensing Imagery Using Incremental Learning. IEEE Trans. Intell. Transp. Syst. 2017, 18, 2993–3005.
- Saito, S.; Yamashita, T.; Aoki, Y. Multiple object extraction from aerial imagery with convolutional neural networks. Electron. Imag. 2016, 60, 1–9.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2016, arXiv:1511.07122.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016.
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
- Peng, X.; Yin, Z.; Yang, Z. Deeplab_v3_plus-net for Image Semantic Segmentation with Channel Compression. In Proceedings of the 2020 IEEE 20th International Conference on Communication Technology (ICCT), Nanning, China, 28–31 October 2020.
- Yuan, Y.; Huang, L.; Guo, J.; Zhang, C.; Chen, X.; Wang, J. OCNet: Object context network for scene parsing. arXiv 2018, arXiv:1809.00916.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Yang, Z.; Zhou, D.; Yang, Y.; Zhang, J.; Chen, Z. TransRoadNet: A Novel Road Extraction Method for Remote Sensing Images via Combining High-Level Semantic Feature and Context. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
- Alshaikhli, T.; Liu, W.; Maruyama, Y. Simultaneous Extraction of Road and Centerline from Aerial Images Using a Deep Convolutional Neural Network. ISPRS Int. J. Geo-Inf. 2021, 10, 147.
- Cheng, G.; Wang, Y.; Xu, S.; Wang, H.; Xiang, S.; Pan, C. Automatic Road Detection and Centerline Extraction via Cascaded End-to-End Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3322–3337.
- Zhu, Q.; Zhang, Y.; Wang, L.; Zhong, Y.; Guan, Q.; Lu, X.; Zhang, L.; Li, D. A global context-aware and batch-independent network for road extraction from VHR satellite imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 353–365.
- Li, P.; Tian, Z.; He, X.; Qiao, M.; Cheng, X.; Song, D.; Chen, M.; Li, J.; Zhou, T.; Guo, X.; et al. LR-RoadNet: A long-range context-aware neural network for road extraction via high-resolution remote sensing images. IET Image Process. 2021, 15, 3239–3253.
- Shao, Z.; Zhou, Z.; Huang, X.; Zhang, Y. MRENet: Simultaneous Extraction of Road Surface and Road Centerline in Complex Urban Scenes from Very High Resolution Images. Remote Sens. 2021, 13, 239.
- Rong, Y.; Zhuang, Z.; He, Z.; Wang, X. A Maritime Traffic Network Mining Method Based on Massive Trajectory Data. Electronics 2022, 11, 987.
- Li, J.; Liu, Y.; Zhang, Y.; Zhang, Y. Cascaded Attention DenseUNet (CADUNet) for Road Extraction from Very-High-Resolution Images. ISPRS Int. J. Geo-Inf. 2021, 10, 329.
- Panboonyuen, T.; Jitkajornwanich, K.; Lawawirojwong, S.; Srestasathiern, P.; Vateekul, P. Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images. Remote Sens. 2021, 13, 5100.
- Xu, Z.; Zhang, W.; Zhang, T.; Yang, Z.; Li, J. Efficient Transformer for Remote Sensing Image Segmentation. Remote Sens. 2021, 13, 3585.
- Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019.
- Wang, H.; Chen, X.; Zhang, T.; Xu, Z.; Li, J. CCTNet: Coupled CNN and Transformer Network for Crop Segmentation of Remote Sensing Images. Remote Sens. 2022, 14, 1956.
- Gao, L.; Liu, H.; Yang, M.; Chen, L.; Wan, Y.; Xiao, Z.; Qian, Y. STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 10990–11003.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
- Xu, G.; Li, J.; Gao, G.; Lu, H.; Yang, J.; Yue, D. Lightweight Real-Time Semantic Segmentation Network with Efficient Transformer and CNN. arXiv 2023, arXiv:2302.10484.
- Panboonyuen, T.; Jitkajornwanich, K.; Lawawirojwong, S.; Srestasathiern, P.; Vateekul, P. Road Segmentation of Remotely-Sensed Images Using Deep Convolutional Neural Networks with Landscape Metrics and Conditional Random Fields. Remote Sens. 2017, 9, 680.
- Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018.
- Lu, X.; Zhong, Y.; Zheng, Z.; Zhang, L. GAMSNet: Globally aware road detection network with multi-scale residual learning. ISPRS J. Photogramm. Remote Sens. 2021, 175, 340–352.
- Cheng, J.; Tian, S.; Yu, L.; Liu, S.; Wang, C.; Ren, Y.; Lu, H.; Zhu, M. DDU-Net: A Dual Dense U-Structure Network for Medical Image Segmentation. Appl. Soft Comput. 2022, 126, 109297.
| Image | Region | Range (pixels) | Resolution (m/pixel) | Area (km²) |
|---|---|---|---|---|
| 1 | Boston and its surroundings, USA | 23,104 × 23,552 | 0.4411 | 105.87 |
| 2 | Birmingham, UK | 22,272 × 22,464 | 0.3640 | 66.29 |
| 3 | Shanghai, China | 16,768 × 16,640 | 0.5109 | 72.83 |
| Dataset | Image Resolution | Training Images | Validation Images |
|---|---|---|---|
| Massachusetts Roads dataset | 256 × 256 | 39,888 | 504 |
| DeepGlobe dataset | 256 × 256 | 66,532 | 16,634 |
| LSRV dataset | 256 × 256 | 14,584 | 4167 |
| Method | Massachusetts Roads (PA / F1 / IoU) | DeepGlobe (PA / F1 / IoU) | LSRV (PA / F1 / IoU) |
|---|---|---|---|
| U-Net | 0.9232 / 0.7012 / 0.6638 | 0.9537 / 0.7431 / 0.5810 | 0.9211 / 0.7542 / 0.6320 |
| ACNet | 0.9636 / 0.7409 / 0.6841 | 0.9703 / 0.7523 / 0.5997 | 0.9498 / 0.8134 / 0.6842 |
| DeepLabv3+ | 0.9647 / 0.7427 / 0.6935 | 0.9734 / 0.7626 / 0.6234 | 0.9593 / 0.8197 / 0.6997 |
| DDU-Net | 0.9703 / 0.7542 / 0.7067 | 0.9782 / 0.7762 / 0.6327 | 0.9627 / 0.8221 / 0.7083 |
| Ours | 0.9813 / 0.7732 / 0.7218 | 0.9816 / 0.7828 / 0.6398 | 0.9601 / 0.8235 / 0.7131 |
| Method | Massachusetts Roads (PA / F1 / IoU) | DeepGlobe (PA / F1 / IoU) |
|---|---|---|
| Baseline | 0.9232 / 0.7012 / 0.6638 | 0.9537 / 0.7431 / 0.5810 |
| Model A | 0.9623 / 0.7503 / 0.6842 | 0.9714 / 0.7616 / 0.6215 |
| Model B | 0.9698 / 0.7567 / 0.7015 | 0.9726 / 0.7761 / 0.6237 |
| Model C (ours) | 0.9813 / 0.7732 / 0.7218 | 0.9816 / 0.7828 / 0.6398 |