
PP-LinkNet: Improving Semantic Segmentation of High Resolution Satellite Imagery with Multi-stage Training

Published: 12 October 2020

Abstract

Road network and building footprint extraction is essential for many applications, such as map updating, traffic regulation, city planning, ride-hailing, and disaster response. Mapping road networks is currently both expensive and labor-intensive. Recently, improvements in image segmentation through the application of deep neural networks have shown promising results in extracting road segments from large-scale, high-resolution satellite imagery. However, significant challenges remain because the labeled training data needed to build models for industry-grade applications is scarce. In this paper, we propose a two-stage transfer learning technique that improves the robustness of semantic segmentation for satellite images by leveraging noisy pseudo ground truth masks obtained automatically (without human labor) from crowd-sourced OpenStreetMap (OSM) data. We further propose Pyramid Pooling-LinkNet (PP-LinkNet), an improved deep neural network for segmentation that uses focal loss, a poly learning rate schedule, and a context module. We demonstrate the strengths of our approach through evaluations on three popular datasets over two tasks, namely road extraction and building footprint detection. Specifically, we obtain 78.19% meanIoU on the SpaceNet building footprint dataset, and 67.03% and 77.11% on the road topology metric on the SpaceNet and DeepGlobe road extraction datasets, respectively.
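The abstract names focal loss and a poly learning rate among the training ingredients but does not give their form here. The following is a minimal sketch of the commonly used definitions of both, not the authors' implementation. The function names and the default values (gamma = 2.0, alpha = 0.25, power = 0.9) are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for a single pixel (hypothetical helper, not from the paper).

    p: predicted probability that the pixel is foreground (e.g. road).
    y: ground-truth label, 0 or 1.
    gamma, alpha: assumed defaults; the paper's values are not stated here.
    """
    p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha # class-balancing weight
    # (1 - p_t)^gamma down-weights pixels that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial ("poly") decay: base_lr * (1 - step / max_steps) ** power.

    power=0.9 is an assumed value, not one reported in the abstract.
    """
    return base_lr * (1.0 - step / max_steps) ** power


if __name__ == "__main__":
    # An easy positive pixel contributes far less loss than a hard one.
    print(binary_focal_loss(0.9, 1), binary_focal_loss(0.6, 1))
    # The learning rate decays from base_lr towards 0 over training.
    print([round(poly_lr(0.1, s, 100), 4) for s in (0, 50, 100)])
```

With gamma set to 0 the focal term disappears and the loss reduces to class-weighted cross-entropy, which is one way to sanity-check an implementation against a known baseline.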

Published In

SUMAC'20: Proceedings of the 2nd Workshop on Structuring and Understanding of Multimedia heritAge Contents
October 2020
70 pages
ISBN:9781450381550
DOI:10.1145/3423323

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. building footprint
  2. hyperspectral imaging
  3. mapping application
  4. multi-stage training
  5. pp-linknet
  6. remote sensing
  7. road network
  8. transfer learning

Qualifiers

  • Research-article

Conference

MM '20
Acceptance Rates

Overall Acceptance Rate 5 of 6 submissions, 83%
