Towards an Efficient Remote Sensing Image Compression Network with Visual State Space Model
Abstract
1. Introduction
- Based on the variational auto-encoder (VAE), we propose a remote sensing image compression network with the visual state space model, replacing traditional CNN and transformer methods to achieve a balance between computational complexity and performance.
- To extract global-spatial features effectively while maintaining linear computational complexity, we introduce the Cross-Selective Scan Block (CSSB) as the fundamental transformation block. The CSSB employs a 2D-bidirectional selective scan strategy to replace the self-attention mechanism.
- To address the challenge of estimating a more accurate entropy model in remote sensing image compression networks, we propose the Omni-Selective Scan mechanism for the channel and global context model (CGCM), which performs bidirectional scanning along three different directions to model the data flow, enabling global-spatial context interaction between different slices.
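The multi-directional scan strategies described in the bullets above can be illustrated with a minimal NumPy sketch. This is our own illustration, not the paper's implementation: the function names are hypothetical, and the identity "processing" on each path stands in for the learned selective state space model that the CSSB applies along every scan direction.

```python
import numpy as np

def scan_paths(feat):
    """Flatten an H x W x C feature map along four scan orders
    (row-major, reversed row-major, column-major, reversed column-major),
    mimicking a 2D-bidirectional selective scan."""
    h, w, c = feat.shape
    row = feat.reshape(h * w, c)                     # left-to-right, top-to-bottom
    col = feat.transpose(1, 0, 2).reshape(h * w, c)  # top-to-bottom, left-to-right
    return [row, row[::-1], col, col[::-1]]

def merge_paths(paths, h, w, c):
    """Undo each scan order and average the four results back to H x W x C."""
    row = paths[0]
    row_rev = paths[1][::-1]
    col = paths[2].reshape(w, h, c).transpose(1, 0, 2).reshape(h * w, c)
    col_rev = paths[3][::-1].reshape(w, h, c).transpose(1, 0, 2).reshape(h * w, c)
    merged = (row + row_rev + col + col_rev) / 4.0
    return merged.reshape(h, w, c)

feat = np.arange(2 * 3 * 1, dtype=float).reshape(2, 3, 1)
out = merge_paths(scan_paths(feat), 2, 3, 1)
# With identity processing on each path, merging recovers the input exactly;
# in the real block, each path first passes through a selective state space model.
assert np.allclose(out, feat)
```

Because each path is a full flattening of the feature map in a different order, a sequence model run along it touches every position, which is how the design obtains a global receptive field at linear cost.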
2. Related Work
2.1. Learning-Based Image Compression
2.2. Learned Remote Sensing Image Compression
2.3. State Space Models
3. Materials and Methods
3.1. Preliminaries
3.2. Overall Framework of the Proposed VMIC
3.3. Cross-Selective Scan Block (CSSB)
3.4. Channel and Global Context Model (CGCM)
4. Experiment and Results
4.1. Setup of Experiments
4.1.1. Training Details
4.1.2. Evaluation Metrics
4.2. Performance
4.2.1. Rate-Distortion Performance
4.2.2. Complexity Analysis
4.3. Qualitative Results
4.4. Ablation Studies
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Shi, C.; Shi, K.; Zhu, F.; Zeng, Z.; Wang, L. A multi-level domain similarity enhancement-guided network for remote sensing image compression. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5645819.
- Guo, T.; Luo, F.; Zhang, L.; Tan, X.; Liu, J.; Zhou, X. Target detection in hyperspectral imagery via sparse and dense hybrid representation. IEEE Geosci. Remote Sens. Lett. 2019, 17, 716–720.
- Wallace, G.K. The JPEG still picture compression standard. IEEE Trans. Consum. Electron. 1992, 38, xviii–xxxiv.
- Taubman, D.S.; Marcellin, M.W.; Rabbani, M. JPEG2000: Image compression fundamentals, standards and practice. J. Electron. Imaging 2002, 11, 286–287.
- Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668.
- Bross, B.; Wang, Y.K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J.; Ohm, J.R. Overview of the Versatile Video Coding (VVC) Standard and its Applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764.
- Ahmed, N.; Natarajan, T.; Rao, K.R. Discrete cosine transform. IEEE Trans. Comput. 1974, 100, 90–93.
- Marpe, D.; Schwarz, H.; Wiegand, T. Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 620–636.
- Zhu, J.; Zhang, J.; Chen, H.; Xie, Y.; Gu, H.; Lian, H. A cross-view intelligent person search method based on multi-feature constraints. Int. J. Digit. Earth 2024, 17, 2346259.
- Xie, Y.; Zhan, N.; Zhu, J.; Xu, B.; Chen, H.; Mao, W.; Luo, X.; Hu, Y. Landslide extraction from aerial imagery considering context association characteristics. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103950.
- Xu, W.; Feng, Z.; Wan, Q.; Xie, Y.; Feng, D.; Zhu, J.; Liu, Y. Building Height Extraction From High-Resolution Single-View Remote Sensing Images Using Shadow and Side Information. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 6514–6528.
- Xie, Y.; Liu, S.; Chen, H.; Cao, S.; Zhang, H.; Feng, D.; Wan, Q.; Zhu, J.; Zhu, Q. Localization, balance and affinity: A stronger multifaceted collaborative salient object detector in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 63, 4700117.
- Wang, Y.; Liang, F.; Liang, J.; Fu, H. S2LIC: Learned Image Compression with the SwinV2 Block, Adaptive Channel-wise and Global-inter Attention Context. arXiv 2024, arXiv:2403.14471.
- Liu, J.; Sun, H.; Katto, J. Learned image compression with mixed transformer-CNN architectures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14388–14397.
- Makarichev, V.; Vasilyeva, I.; Lukin, V.; Vozel, B.; Shelestov, A.; Kussul, N. Discrete atomic transform-based lossy compression of three-channel remote sensing images with quality control. Remote Sens. 2021, 14, 125.
- Li, J.; Fu, Y.; Li, G.; Liu, Z. Remote sensing image compression in visible/near-infrared range using heterogeneous compressive sensing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 4932–4938.
- Ballé, J.; Laparra, V.; Simoncelli, E. End-to-end optimized image compression. arXiv 2016, arXiv:1611.01704.
- Lu, M.; Guo, P.; Shi, H.; Cao, C.; Ma, Z. Transformer-based Image Compression. In Proceedings of the 2022 Data Compression Conference (DCC), Snowbird, UT, USA, 22–25 March 2022; p. 469.
- Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
- Ballé, J.; Minnen, D.; Singh, S.; Hwang, S.; Johnston, N. Variational image compression with a scale hyperprior. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Cheng, Z.; Sun, H.; Takeuchi, M.; Katto, J. Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
- Fu, H.; Liang, F.; Lin, J.; Li, B.; Akbari, M.; Liang, J.; Zhang, G.; Liu, D.; Tu, C.; Han, J. Learned Image Compression With Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules. IEEE Trans. Image Process. 2023, 32, 2063–2076.
- Li, J.; Liu, Z. Efficient compression algorithm using learning networks for remote sensing images. Appl. Soft Comput. 2021, 100, 106987.
- Chong, Y.; Zhai, L.; Pan, S. High-order Markov random field as attention network for high-resolution remote-sensing image compression. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5401714.
- Fu, C.; Du, B. Remote sensing image compression based on the multiple prior information. Remote Sens. 2023, 15, 2211.
- Xiang, S.; Liang, Q.; Tang, P. Task-Oriented Compression Framework for Remote Sensing Satellite Data Transmission. IEEE Trans. Ind. Inform. 2024, 20, 3487–3496.
- Pan, T.; Zhang, L.; Qu, L.; Liu, Y. A Coupled Compression Generation Network for Remote-Sensing Images at Extremely Low Bitrates. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5608514.
- Han, P.; Zhao, B.; Li, X. Edge-Guided Remote-Sensing Image Compression. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5524515.
- Ye, Y.; Wang, C.; Sun, W.; Chen, Z. Map-Assisted Remote-Sensing Image Compression at Extremely Low Bitrates. arXiv 2024, arXiv:2409.01935.
- Gu, A.; Johnson, I.; Goel, K.; Saab, K.; Dao, T.; Rudra, A.; Ré, C. Combining recurrent, convolutional, and continuous-time models with linear state space layers. Adv. Neural Inf. Process. Syst. 2021, 34, 572–585.
- Gu, A.; Goel, K.; Ré, C. Efficiently Modeling Long Sequences with Structured State Spaces. In Proceedings of the International Conference on Learning Representations (ICLR), Online, 25–29 April 2022.
- Fu, D.Y.; Dao, T.; Saab, K.K.; Thomas, A.W.; Rudra, A.; Ré, C. Hungry Hungry Hippos: Towards Language Modeling with State Space Models. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Liu, Y. VMamba: Visual State Space Model. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 10–15 November 2024.
- Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. In Proceedings of the Forty-First International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024.
- Hu, V.T.; Baumann, S.A.; Gui, M.; Grebenkova, O.; Ma, P.; Fischer, J.; Ommer, B. ZigMa: A DiT-style Zigzag Mamba Diffusion Model. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024.
- Guo, H.; Li, J.; Dai, T.; Ouyang, Z.; Ren, X.; Xia, S.T. MambaIR: A simple baseline for image restoration with state-space model. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024.
- Lu, Y.; Wang, S.; Wang, Z.; Xia, P.; Zhou, T. LFMamba: Light Field Image Super-Resolution with State Space Model. arXiv 2024, arXiv:2406.12463.
- Cao, Y.; Liu, C.; Wu, Z.; Yao, W.; Xiong, L.; Chen, J.; Huang, Z. Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion. arXiv 2024, arXiv:2410.05624.
- Ma, X.; Zhang, X.; Pun, M.O. RS3Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6011405.
- Zhi, R.; Fan, X.; Shi, J. MambaFormerSR: A Lightweight Model for Remote-Sensing Image Super-Resolution. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6015705.
- Zhao, S.; Chen, H.; Zhang, X.; Xiao, P.; Bai, L.; Ouyang, W. RS-Mamba for large remote sensing image dense prediction. arXiv 2024, arXiv:2404.02668.
- Chen, H.; Song, J.; Han, C.; Xia, J.; Yokoya, N. ChangeMamba: Remote Sensing Change Detection with Spatiotemporal State Space Model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4409720.
- Chen, K.; Chen, B.; Liu, C.; Li, W.; Zou, Z.; Shi, Z. RSMamba: Remote Sensing Image Classification With State Space Model. IEEE Geosci. Remote Sens. Lett. 2024, 21, 8002605.
- Minnen, D.; Singh, S. Channel-wise Autoregressive Entropy Models for Learned Image Compression. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020.
- He, D.; Yang, Z.; Peng, W.; Ma, R.; Qin, H.; Wang, Y. ELIC: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5718–5727.
- Shi, Y.; Xia, B.; Jin, X.; Wang, X.; Zhao, T.; Xia, X.; Xiao, X.; Yang, W. VmambaIR: Visual State Space Model for Image Restoration. arXiv 2024, arXiv:2403.11423.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981.
- Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
- Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402.
- Bégaint, J.; Racapé, F.; Feltman, S.; Pushparaja, A. CompressAI: A PyTorch library and evaluation platform for end-to-end compression research. arXiv 2020, arXiv:2011.03029.
- Wu, X.; Huang, T.Z.; Deng, L.J.; Zhang, T.J. Dynamic Cross Feature Fusion for Remote Sensing Pansharpening. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
- Qian, Y.; Lin, M.; Sun, X.; Tan, Z.; Jin, R. Entroformer: A Transformer-based Entropy Model for Learned Image Compression. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022.
- He, Z.; Huang, M.; Luo, L.; Yang, X.; Zhu, C. Towards real-time practical image compression with lightweight attention. Expert Syst. Appl. 2024, 252, 124142.
- Bjontegaard, G. Calculation of average PSNR differences between RD-curves. VCEG-M33 2001. Available online: https://api.semanticscholar.org/CorpusID:61598325 (accessed on 21 January 2025).
| Datasets | Methods | BD-Rate (%) | Encode Time (s) | Decode Time (s) | MACs (G) | FLOPs (G) |
|---|---|---|---|---|---|---|
| | VTM-17.1 [6] | Anchor | − | − | − | − |
| AID | JPEG [3] | 113.41 | − | − | − | − |
| | BPG [5] | 15.31 | − | − | − | − |
| | Ballé [20] | 18.14 | 0.07 | 0.06 | 50.85 | 101.75 |
| | EntroFormer [55] | −0.33 | 2.23 | 0.14 | 146.69 | 343.25 |
| | TIC’22 [18] | 7.68 | 4.73 | 10.18 | 167.84 | 336.8 |
| | ELIC [46] | −2.14 | 0.28 | 0.14 | 154.62 | 309.76 |
| | LWLIC’24 [56] | 2.67 | 0.01 | 0.05 | 170.33 | 341.73 |
| | S2LIC’24 [13] | −2.79 | 0.30 | 0.39 | 246.44 | 494.31 |
| | VMIC (Ours) | −4.48 | 0.27 | 0.32 | 141.14 | 314.06 |
| NWPU VHR-10 | JPEG [3] | 196.14 | − | − | − | − |
| | BPG [5] | 14.47 | − | − | − | − |
| | Ballé [20] | 17.89 | 0.07 | 0.09 | 101.69 | 203.49 |
| | EntroFormer [55] | −4.84 | 5.46 | 0.43 | 296.65 | 594.31 |
| | TIC’22 [18] | 0.47 | 5.13 | 12.55 | 335.65 | 673.56 |
| | ELIC [46] | −7.15 | 0.35 | 0.16 | 309.24 | 619.52 |
| | LWLIC’24 [56] | 0.67 | 0.13 | 0.06 | 340.67 | 683.46 |
| | S2LIC’24 [13] | −8.21 | 0.44 | 0.46 | 510.74 | − |
| | VMIC (Ours) | −9.80 | 0.32 | 0.34 | 282.29 | 628.11 |
| WorldView-3 Panchromatic | JPEG [3] | 198.06 | − | − | − | − |
| | BPG [5] | 22.94 | − | − | − | − |
| | Ballé [20] | 22.04 | 0.08 | 0.04 | 38.93 | 77.9 |
| | EntroFormer [55] | 0.39 | 3.28 | 0.22 | 113.58 | 259.71 |
| | TIC’22 [18] | 4.51 | 3.45 | 8.06 | 135.18 | 271.35 |
| | ELIC [46] | −7.01 | 0.34 | 0.14 | 118.38 | 247.16 |
| | LWLIC’24 [56] | 1.65 | 0.13 | 0.05 | 137.25 | 288.22 |
| | S2LIC’24 [13] | −5.72 | 0.36 | 0.39 | 187.08 | 375.2 |
| | VMIC (Ours) | −6.73 | 0.25 | 0.29 | 108.06 | 240.45 |
| WorldView-3 Multispectral | JPEG [3] | 365.57 | − | − | − | − |
| | BPG [5] | 34.11 | − | − | − | − |
| | Ballé [20] | 25.42 | 0.07 | 0.04 | 38.93 | 77.9 |
| | EntroFormer [55] | 1.01 | 1.15 | 0.16 | 113.58 | 259.71 |
| | TIC’22 [18] | 5.29 | 3.08 | 7.36 | 135.18 | 271.35 |
| | ELIC [46] | −7.38 | 0.33 | 0.14 | 118.38 | 247.16 |
| | LWLIC’24 [56] | 3.27 | 0.14 | 0.06 | 137.25 | 288.22 |
| | S2LIC’24 [13] | −2.93 | 0.31 | 0.35 | 187.08 | 375.2 |
| | VMIC (Ours) | −7.93 | 0.24 | 0.27 | 108.06 | 240.45 |
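The BD-Rate column in the table above reports the Bjontegaard-delta rate against the VTM-17.1 anchor (negative means fewer bits at the same quality). A minimal sketch of the standard calculation, assuming the usual recipe of a cubic fit of log-rate against PSNR integrated over the overlapping quality range (our own helper, not the paper's evaluation script):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of the test RD curve vs. the anchor,
    via cubic polynomial fits of log(rate) as a function of PSNR."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate both fits over the PSNR range where the curves overlap.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100

# Hypothetical RD points: an identical curve gives a BD-rate of ~0%.
r = [0.1, 0.2, 0.4, 0.8]
p = [30.0, 33.0, 36.0, 39.0]
print(round(abs(bd_rate(r, p, r, p)), 6))  # → 0.0
```

As a sanity check, halving every rate at fixed PSNR yields a BD-rate of −50%, matching the interpretation of the negative entries in the table.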
| Method | λ | bpp ↓ | PSNR (dB) ↑ | MS-SSIM (dB) ↑ |
|---|---|---|---|---|
| Uni-scan | 0.013 | 0.454 | 34.08 | 16.62 |
| Bi-scan | 0.013 | 0.451 | 34.14 | 16.65 |
| Omni-scan | 0.013 | 0.443 | 34.16 | 16.67 |
| Uni-scan | 0.045 | 0.874 | 37.56 | 19.98 |
| Bi-scan | 0.045 | 0.869 | 37.64 | 20.07 |
| Omni-scan | 0.045 | 0.861 | 37.69 | 20.11 |
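The MS-SSIM values in the ablation table are reported on a decibel scale, a common convention that spreads out scores close to 1.0. Assuming the usual conversion −10·log10(1 − MS-SSIM) (this sketch is ours; the paper does not spell out the formula):

```python
import math

def ms_ssim_db(ms_ssim):
    """Convert a raw MS-SSIM score in [0, 1) to the decibel scale
    commonly used in learned-compression tables."""
    return -10.0 * math.log10(1.0 - ms_ssim)

# Hypothetical example: a raw MS-SSIM of 0.99 maps to 20 dB.
print(round(ms_ssim_db(0.99), 6))  # → 20.0
```

Under this convention, the table's 16.67 dB for Omni-scan corresponds to a raw MS-SSIM of roughly 0.978, and each additional dB reflects a tenfold-per-10-dB shrinkage of the structural distortion 1 − MS-SSIM.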
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Liang, F.; Wang, S.; Chen, H.; Cao, Q.; Fu, H.; Chen, Z. Towards an Efficient Remote Sensing Image Compression Network with Visual State Space Model. Remote Sens. 2025, 17, 425. https://doi.org/10.3390/rs17030425