Research Article
DOI: 10.1145/3503161.3547902

DetFusion: A Detection-driven Infrared and Visible Image Fusion Network

Published: 10 October 2022

Abstract

Infrared and visible image fusion aims to exploit the complementary information between the two modalities to synthesize a new image containing richer information. Most existing work focuses on how to better fuse the pixel-level details of the two modalities in terms of contrast and texture, while ignoring the fact that the significance of image fusion lies in better serving downstream tasks. For object detection, object-related information in images is often more valuable than pixel-level details alone. To fill this gap, we propose a detection-driven infrared and visible image fusion network, termed DetFusion, which utilizes object-related information learned by object detection networks to guide multimodal image fusion. We cascade the image fusion network with the detection networks of both modalities and use the detection loss on the fused images to provide task-related guidance for optimizing the image fusion network. Since object locations provide a priori information for image fusion, we propose an object-aware content loss that motivates the fusion model to better learn the pixel-level information in infrared and visible images. Moreover, we design a shared attention module that encourages the fusion network to learn object-specific information from the object detection networks. Extensive experiments show that DetFusion outperforms state-of-the-art methods in maintaining pixel intensity distribution and preserving texture details. More notably, a comparison with state-of-the-art image fusion methods under task-driven evaluation also demonstrates the superiority of the proposed method. Our code is available at https://github.com/SunYM2020/DetFusion.
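To make the object-aware content loss idea concrete, here is a minimal PyTorch sketch of one plausible form: an L1 content loss between the fused image and both source images, with pixels inside detected object regions weighted more heavily. This is an illustration of the concept only, not the authors' implementation; the helper names, the box-based mask construction, and the weighting scheme are all assumptions. Consult the paper and the linked repository for the actual formulation.

```python
import torch

def boxes_to_mask(boxes, height, width):
    """Rasterize detection boxes (x1, y1, x2, y2) into a binary object mask.

    Hypothetical helper: DetFusion derives object locations from its
    cascaded detectors; simple box rasterization stands in for that here.
    """
    mask = torch.zeros(height, width)
    for x1, y1, x2, y2 in boxes:
        mask[int(y1):int(y2), int(x1):int(x2)] = 1.0
    return mask

def object_aware_content_loss(fused, ir, vis, obj_mask, obj_weight=2.0):
    """L1 content loss against both source images, with object pixels
    up-weighted (assumed form of an object-aware content loss)."""
    w = 1.0 + (obj_weight - 1.0) * obj_mask  # object pixels count obj_weight times
    loss_ir = (w * (fused - ir).abs()).mean()
    loss_vis = (w * (fused - vis).abs()).mean()
    return loss_ir + loss_vis

# Toy usage with random tensors standing in for grayscale image batches.
fused = torch.rand(1, 1, 64, 64)
ir = torch.rand(1, 1, 64, 64)
vis = torch.rand(1, 1, 64, 64)
mask = boxes_to_mask([(10, 10, 40, 40)], 64, 64).view(1, 1, 64, 64)
content = object_aware_content_loss(fused, ir, vis, mask)
# In the paper's setup, a detection loss computed on the fused image would
# additionally guide the fusion network, e.g. total = content + lambda_det * det_loss.
```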

Supplementary Material

MP4 File (MM22-fp0620.mp4)
Presentation video



    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. detection-driven
    2. image fusion
    3. object-aware content loss
    4. shared attention mechanism


Conference

MM '22

Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)
