Research Article

Cooperative Separation of Modality Shared-Specific Features for Visible-Infrared Person Re-Identification

Published: 01 January 2024, in IEEE Transactions on Multimedia, Volume 26 (IEEE Press)

Abstract

Visible-infrared person re-identification (VI-ReID) is a challenging task because the different imaging principles of visible and infrared images introduce a large modality discrepancy. Existing methods primarily address this issue by generating intermediate images to align modality features and establish connections between the visible and infrared modalities. However, the quality of these generated images is often unstable, which limits the effectiveness of such approaches. To overcome this limitation, we propose a novel method called modality shared-specific features cooperative separation. It consists of two key modules designed to alleviate the modality gap: a saliency response module and a cooperative separation module. The saliency response module incorporates a location attention mechanism and local features to construct contextual connections and extract locally salient information. The cooperative separation module then employs a more concise dual-MLP generator to separate shared and specific features effectively. In addition, we introduce a shared-feature refinement mechanism in both the generator and the discriminator. By coordinating the shared and specific features, our method achieves a secondary separation and extracts purer modality-shared features free of modality-specific information. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate that the proposed method performs excellently in VI-ReID.
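The abstract describes a dual-MLP generator that splits backbone features into modality-shared and modality-specific parts, with a discriminator used to refine the shared branch. The sketch below is only a plausible PyTorch rendering of that idea under stated assumptions, not the authors' implementation; the class names, feature dimensions, and the adversarial setup are all illustrative.

```python
# Minimal sketch (assumptions, not the paper's code): a dual-MLP "generator"
# separating a pooled backbone feature into shared and specific components,
# plus a modality discriminator that the shared branch could be trained against.
import torch
import torch.nn as nn

class DualMLPSeparator(nn.Module):
    """Two parallel MLP branches: one emits modality-shared features, one modality-specific."""
    def __init__(self, in_dim=2048, out_dim=512):
        super().__init__()
        self.shared_mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(inplace=True))
        self.specific_mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(inplace=True))

    def forward(self, feat):
        # Returns (shared, specific) feature vectors for each sample in the batch.
        return self.shared_mlp(feat), self.specific_mlp(feat)

class ModalityDiscriminator(nn.Module):
    """Predicts whether a shared feature came from a visible or an infrared image;
    training the generator adversarially against it pushes shared features to be
    modality-agnostic (one possible reading of 'shared feature refinement')."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.ReLU(inplace=True), nn.Linear(dim // 2, 2))

    def forward(self, shared_feat):
        return self.net(shared_feat)

# Usage with a hypothetical ResNet-50-style pooled feature (batch of 8, 2048-d):
backbone_feat = torch.randn(8, 2048)
separator = DualMLPSeparator()
shared, specific = separator(backbone_feat)        # each of shape (8, 512)
modality_logits = ModalityDiscriminator()(shared)  # (8, 2) logits for an adversarial loss
```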

