DOI: 10.1145/3528114.3528126
Research article

DSMCA: Deep Supervised Model with the Channel Attention Module for Cross-modal Retrieval

Published: 24 June 2022

Abstract

Cross-modal retrieval has become a prominent research topic because it enables flexible retrieval across multimedia data. Retrieving information from massive multimodal collections remains challenging due to the heterogeneity of the modalities and the semantic gap between them. In this paper, we propose a novel cross-modal retrieval method, the Deep Supervised Model with the Channel Attention module (DSMCA), which aims to efficiently learn a common representation of heterogeneous data while preserving semantic discriminability and modality invariance. Specifically, to improve the representational capability of the network, a squeeze-and-excitation block explicitly models inter-channel correlations and adaptively reweights the feature channels. A weight-sharing strategy between the two branches of the network reduces cross-modal heterogeneity. Furthermore, the model combines complementary losses at different levels to further improve retrieval performance. Extensive experiments demonstrate the effectiveness of our method for cross-modal retrieval.
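The two architectural ingredients named above are both standard building blocks, so brief sketches may help orient readers; neither reflects the authors' released implementation. First, a minimal PyTorch sketch of a squeeze-and-excitation (SE) block in the style of Hu et al. (2018): global average pooling squeezes each feature map to a scalar, and a two-layer bottleneck MLP produces a per-channel gate that reweights the channels. The reduction ratio of 16 is the SE paper's default, assumed here since the abstract does not specify one.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention (Hu et al., 2018)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # (B, C, H, W) -> (B, C, 1, 1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c))  # (B, C) channel weights
        return x * w.view(b, c, 1, 1)  # adaptively reweight the channels

# Example: gate a batch of 256-channel feature maps.
out = SEBlock(256)(torch.randn(2, 256, 7, 7))  # shape stays (2, 256, 7, 7)
```

Second, a weight-sharing strategy between two branches is commonly realized by routing each modality-specific subnetwork through a single shared projection, pushing both modalities toward one common space. The sketch below shows that general pattern only; the feature dimensions (4096-d image features, 300-d text features) and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchEncoder(nn.Module):
    """Hypothetical two-branch encoder with a weight-shared projection."""

    def __init__(self, img_dim: int = 4096, txt_dim: int = 300,
                 hidden: int = 1024, common: int = 256):
        super().__init__()
        self.img_branch = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU(inplace=True))
        self.txt_branch = nn.Sequential(nn.Linear(txt_dim, hidden), nn.ReLU(inplace=True))
        # One set of weights maps both branches into the common space,
        # which encourages modality-invariant representations.
        self.shared_proj = nn.Linear(hidden, common)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor):
        u = self.shared_proj(self.img_branch(img_feat))  # image representation
        v = self.shared_proj(self.txt_branch(txt_feat))  # text representation
        return u, v
```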

Published In

DSDE '22: Proceedings of the 2022 5th International Conference on Data Storage and Data Engineering
February 2022, 124 pages
ISBN: 9781450395724
DOI: 10.1145/3528114

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Common Representation
  2. Cross-modal Retrieval
  3. Deep Learning

