An empirical study of excitation and aggregation design adaptions in CLIP4Clip for video–text retrieval
Publication Information
- Publisher: Elsevier Science Publishers B. V., Netherlands
- Article type: Research-article