×
Oct 10, 2022 · We propose FineCo (Fine-grained Contrastive Loss for Frame Sampling), an approach to better learn video and language representations with a fine-grained ...
It helps distil a video by selecting the frames that are semantically equivalent to the text, improving cross-modal correspondence. Building on the well ...
FineCo (Fine-grained Contrastive Loss for Frame Sampling), an approach to better learn video and language representations with a fine-graining contrastive ...
Oct 10, 2022 · We propose FineCo (Fine-grained. Contrastive Loss for Frame Sampling), an ap- proach to better learn video and language rep- resentations with a ...
Coautores ; Contrastive Video-Language Learning with Fine-grained Frame Sampling. Z Wang, Y Zhong, Y Miao, L Ma, L Specia. AACL-IJCNLP 2022, 2022. 10, 2022.
from publication: Contrastive Video-Language Learning with Fine-grained Frame Sampling | Despite recent progress in video and language representation learning,
Soavtorji ; Contrastive Video-Language Learning with Fine-grained Frame Sampling. Z Wang, Y Zhong, Y Miao, L Ma, L Specia. AACL-IJCNLP 2022, 2022. 10, 2022.
In this paper, we teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit ...
VideoCLIP is presented, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream ...