Sep 9, 2022 · We present a pre-training approach for vision and language transformer models, which is based on a mixture of diverse tasks. We explore both the use of ...
This work explores the use of image-text captioning data in pre-training, which does not need additional supervision, as well as object-aware strategies to ...
We present RO-ViT, a simple recipe to pretrain vision transformers in a region-aware manner for open-vocabulary object detection. Standard pretraining typically ...
Jul 19, 2024 · We present a novel region-centric pretraining approach for open-vocabulary detection by integrating detector heads on top of the image backbone ...
Nov 22, 2023 · In this work, the authors propose a simple yet effective method for learning fine-grained visual-language representations for open-vocabulary ...
This work presents the first detailed survey of open-vocabulary tasks, including open-vocabulary object detection, open-vocabulary segmentation, and 3D/video ...
Oct 23, 2023 · Image-text pretraining on web-scale image caption datasets has become the default recipe for open-vocabulary classification and retrieval models ...
Aug 28, 2023 · We present RO-ViT, a contrastive image-text pre-training framework to bridge the gap between image-level pre-training and open-vocabulary detection fine-tuning.
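The contrastive image-text pre-training these works build on can be illustrated with a symmetric InfoNCE objective over paired image and text embeddings. The following is a minimal NumPy sketch under assumed shapes and an illustrative `temperature` value, not RO-ViT's actual implementation:

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched (image, text) pairs sit on the
    diagonal of the similarity matrix and are treated as positives."""
    # L2-normalize embeddings so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (B, B) similarity matrix
    labels = np.arange(logits.shape[0])         # positives on the diagonal

    def cross_entropy(l):
        # numerically stable log-softmax per row
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # average of image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned, mutually orthogonal embeddings the loss approaches zero; for random embeddings it stays positive, which is what the pre-training objective drives down over a web-scale caption corpus.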
We present a new open-vocabulary detection approach based on detection-oriented image-text pretraining to bridge the gap between image-level pretraining and open-vocabulary detection fine-tuning.