ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

Upadhyay, Uddeshya; Karthik, Shyamgopal; Mancini, Massimiliano; Akata, Zeynep

arXiv:2307.00398 (cs)

[Submitted on 1 Jul 2023 (v1), last revised 28 Sep 2023 (this version, v3)]

Abstract: Large-scale vision-language models (VLMs) like CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or a text sample is mapped to a single vector in the embedding space. This is problematic: as multiple samples (images or text) can abstract the same concept in the physical world, deterministic embeddings do not reflect the inherent ambiguity in the embedding space. We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing. On four challenging datasets, i.e., COCO, Flickr, CUB, and Oxford-flowers, we estimate the multi-modal embedding uncertainties for two VLMs, i.e., CLIP and BLIP, quantify the calibration of embedding uncertainties in retrieval tasks and show that ProbVLM outperforms other methods. Furthermore, we propose active learning and model selection as two real-world downstream tasks for VLMs and show that the estimated uncertainty aids both tasks. Lastly, we present a novel technique for visualizing the embedding distributions using a large-scale pre-trained latent diffusion model. Code is available at this https URL.

Comments:	ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2307.00398 [cs.CV]
	(or arXiv:2307.00398v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2307.00398

Submission history

From: Uddeshya Upadhyay
[v1] Sat, 1 Jul 2023 18:16:06 UTC (15,352 KB)
[v2] Tue, 12 Sep 2023 15:46:23 UTC (16,940 KB)
[v3] Thu, 28 Sep 2023 21:13:17 UTC (17,176 KB)
