RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports

Du, Jiawei; Guo, Jia; Zhang, Weihang; Yang, Shengzhu; Liu, Hanruo; Li, Huiqi; Wang, Ningli

arXiv:2405.14137 (cs)

[Submitted on 23 May 2024 (v1), last revised 19 Aug 2024 (this version, v2)]

Abstract: Vision-language foundation models are increasingly investigated in computer vision and natural language processing, yet their exploration in ophthalmology and broader medical applications remains limited. The main challenge is the lack of labeled data for training foundation models. To address this issue, a CLIP-style retinal image foundation model is developed in this paper. Our foundation model, RET-CLIP, is trained on a dataset of 193,865 patients to extract general features of color fundus photographs (CFPs), employing a tripartite optimization strategy that focuses on the left-eye, right-eye, and patient levels to reflect real-world clinical scenarios. Extensive experiments demonstrate that RET-CLIP outperforms existing benchmarks across eight diverse datasets spanning four critical diagnostic categories: diabetic retinopathy, glaucoma, multiple disease diagnosis, and multi-label classification of multiple diseases, which demonstrates the performance and generality of our foundation model. The source code and pre-trained model are available at this https URL.
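
The abstract names a CLIP-style objective applied at three levels (left eye, right eye, patient) but does not spell out the loss. The following is only an illustrative sketch of what such a tripartite contrastive objective could look like, not the authors' implementation. It assumes a symmetric InfoNCE loss per level, an equal-weight sum over the three levels, and a fixed temperature of 0.07; the function names (`clip_loss`, `tripartite_loss`) and the dictionary layout are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text features.

    The temperature value is an assumption made for this sketch.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def tripartite_loss(image_feats, text_feats, temperature=0.07):
    """Equal-weight sum of the contrastive loss at three levels (assumed weighting).

    Both arguments are dicts with keys "left", "right" and "patient", each
    holding a batch (list) of feature vectors in the same patient order.
    """
    return sum(
        clip_loss(image_feats[level], text_feats[level], temperature)
        for level in ("left", "right", "patient")
    )


if __name__ == "__main__":
    # Toy batch of two patients with 2-dimensional features per level.
    batch = [[1.0, 0.0], [0.0, 1.0]]
    feats = {"left": batch, "right": batch, "patient": batch}
    print(tripartite_loss(feats, feats))
```

In this sketch, features that are correctly paired within the batch give a loss near zero, while shuffled pairings give a large loss, which is the behaviour a contrastive objective is meant to reward.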

Comments:	Accepted by MICCAI 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.14137 [cs.CV]
	(or arXiv:2405.14137v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.14137

Submission history

From: Jiawei Du
[v1] Thu, 23 May 2024 03:20:51 UTC (2,343 KB)
[v2] Mon, 19 Aug 2024 12:40:53 UTC (639 KB)
