Self-Supervised Modality-Agnostic Pre-Training of Swin Transformers

Talasila, Abhiroop; Maity, Maitreya; Priyakumar, U. Deva

arXiv:2405.12781 (cs)

[Submitted on 21 May 2024]

Abstract: Unsupervised pre-training has emerged as a transformative paradigm, displaying remarkable advancements in various domains. However, its susceptibility to domain shift, where the pre-training data distribution differs from the fine-tuning distribution, poses a significant obstacle. To address this, we augment the Swin Transformer to learn from different medical imaging modalities, enhancing downstream performance. Our model, dubbed SwinFUSE (Swin Multi-Modal Fusion for UnSupervised Enhancement), offers three key advantages: (i) it learns from both Computed Tomography (CT) and Magnetic Resonance Images (MRI) during pre-training, resulting in complementary feature representations; (ii) it includes a domain-invariance module (DIM) that effectively highlights salient input regions, enhancing adaptability; (iii) it exhibits remarkable generalizability, surpassing the confines of the tasks it was initially pre-trained on. Our experiments on two publicly available 3D segmentation datasets show a modest 1-2% performance trade-off compared to single-modality models, yet significant out-performance of up to 27% on out-of-distribution modality. This substantial improvement underscores our proposed approach's practical relevance and real-world applicability. Code is available at: this https URL

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2405.12781 [cs.CV]
	(or arXiv:2405.12781v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.12781

Submission history

From: Abhiroop Talasila
[v1] Tue, 21 May 2024 13:28:32 UTC (4,209 KB)