Multi-dataset Pretraining: A Unified Model for Semantic Segmentation

Shi, Bowen; Zhang, Xiaopeng; Xu, Haohang; Dai, Wenrui; Zou, Junni; Xiong, Hongkai; Tian, Qi

Computer Science > Computer Vision and Pattern Recognition

arXiv:2106.04121 (cs)

[Submitted on 8 Jun 2021]

Title:Multi-dataset Pretraining: A Unified Model for Semantic Segmentation

Authors:Bowen Shi, Xiaopeng Zhang, Haohang Xu, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian

View PDF

Abstract:Collecting annotated data for semantic segmentation is time-consuming and hard to scale up. In this paper, we for the first time propose a unified framework, termed as Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets. The highlight is that the annotations from different domains can be efficiently reused and consistently boost performance for each specific domain. This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets regardless of their taxonomy labels, and followed by fine-tuning the pretrained model over specific dataset as usual. In order to better model the relationship among images and classes from different datasets, we extend the pixel level embeddings via cross dataset mixing and propose a pixel-to-class sparse coding strategy that explicitly models the pixel-class similarity over the manifold embedding space. In this way, we are able to increase intra-class compactness and inter-class separability, as well as considering inter-class similarity across different datasets for better transferability. Experiments conducted on several benchmarks demonstrate its superior performance. Notably, MDP consistently outperforms the pretrained models over ImageNet by a considerable margin, while only using less than 10% samples for pretraining.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2106.04121 [cs.CV]
	(or arXiv:2106.04121v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2106.04121

Submission history

From: Bowen Shi [view email]
[v1] Tue, 8 Jun 2021 06:13:11 UTC (1,239 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Multi-dataset Pretraining: A Unified Model for Semantic Segmentation

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Multi-dataset Pretraining: A Unified Model for Semantic Segmentation

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators