Contrastive Self-supervised Representation Learning Using Synthetic Data

She, Dong-Yu; Xu, Kun

doi:10.1007/s11633-021-1297-9

Research Article
Open access
Published: 11 May 2021

Volume 18, pages 556–567, (2021)
International Journal of Automation and Computing

Abstract

Learning discriminative representations with deep neural networks often relies on massive labeled data, which is expensive and difficult to obtain in many real scenarios. As an alternative, self-supervised learning, which uses the input itself as supervision, has become attractive because of its rapidly improving performance on visual representation learning. This paper introduces a contrastive self-supervised framework for learning generalizable representations from synthetic data, which can be obtained easily and with complete controllability. Specifically, we propose to optimize a contrastive learning task and a physical property prediction task simultaneously. Given a synthetic scene, the first task aims to maximize agreement between a pair of synthetic images generated by our proposed view sampling module, while the second task aims to predict three physical property maps: depth, instance contour, and surface normal maps. In addition, a feature-level domain adaptation technique with adversarial training is applied to reduce the domain difference between real and synthetic data. Experiments demonstrate that our proposed method achieves state-of-the-art performance on several visual recognition datasets.
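
To make the joint objective described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes an InfoNCE-style contrastive loss over paired view embeddings and a plain L1 loss on each physical property map; the abstract does not specify either choice. The names `info_nce`, `l1_map_loss`, `joint_loss`, the `temperature` value, and the `weight` parameter are all hypothetical, and the adversarial domain adaptation term is left out.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(view_a, view_b, temperature=0.1):
    """Average InfoNCE loss; (view_a[i], view_b[i]) are the positive pairs.

    Assumption: the abstract only says the task "maximizes agreement"
    between two views, so this particular loss is illustrative.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of the anchor to every candidate in the other view.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature
                  for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def l1_map_loss(pred, target):
    """Mean absolute error between two flattened property maps."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def joint_loss(view_a, view_b, pred_maps, true_maps, weight=1.0):
    """Contrastive loss plus a weighted sum of property prediction losses.

    pred_maps / true_maps: dicts keyed by property name (for example
    "depth", "contour", "normal") holding flattened maps. The weighting
    scheme is a hypothetical choice.
    """
    property_loss = sum(l1_map_loss(pred_maps[k], true_maps[k])
                        for k in true_maps)
    return info_nce(view_a, view_b) + weight * property_loss
```

In this sketch, embeddings of two views of the same scene are pulled together while the other scenes in the batch act as negatives, and the property term is zero when the predicted maps match the rendered ground truth exactly.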

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61822204 and 61521002).

Author information

Authors and Affiliations

Beijing National Research Center for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Dong-Yu She & Kun Xu

Corresponding author

Correspondence to Kun Xu.

Additional information

Recommended by Associate Editor Jangmyung Lee

Colored figures are available in the online version at https://link.springer.com/journal/11633

Dong-Yu She received the B. Eng. and M. Eng. degrees in computer science and technology from Nankai University, China in 2019 and 2016, respectively. She is a Ph. D. candidate in the Department of Computer Science and Technology, Tsinghua University, China.

Her research interests include deep learning and computer vision. E-mail: [email protected]

ORCID iD: 0000-0002-1434-562X

Kun Xu received the B. Eng. and Ph. D. degrees in computer science and technology from Tsinghua University, China in 2005 and 2009, respectively. He is an associate professor in the Department of Computer Science and Technology, Tsinghua University, China.

His research interests include realistic rendering and image/video editing.

E-mail: [email protected] (Corresponding author)

ORCID iD: 0000-0002-2671-4170

Rights and permissions

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

She, DY., Xu, K. Contrastive Self-supervised Representation Learning Using Synthetic Data. Int. J. Autom. Comput. 18, 556–567 (2021). https://doi.org/10.1007/s11633-021-1297-9

Received: 15 October 2020
Accepted: 29 March 2021
Published: 11 May 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11633-021-1297-9
