DOI: 10.5555/3666122.3667917
Research article

Doubly robust self-training

Published: 10 December 2023

Abstract

Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly robust self-training, a novel semi-supervised algorithm that provably balances between two extremes. When the pseudo-labels are entirely incorrect, our method reduces to a training process solely using labeled data. Conversely, when the pseudo-labels are completely accurate, our method transforms into a training process utilizing all pseudo-labeled data and labeled data, thus increasing the effective sample size. Through empirical evaluations on both the ImageNet dataset for image classification and the nuScenes autonomous driving dataset for 3D object detection, we demonstrate the superiority of the doubly robust loss over the standard self-training baseline.
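The abstract describes a loss that interpolates between training on labeled data alone and training on all pseudo-labeled data. A minimal numerical sketch of a doubly robust combination is below; it is illustrative only, not the paper's implementation — the choice of squared loss, the function name, and the array-based interface are all our assumptions. The idea: add the pseudo-label loss over all examples, subtract the pseudo-label loss on the labeled subset, and add the true-label loss on the labeled subset, so the two pseudo-label terms cancel when pseudo-labels are uninformative and reinforce when they are accurate.

```python
import numpy as np

def squared_loss(pred, target):
    # Elementwise squared error; a stand-in for any per-example loss.
    return (pred - target) ** 2

def doubly_robust_loss(preds_all, pseudo_all, preds_lab, pseudo_lab, true_lab):
    """Doubly robust self-training loss (illustrative sketch).

    preds_all:  model predictions on all N examples (labeled + unlabeled)
    pseudo_all: pseudo-labels for all N examples
    preds_lab:  model predictions on the n labeled examples
    pseudo_lab: pseudo-labels for those labeled examples
    true_lab:   ground-truth labels for the labeled examples
    """
    term_pseudo_all = squared_loss(preds_all, pseudo_all).mean()
    term_pseudo_lab = squared_loss(preds_lab, pseudo_lab).mean()
    term_true_lab = squared_loss(preds_lab, true_lab).mean()
    # If pseudo-labels match the true labels, the middle term cancels the
    # labeled-subset part of the first term and the loss uses all N examples;
    # if pseudo-labels are pure noise, the first two terms cancel in
    # expectation, leaving only the labeled-data loss.
    return term_pseudo_all - term_pseudo_lab + term_true_lab
```

With perfect pseudo-labels on the labeled subset, the loss reduces to the average loss over the full dataset, which is the increased-effective-sample-size regime the abstract describes.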



Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States


