
Confidence-based stopping criteria for active learning for data annotation

Published: 29 April 2010

Abstract

The labor-intensive task of labeling data is a serious bottleneck for many supervised learning approaches to natural language processing applications. Active learning aims to reduce the human labeling cost of supervised learning methods. Determining when to stop the active learning process is an important practical issue in real-world applications. This article addresses the stopping-criterion issue of active learning and presents four simple stopping criteria based on confidence estimation over the unlabeled data pool: the maximum uncertainty, overall uncertainty, selected accuracy, and minimum expected error methods. Further, to obtain a proper threshold for a stopping criterion in a specific task, the article presents a strategy that considers the label change factor to dynamically update the predefined threshold of a stopping criterion during the active learning process. To empirically analyze the effectiveness of each stopping criterion for active learning, we design several comparison experiments on seven real-world datasets for three representative natural language processing applications: word sense disambiguation, text classification, and opinion analysis.
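As a rough illustration only, the sketch below shows how confidence-based stopping checks of this kind might be wired into a pool-based active learner that exposes class-probability estimates on the unlabeled pool. All function names, default thresholds, and the entropy-based uncertainty measure are assumptions made for illustration; they are not taken from the article.

# Minimal sketch (not the article's implementation): confidence-based stopping
# checks for pool-based active learning. Names and thresholds are illustrative.
import numpy as np

def pool_entropies(probs):
    """Prediction entropy per unlabeled example; probs is an (n, k) array of class probabilities."""
    p = np.clip(np.asarray(probs), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def max_uncertainty_stop(probs, threshold=0.1):
    # Stop when even the most uncertain example in the pool is confidently classified.
    return pool_entropies(probs).max() < threshold

def overall_uncertainty_stop(probs, threshold=0.05):
    # Stop when the average uncertainty over the whole unlabeled pool is small.
    return pool_entropies(probs).mean() < threshold

def selected_accuracy_stop(query_probs, oracle_labels, threshold=0.9):
    # Stop when the current model already agrees with the human labels on the
    # examples selected in the latest query batch.
    predictions = np.asarray(query_probs).argmax(axis=1)
    return (predictions == np.asarray(oracle_labels)).mean() >= threshold

def minimum_expected_error_stop(probs, threshold=0.01):
    # Stop when the expected error over the pool, estimated as the average of
    # (1 - max class probability), drops below a small threshold.
    return (1.0 - np.asarray(probs).max(axis=1)).mean() < threshold

def relax_threshold(threshold, prev_preds, curr_preds, factor=1.5):
    # Hypothetical label-change-driven update for the uncertainty-style
    # thresholds above: if predictions on the pool stop changing between
    # rounds, loosen the threshold so the criterion can fire earlier.
    if np.array_equal(prev_preds, curr_preds):
        return threshold * factor
    return threshold

In such a setup, each check would be evaluated once per active-learning round, after retraining on the newly labeled batch, and the loop would halt as soon as the chosen criterion returns True or the annotation budget is exhausted.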

      Published In

      ACM Transactions on Speech and Language Processing, Volume 6, Issue 3
      April 2010
      24 pages
      ISSN:1550-4875
      EISSN:1550-4883
      DOI:10.1145/1753783

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 April 2010
      Accepted: 01 February 2010
      Revised: 01 May 2009
      Received: 01 July 2008
      Published in TSLP Volume 6, Issue 3


      Author Tags

      1. Active learning
      2. confidence estimation
      3. stopping criterion
      4. text classification
      5. uncertainty sampling
      6. word sense disambiguation

      Qualifiers

      • Research-article
      • Research
      • Refereed
