
Confidence-based stopping criteria for active learning for data annotation

Published: 29 April 2010

Abstract

The labor-intensive task of labeling data is a serious bottleneck for many supervised learning approaches to natural language processing applications. Active learning aims to reduce the human labeling cost of supervised learning methods. Determining when to stop the active learning process is an important practical issue in real-world applications. This article addresses the stopping-criterion issue of active learning and presents four simple stopping criteria based on confidence estimation over the unlabeled data pool: the maximum uncertainty, overall uncertainty, selected accuracy, and minimum expected error methods. Further, to obtain a proper threshold for a stopping criterion in a specific task, the article presents a strategy that considers the label change factor to dynamically update the predefined threshold of a stopping criterion during the active learning process. To empirically analyze the effectiveness of each stopping criterion for active learning, we design several comparison experiments on seven real-world datasets for three representative natural language processing applications: word sense disambiguation, text classification, and opinion analysis.
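As a rough illustration only, the sketch below shows how confidence-based stopping checks of this kind might be wired into a pool-based active learner that exposes class-probability estimates on the unlabeled pool. All function names, default thresholds, and the entropy-based uncertainty measure are assumptions made for illustration; they are not taken from the article.

# Minimal sketch (not the article's implementation): confidence-based stopping
# checks for pool-based active learning. Names and thresholds are illustrative.
import numpy as np

def pool_entropies(probs):
    """Prediction entropy per unlabeled example; probs is an (n, k) array of class probabilities."""
    p = np.clip(np.asarray(probs), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def max_uncertainty_stop(probs, threshold=0.1):
    # Stop when even the most uncertain example in the pool is confidently classified.
    return pool_entropies(probs).max() < threshold

def overall_uncertainty_stop(probs, threshold=0.05):
    # Stop when the average uncertainty over the whole unlabeled pool is small.
    return pool_entropies(probs).mean() < threshold

def selected_accuracy_stop(query_probs, oracle_labels, threshold=0.9):
    # Stop when the current model already agrees with the human labels on the
    # examples selected in the latest query batch.
    predictions = np.asarray(query_probs).argmax(axis=1)
    return (predictions == np.asarray(oracle_labels)).mean() >= threshold

def minimum_expected_error_stop(probs, threshold=0.01):
    # Stop when the expected error over the pool, estimated as the average of
    # (1 - max class probability), drops below a small threshold.
    return (1.0 - np.asarray(probs).max(axis=1)).mean() < threshold

def relax_threshold(threshold, prev_preds, curr_preds, factor=1.5):
    # Hypothetical label-change-driven update for the uncertainty-style
    # thresholds above: if predictions on the pool stop changing between
    # rounds, loosen the threshold so the criterion can fire earlier.
    if np.array_equal(prev_preds, curr_preds):
        return threshold * factor
    return threshold

In such a setup, each check would be evaluated once per active-learning round, after retraining on the newly labeled batch, and the loop would halt as soon as the chosen criterion returns True or the annotation budget is exhausted.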

      Published In

      ACM Transactions on Speech and Language Processing, Volume 6, Issue 3
      April 2010
      24 pages
      ISSN:1550-4875
      EISSN:1550-4883
      DOI:10.1145/1753783

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 April 2010
      Accepted: 01 February 2010
      Revised: 01 May 2009
      Received: 01 July 2008
      Published in TSLP Volume 6, Issue 3


      Author Tags

      1. Active learning
      2. confidence estimation
      3. stopping criterion
      4. text classification
      5. uncertainty sampling
      6. word sense disambiguation

      Qualifiers

      • Research-article
      • Research
      • Refereed
