
The kappa statistic: a second look

Published: 01 March 2004

Abstract

In recent years, the kappa coefficient of agreement has become the de facto standard for evaluating intercoder agreement for tagging tasks. In this squib, we highlight issues that affect κ and that the community has largely neglected. First, we discuss the assumptions underlying different computations of the expected agreement component of κ. Second, we discuss how prevalence and bias affect the κ measure.
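The two expected-agreement computations and the prevalence and bias effects mentioned in the abstract can be illustrated with a small sketch. It assumes two coders and a square contingency table whose rows are coder A's labels and whose columns are coder B's labels. All function names are hypothetical and do not come from the paper. `expected_cohen` builds P(E) from each coder's own category distribution, as in Cohen (1960). `expected_pooled` builds P(E) from one distribution averaged over both coders, as in Scott (1955) and Siegel and Castellan (1988). For 2x2 tables the sketch also computes the prevalence index, the bias index and the prevalence-adjusted bias-adjusted kappa (PABAK) associated with Byrt, Bishop, and Carlin (1993). Taking the indices as absolute differences divided by n is an assumption made here.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: table[i][j] counts the items that coder A
# assigned to category i and coder B assigned to category j.
Table = Sequence[Sequence[int]]


def total(table: Table) -> int:
    """Number of coded items n."""
    return sum(sum(row) for row in table)


def observed_agreement(table: Table) -> float:
    """P(A): proportion of items placed in the same category by both coders."""
    return sum(table[k][k] for k in range(len(table))) / total(table)


def marginals(table: Table) -> Tuple[List[float], List[float]]:
    """Category proportions for coder A (row sums) and coder B (column sums)."""
    n, size = total(table), len(table)
    p_a = [sum(table[i]) / n for i in range(size)]
    p_b = [sum(table[i][j] for i in range(size)) / n for j in range(size)]
    return p_a, p_b


def expected_cohen(table: Table) -> float:
    """P(E) from each coder's own distribution (Cohen 1960)."""
    p_a, p_b = marginals(table)
    return sum(x * y for x, y in zip(p_a, p_b))


def expected_pooled(table: Table) -> float:
    """P(E) from one distribution averaged over both coders
    (Scott 1955; Siegel and Castellan 1988)."""
    p_a, p_b = marginals(table)
    return sum(((x + y) / 2) ** 2 for x, y in zip(p_a, p_b))


def kappa(p_agree: float, p_expected: float) -> float:
    """Chance-corrected agreement (P(A) - P(E)) / (1 - P(E))."""
    return (p_agree - p_expected) / (1 - p_expected)


def prevalence_bias_2x2(table: Table) -> Tuple[float, float, float]:
    """Prevalence index |a - d| / n, bias index |b - c| / n and
    PABAK = 2 * P(A) - 1 for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = total(table)
    return abs(a - d) / n, abs(b - c) / n, 2 * observed_agreement(table) - 1


if __name__ == "__main__":
    examples = {
        "balanced": [[45, 5], [5, 45]],  # P(A) = 0.9, categories equally frequent
        "skewed": [[85, 5], [5, 5]],     # P(A) = 0.9, one category dominates
        "biased": [[40, 20], [0, 40]],   # the coders' marginals differ
    }
    for name, t in examples.items():
        p_agree = observed_agreement(t)
        k_cohen = kappa(p_agree, expected_cohen(t))
        k_pooled = kappa(p_agree, expected_pooled(t))
        pi, bi, pabak = prevalence_bias_2x2(t)
        print(name, round(k_cohen, 3), round(k_pooled, 3),
              round(pi, 3), round(bi, 3), round(pabak, 3))
    # balanced 0.8 0.8 0.0 0.0 0.8
    # skewed 0.444 0.444 0.8 0.0 0.8
    # biased 0.615 0.6 0.0 0.2 0.6
```

The balanced and skewed tables have the same observed agreement, yet κ falls from 0.8 to about 0.44 because one category dominates the skewed table. In the biased table the coders' marginals differ, so the two P(E) computations disagree. For every category ((x+y)/2)² - xy = ((x-y)/2)² ≥ 0, so the pooled P(E) is never smaller than Cohen's, and the pooled κ is never larger. The two are equal only when the coders' marginals match. For a 2x2 table, Cohen's κ can also be written as (PABAK - PI² + BI²) / (1 - PI² + BI²). This form shows that, at a fixed PABAK, higher prevalence lowers κ and higher bias raises it.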

References

[1] Allen, James and Mark Core. 1997. DAMSL: Dialog act markup in several layers; Coding scheme developed by the participants at two discourse tagging workshops, University of Pennsylvania, March 1996, and Schloß Dagstuhl, February 1997. Draft.
[2] Bartko, John J. and William T. Carpenter. 1976. On the methods and theory of reliability. Journal of Nervous and Mental Disease, 163(5):307-317.
[3] Berry, Charles C. 1992. The κ statistic [letter to the editor]. Journal of the American Medical Association, 268(18):2513-2514.
[4] Byrt, Ted, Janet Bishop, and John B. Carlin. 1993. Bias, prevalence, and kappa. Journal of Clinical Epidemiology, 46(5):423-429.
[5] Carletta, Jean. 1996. Assessing agreement on classification tasks: The Kappa statistic. Computational Linguistics, 22(2):249-254.
[6] Carletta, Jean, Amy Isard, Stephen Isard, Jacqueline C. Kowtko, Gwyneth Doherty-Sneddon, and Anne H. Anderson. 1997. The reliability of a dialogue structure coding scheme. Computational Linguistics, 23(1):13-31.
[7] Cicchetti, Domenic V. and Alvan R. Feinstein. 1990. High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43(6):551-558.
[8] Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37-46.
[9] Di Eugenio, Barbara. 2000. On the usage of Kappa to evaluate agreement on coding tasks. In LREC2000: Proceedings of the Second International Conference on Language Resources and Evaluation, pages 441-444, Athens.
[10] Di Eugenio, Barbara, Pamela W. Jordan, Richmond H. Thomason, and Johanna D. Moore. 2000. The agreement process: An empirical investigation of human-human computer-mediated collaborative dialogues. International Journal of Human Computer Studies, 53(6):1017-1076.
[11] Fleiss, Joseph L. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378-382.
[12] Goldman, Ronald L. 1992. The κ statistic [letter to the editor (in reply)]. Journal of the American Medical Association, 268(18):2513-2514.
[13] Grove, William M., Nancy C. Andreasen, Patricia McDonald-Scott, Martin B. Keller, and Robert W. Shapiro. 1981. Reliability studies of psychiatric diagnosis: Theory and practice. Archives of General Psychiatry, 38:408-413.
[14] Krippendorff, Klaus. 1980. Content Analysis: An Introduction to Its Methodology. Sage Publications, Beverly Hills, CA.
[15] Rietveld, Toni and Roeland van Hout. 1993. Statistical Techniques for the Study of Language and Language Behaviour. Mouton de Gruyter, Berlin.
[16] Scott, William A. 1955. Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19:127-141.
[17] Siegel, Sidney and N. John Castellan, Jr. 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw Hill, Boston.
[18] Wiebe, Janyce M., Rebecca F. Bruce, and Thomas P. O'Hara. 1999. Development and use of a gold-standard data set for subjectivity classifications. In ACL99: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 246-253, College Park, MD.


Published In

Computational Linguistics, Volume 30, Issue 1
March 2004
149 pages
ISSN:0891-2017
EISSN:1530-9312

Publisher

MIT Press

Cambridge, MA, United States

Published in COLI Volume 30, Issue 1
