Research Article (Open Access)
DOI: 10.1145/2556325.2566238

Scaling short-answer grading by combining peer assessment with algorithmic scoring

Published: 04 March 2014

Abstract

Peer assessment helps students reflect and exposes them to different ideas. It scales assessment and allows large online classes to use open-ended assignments. However, it requires students to spend significant time grading. How can we lower this grading burden while maintaining quality? This paper integrates peer and machine grading to preserve the robustness of peer assessment and lower grading burden. In the identify-verify pattern, a grading algorithm first predicts a student grade and estimates confidence, which is used to estimate the number of peer raters required. Peers then identify key features of the answer using a rubric. Finally, other peers verify whether these feature labels were accurately applied. This pattern adjusts the number of peers that evaluate an answer based on algorithmic confidence and peer agreement. We evaluated this pattern with 1370 students in a large, online design class. With only 54% of the student grading time, the identify-verify pattern yields 80-90% of the accuracy obtained by taking the median of three peer scores, and provides more detailed feedback. A second experiment found that verification dramatically improves accuracy with more raters, with a 20% gain over the peer-median with four raters. However, verification also leads to lower initial trust in the grading system. The identify-verify pattern provides an example of how peer work and machine learning can combine to improve the learning experience.
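
The abstract describes the identify-verify workflow only at a high level. As a concrete illustration, the sketch below shows one plausible way the pieces could fit together: an algorithmic confidence estimate decides how many peers grade an answer, identifying peers mark rubric features, and a verifying peer confirms or rejects those labels before points are awarded. The cutoffs, the majority-vote aggregation, and all names here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the identify-verify pattern summarized in the abstract.
# Cutoffs, aggregation rules, and rubric contents are assumptions for illustration.
from dataclasses import dataclass


@dataclass
class RubricItem:
    description: str
    points: float


def raters_needed(machine_confidence: float) -> int:
    """Request more identifying peers when the algorithmic grader is unsure
    (assumed cutoffs, not the paper's)."""
    if machine_confidence >= 0.9:
        return 1
    if machine_confidence >= 0.6:
        return 2
    return 3


def peer_score(rubric, identify_labels, verify_labels):
    """Combine IDENTIFY and VERIFY judgments into a rubric score.

    identify_labels: one boolean list per identifying peer, marking which
                     rubric features that peer saw in the answer.
    verify_labels:   one boolean list from a verifying peer, confirming or
                     rejecting each consensus label.
    """
    n = len(identify_labels)
    # Strict majority vote per rubric item across identifiers (assumed rule).
    consensus = [
        sum(labels[i] for labels in identify_labels) * 2 > n
        for i in range(len(rubric))
    ]
    # Only labels the verifier confirms earn points; disputed items earn none.
    verified = [c and v for c, v in zip(consensus, verify_labels)]
    return sum(item.points for item, ok in zip(rubric, verified) if ok)


if __name__ == "__main__":
    rubric = [
        RubricItem("States a clear design goal", 2.0),
        RubricItem("Cites evidence from user testing", 3.0),
    ]
    confidence = 0.7                                  # from the scoring algorithm
    print(raters_needed(confidence))                  # -> 2 identifying peers
    identify_labels = [[True, True], [True, False]]   # labels from those 2 peers
    verify_labels = [True, True]                      # a third peer verifies
    print(peer_score(rubric, identify_labels, verify_labels))  # -> 2.0
```

For rough scale, the paper reports that this kind of confidence-driven allocation needed only 54% of the grading time of the three-peer-median baseline while retaining 80-90% of its accuracy; the thresholds in `raters_needed` above are placeholders, not the values used in the deployed system.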



      Published In

      L@S '14: Proceedings of the first ACM conference on Learning @ scale conference
      March 2014
      234 pages
      ISBN:9781450326698
      DOI:10.1145/2556325
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. assessment
      2. automated assessment
      3. online learning
      4. peer learning


      Conference

      L@S 2014: First ACM Conference on Learning @ Scale
      March 4-5, 2014
      Atlanta, Georgia, USA

      Acceptance Rates

      L@S '14 Paper Acceptance Rate: 14 of 38 submissions, 37%
      Overall Acceptance Rate: 117 of 440 submissions, 27%

