Automatically Learning Topics and Difficulty Levels of Problems in Online Judge Systems

Authors:

Wayne Xin Zhao,

Ji-Rong Wen

ACM Transactions on Information Systems (TOIS), Volume 36, Issue 3

Article No.: 27, Pages 1 - 33

https://doi.org/10.1145/3158670

Published: 07 March 2018

Abstract

Online Judge (OJ) systems are widely used in many areas, including programming, mathematical problem solving, and job interviews. Unlike other online learning systems, such as Massive Open Online Courses (MOOCs), most OJ systems are designed for self-directed learning without the intervention of teachers. Moreover, most OJ systems simply list problems in volumes, with no clear organization by topic or difficulty level, so problems in the same volume are mixed in both respects. By analyzing large-scale user learning traces, we observe two major learning modes (or patterns): users either practice problems sequentially from the same volume regardless of topic, or they attempt problems on the same topic, which may be spread across multiple volumes. This observation is consistent with findings in classic educational psychology. Based on it, we propose a novel two-mode Markov topic model that automatically detects the topics of online problems by jointly characterizing the two learning modes. To further predict the difficulty level of online problems, we propose a competition-based expertise model that uses the learned topic information. Extensive experiments on three large OJ datasets demonstrate the effectiveness of our approach on three tasks: skill topic extraction, expertise competition prediction, and problem recommendation.
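To make the idea of a competition-based expertise model more concrete, the following is a minimal sketch that treats each attempt as a match between a user and a problem, so that a problem's rating can be read as a rough difficulty estimate. It is an illustration only and not the model proposed in the article: the abstract does not give the formulation, so a generic Elo-style update is swapped in here, and the function names (`expected_win`, `update_ratings`), the constants (`k`, `scale`, `initial`), and the `(user, problem, solved)` input format are all assumptions. It also ignores the learned topic information that the article's model uses.

```python
from collections import defaultdict


def expected_win(user_rating: float, problem_rating: float, scale: float = 400.0) -> float:
    """Elo-style probability that the user 'beats' (solves) the problem."""
    return 1.0 / (1.0 + 10 ** ((problem_rating - user_rating) / scale))


def update_ratings(attempts, k: float = 32.0, initial: float = 1500.0):
    """Replay attempts in order and return (user_ratings, problem_ratings).

    attempts: iterable of (user, problem, solved) tuples, oldest first.
    A solved attempt counts as a win for the user; a failed attempt
    counts as a win for the problem. All names and constants here are
    hypothetical choices made for this sketch.
    """
    users = defaultdict(lambda: initial)
    problems = defaultdict(lambda: initial)
    for user, problem, solved in attempts:
        p = expected_win(users[user], problems[problem])
        delta = k * ((1.0 if solved else 0.0) - p)
        users[user] += delta        # user gains rating on a solve
        problems[problem] -= delta  # problem loses rating when solved
    return dict(users), dict(problems)
```

Under these assumptions, a problem that many users fail ends up with a higher rating than one that most users solve, which is the intuition behind using competition outcomes as a difficulty signal.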

Published In

ACM Transactions on Information Systems Volume 36, Issue 3

July 2018

402 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/3146384

Editor:
Maarten de Rijke
University of Amsterdam, The Netherlands

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 March 2018

Accepted: 01 November 2017

Revised: 01 August 2017

Received: 01 May 2017

Published in TOIS Volume 36, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

Beijing Natural Science Foundation
National Natural Science Foundation of China
National Key Basic Research Program (973 Program) of China
