DOI: 10.1145/3368308.3415382

Interpretable Deep Learning for University Dropout Prediction

Published: 07 October 2020

Abstract

The early identification of college students at risk of dropout is of great interest and importance all over the world, since leaving higher education early is associated with considerable personal and social costs. In Hungary, especially in STEM undergraduate programs, the dropout rate is particularly high, much higher than the EU average. In this work, using advanced machine learning models such as deep neural networks and gradient boosted trees, we aim to predict the final academic performance of students at the Budapest University of Technology and Economics. The dropout prediction is based on the data that are available at the time of enrollment. In addition to the predictions, we also interpret our machine learning models with the help of state-of-the-art interpretable machine learning techniques such as permutation importance and SHAP values. The best-performing deep learning model achieves an accuracy of 72.4% and an AUC of 0.771, slightly outperforming XGBoost, the cutting-edge benchmark model for tabular data.
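
The abstract names permutation importance as one of the interpretation techniques and reports accuracy and AUC as evaluation metrics. The following is a minimal, illustrative sketch of how these quantities can be computed, not the authors' implementation. It assumes synthetic data, a plain logistic regression as a stand-in for the paper's neural network, and three hypothetical features (`strong_signal`, `weak_signal`, `noise`) that do not come from the paper.

```python
import math
import random

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def predict(model, X):
    """Return predicted probabilities of the positive class."""
    w, b = model
    return [1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, row)))))
            for row in X]

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the logistic loss (stand-in for any classifier)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [p - yi for p, yi in zip(predict((w, b), X), y)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * row[j] for e, row in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b

def permutation_importance(model, X, y, rng, n_repeats=5):
    """Mean drop in AUC after shuffling one feature column at a time."""
    base = auc(y, predict(model, X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - auc(y, predict(model, X_perm)))
        importances.append(sum(drops) / n_repeats)
    return base, importances

# Synthetic data (assumption): the label depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
rng = random.Random(0)
feature_names = ["strong_signal", "weak_signal", "noise"]  # hypothetical names
X = [[rng.gauss(0, 1) for _ in feature_names] for _ in range(600)]
y = [1 if 2.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 1) > 0 else 0 for r in X]
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

model = fit_logistic(X_train, y_train)
test_scores = predict(model, X_test)
accuracy = sum((s >= 0.5) == (t == 1) for s, t in zip(test_scores, y_test)) / len(y_test)
base_auc, importances = permutation_importance(model, X_test, y_test, rng)

print(f"accuracy={accuracy:.3f}  AUC={base_auc:.3f}")
for name, imp in zip(feature_names, importances):
    print(f"{name}: mean AUC drop = {imp:.3f}")
```

Importance is measured on held-out data so that it reflects what the model relies on when generalizing, rather than what it memorized during training. SHAP values, the other technique named in the abstract, attribute each individual prediction to the features and are not covered by this sketch.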

      Published In

      SIGITE '20: Proceedings of the 21st Annual Conference on Information Technology Education
      October 2020
      446 pages
      ISBN:9781450370455
      DOI:10.1145/3368308
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. dropout prediction
      2. explainable artificial intelligence
      3. higher education
      4. interpretable machine learning
      5. neural networks

      Qualifiers

      • Research-article

      Funding Sources

      • Ministry of Human Capacities

      Conference

      SIGITE '20
      Acceptance Rates

      Overall Acceptance Rate 176 of 429 submissions, 41%
