Apex: automatic programming assignment error explanation

Published: 19 October 2016

Abstract

This paper presents Apex, a system that automatically generates explanations for bugs in programming assignments: where the bugs are and how their root causes lead to the runtime failures. It works by comparing the passing execution of a correct implementation (provided by the instructor) with the failing execution of the buggy implementation (submitted by the student). The technique overcomes a number of technical challenges caused by syntactic and semantic differences between the two implementations. It collects symbolic traces of the two executions and matches assignment statements across the traces by reasoning about symbolic equivalence. It then matches predicates by aligning the control dependences of the matched assignment statements, avoiding direct matching of path conditions, which are usually quite different. Our evaluation shows that Apex is very effective on 205 buggy real-world student submissions for 4 programming assignments and on a set of 15 buggy programs of the programming-assignment type collected from stackoverflow.com, precisely pinpointing the root causes and capturing the causality for 94.5% of them. An evaluation on a standard benchmark set with over 700 student bugs shows similar results. A user study in the classroom shows that Apex substantially improved student productivity.
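The matching idea in the abstract can be illustrated with a toy sketch. It is not Apex's implementation: randomized testing stands in for real symbolic-equivalence reasoning, the greedy in-order matching is a simplification, and every name (`Assign`, `equivalent`, `match_assignments`, `match_predicates`) and the two tiny traces are hypothetical. It pairs assignments whose symbolic values agree, then pairs the predicates those assignments are control dependent on, without comparing the predicates themselves.

```python
import random
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

Env = Dict[str, int]


class Assign(NamedTuple):
    label: str                   # name of the assigned variable in the trace
    value: Callable[[Env], int]  # symbolic value, as a function of the inputs
    guard: Optional[str]         # predicate this assignment is control dependent on


def equivalent(f, g, inputs: List[str], trials: int = 200, seed: int = 0) -> bool:
    """Approximate equivalence by random testing (assumption: a stand-in
    for a real symbolic-equivalence check)."""
    rng = random.Random(seed)
    for _ in range(trials):
        env = {name: rng.randint(-50, 50) for name in inputs}
        if f(env) != g(env):
            return False
    return True


def match_assignments(correct: List[Assign], buggy: List[Assign], inputs: List[str]):
    """Greedy, order-preserving matching of equivalent assignments."""
    matches: List[Tuple[Assign, Assign]] = []
    unmatched: List[str] = []
    next_j = 0
    for c in correct:
        for j in range(next_j, len(buggy)):
            if equivalent(c.value, buggy[j].value, inputs):
                matches.append((c, buggy[j]))
                next_j = j + 1
                break
        else:
            unmatched.append(c.label)  # candidate symptom of the bug
    return matches, unmatched


def match_predicates(matches) -> Dict[str, str]:
    """Pair predicates through the control dependences of matched assignments."""
    pairs: Dict[str, str] = {}
    for c, b in matches:
        if c.guard is not None and b.guard is not None:
            pairs.setdefault(c.guard, b.guard)
    return pairs


# Hypothetical traces for an input with x > y.
correct_trace = [
    Assign("m", lambda e: e["x"], "x > y"),
    Assign("r", lambda e: e["x"] + 1, None),
]
buggy_trace = [
    Assign("big", lambda e: e["x"], "!(y >= x)"),
    Assign("r", lambda e: e["x"] + 2, None),  # the injected bug
]

matches, unmatched = match_assignments(correct_trace, buggy_trace, ["x", "y"])
predicate_pairs = match_predicates(matches)
print([(c.label, b.label) for c, b in matches], unmatched, predicate_pairs)
```

In this toy run the syntactically different guards `x > y` and `!(y >= x)` are paired only because the assignments they control were matched, and the unmatched `r` is what an explanation would start from.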

Table of Contents

  • Introduction
  • Motivation
  • Problem Formalization
  • Design
  • Phase (1): Iterative Instance Matching
  • Phase (2): Residue Alignment
  • Phase (3): Comparative Dependence Graph Construction, Slicing, and Feedback Generation
  • Implementation and Evaluation
  • Experiment with Real Student Submissions
  • Experiment with stackoverflow.com Programs
  • User Study
  • Limitations
  • Comparison with PMaxSat
  • Experiment with IntroClass Benchmarks
  • Related Work
  • Conclusion

Published In

ACM SIGPLAN Notices  Volume 51, Issue 10
OOPSLA '16
October 2016
915 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3022671
  • OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
    October 2016
    915 pages
    ISBN:9781450344449
    DOI:10.1145/2983990
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 October 2016
Published in SIGPLAN Volume 51, Issue 10

Author Tags

  1. Automated Feedback Generation
  2. Computer-Aided Education

Qualifiers

  • Research-article

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 181
  • Downloads (Last 6 weeks): 21
Reflects downloads up to 01 Jan 2025
