DOI: 10.1145/3624918.3625337
research-article

Recommending Answers to Math Questions Based on KL-Divergence and Approximate XML Tree Matching

Published: 26 November 2023

Abstract

Math is the science and study of quantity, structure, space, and change. It seeks out patterns, formulates new conjectures, and establishes truth by rigorous deduction from appropriately chosen axioms and definitions. Studying math makes a person better at solving problems and builds skills that they can use across other subjects and apply in different job roles. In the modern world, builders use math every day: construction workers add, subtract, multiply, and divide, and work with fractions. Clearly, math is a major contributor to many areas of study. For this reason, math information retrieval (Math IR) deserves attention and recognition, since a reliable Math IR system helps users find relevant answers to math questions and benefits all math learners whenever they need help solving a math problem, regardless of time and place. Moreover, Math IR systems enhance the learning experience of their users. In this paper, we present MaRec, a recommender system that retrieves and ranks answers to a math question based on their textual content and embedded formulas. MaRec ranks a candidate answer A to a math question Q by computing (i) the KL-divergence score between A and Q using their textual contents, and (ii) the subtree matching score between the math formulas in Q and A, represented as XML trees. The design of MaRec is simple and easy to understand, since it relies solely on a probability model and an elegant tree-matching approach to rank math answers. Empirical studies show that MaRec significantly outperforms (i) three existing state-of-the-art Math IR systems in an offline evaluation and (ii) two top-of-the-line machine learning systems in an online analysis.
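The abstract names MaRec's two scoring components but does not give their formulas. The Python sketch below illustrates one common way to realize each, under assumptions that are ours rather than the paper's. The textual score is the negative KL-divergence between unigram language models of Q and A, with the answer model Jelinek-Mercer smoothed by a background model so the logarithm is always defined. The formula score is the fraction of the question formula's MathML subtrees that also occur, exactly, in the answer formula, parsed with Python's `xml.etree.ElementTree`. The helper names (`kl_score`, `background_model`, `subtree_match_score`, and so on) and the smoothing weight `lam` are hypothetical; MaRec's actual approximate tree matching and the way it combines the two scores may differ.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram(tokens):
    """Maximum-likelihood unigram language model: word -> probability."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def background_model(texts):
    """Collection model over all given texts; used only for smoothing."""
    return unigram([w for t in texts for w in tokenize(t)])


def kl_score(question, answer, background, lam=0.8):
    """Return -KL(theta_Q || theta_A); higher means the answer is closer.

    theta_A is Jelinek-Mercer smoothed with `background` (hypothetical
    weight `lam`), which must give every question term nonzero probability.
    """
    q_model = unigram(tokenize(question))
    a_model = unigram(tokenize(answer))
    divergence = 0.0
    for w, p_q in q_model.items():
        p_a = lam * a_model.get(w, 0.0) + (1 - lam) * background[w]
        divergence += p_q * math.log(p_q / p_a)
    return -divergence


def subtree_signatures(root):
    """Count one canonical string per subtree, rooted at every node."""
    sigs = Counter()

    def visit(node):
        label = node.tag.split("}")[-1]  # drop the MathML namespace, if any
        text = (node.text or "").strip()
        children = ",".join(visit(child) for child in node)
        sig = f"{label}:{text}({children})"
        sigs[sig] += 1
        return sig

    visit(root)
    return sigs


def subtree_match_score(q_mathml, a_mathml):
    """Fraction of the question formula's subtrees found in the answer formula."""
    q = subtree_signatures(ET.fromstring(q_mathml))
    a = subtree_signatures(ET.fromstring(a_mathml))
    shared = sum((q & a).values())  # multiset intersection of subtrees
    return shared / sum(q.values())
```

To rank candidates textually, build `background_model([question] + answers)` once and sort the answers by `kl_score`. For formulas, comparing the MathML for x + 1 against x + 2 gives 0.5 (the `mi` x and `mo` + leaves are shared, out of four question subtrees), while identical formulas give 1.0. Counting whole-subtree matches gives partial credit to partially overlapping formulas; it is a simple stand-in, not MaRec's own matching algorithm.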

Information

Published In

SIGIR-AP '23: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
November 2023
324 pages
ISBN:9798400704086
DOI:10.1145/3624918
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. KL-divergence
  2. content similarity
  3. math questions and answers
  4. subtree matching

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGIR-AP '23
Bibliometrics

Article Metrics

  • Downloads (last 12 months): 31
  • Downloads (last 6 weeks): 2
Reflects downloads up to 03 Jan 2025.