DOI: 10.1145/1290144.1290162
Article

Information extraction from mathematical texts by means of natural language processing techniques

Published: 28 September 2007

Abstract

The increasing number of scientific publications, particularly given the widespread use of the internet, creates new requirements for sophisticated information retrieval systems. Discovering semantic annotations that describe both the mathematical texts themselves and the structure of the mathematical field under consideration is an important step in supporting such information retrieval systems. Many effective statistical approaches for finding correlations in texts exist, for example those used by Google. mArachna takes a different approach: it uses natural language processing techniques to recover all the fine-grained information snippets within mathematical texts. The extracted information is stored in knowledge bases, creating a low-level ontology of mathematics. In this article we present our further developments in this field and the technical implementation of the mArachna prototype.
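
The abstract describes the pipeline only at a high level: mathematical text goes in, fine-grained information snippets are extracted, and those snippets are stored in a knowledge base that forms a low-level ontology. The minimal Python sketch below illustrates that flow under loudly stated assumptions. It is not mArachna's implementation: it replaces the natural language processing with a single hypothetical regular expression that only recognizes definitions shaped like "A(n) X is a(n) Y [in which/such that/with/that Z]", and it uses a hypothetical in-memory triple store (`KnowledgeBase`, with made-up predicates `is_a` and `has_condition`) instead of a real knowledge base.

```python
import re

# Hypothetical pattern for one narrow kind of definitional sentence:
#   "A(n) <term> is a(n) <genus> [in which|such that|with|that <condition>]."
# This regular expression is a stand-in for a real linguistic analysis.
DEFINITION = re.compile(
    r"^(?:a|an)\s+(?P<term>[\w\s-]+?)\s+is\s+(?:a|an)\s+(?P<genus>[\w\s-]+?)"
    r"(?:\s+(?:in which|such that|with|that)\s+(?P<condition>.+?))?\.?$",
    re.IGNORECASE,
)


class KnowledgeBase:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> list[str]:
        # All objects linked to `subject` via `predicate`, sorted for stable output.
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)

    def superclasses(self, term: str) -> list[str]:
        # Follow "is_a" links transitively (breadth-first) through the toy ontology.
        result: list[str] = []
        frontier = [term]
        while frontier:
            current = frontier.pop(0)
            for parent in self.objects(current, "is_a"):
                if parent not in result:
                    result.append(parent)
                    frontier.append(parent)
        return result


def extract(sentence: str, kb: KnowledgeBase) -> bool:
    """Store the snippets of one definitional sentence; return False if it does not match."""
    match = DEFINITION.match(sentence.strip())
    if match is None:
        return False
    term = match.group("term").strip().lower()
    kb.add(term, "is_a", match.group("genus").strip().lower())
    if match.group("condition"):
        kb.add(term, "has_condition", match.group("condition").strip())
    return True


if __name__ == "__main__":
    kb = KnowledgeBase()
    corpus = [
        "A monoid is a semigroup with an identity element.",
        "A group is a monoid in which every element has an inverse.",
        "An abelian group is a group in which the operation is commutative.",
    ]
    for sentence in corpus:
        extract(sentence, kb)
    print(kb.superclasses("abelian group"))
    print(kb.objects("monoid", "has_condition"))
```

Querying `superclasses("abelian group")` follows the `is_a` links transitively and yields `['group', 'monoid', 'semigroup']`. A sentence outside the single pattern, such as "Every finite group of prime order is cyclic.", is simply skipped; covering such sentences is where a genuine linguistic analysis would be needed instead of this toy matcher.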

    Published In

    Emme '07: Proceedings of the international workshop on Educational multimedia and multimedia education
    September 2007
    138 pages
    ISBN:9781595937834
    DOI:10.1145/1290144
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automated generation of metadata
    2. natural language processing
    3. ontology engineering
    4. semantic annotation

    Conference

    MM07
    MM07: The 15th ACM International Conference on Multimedia 2007
    September 28, 2007
    Augsburg, Bavaria, Germany

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 392
    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 11 Jan 2025
