DOI:10.1145/2494266.2494319
short-paper

Recognising document components in XML-based academic articles

Published: 10 September 2013

Abstract

Recognising textual structures (paragraphs, sections, etc.) provides abstract and more general mechanisms for describing documents independent of the particular semantics of specific markup schemas, tools and presentation stylesheets. In this paper we propose an algorithm that allows us to identify the structural role of each element in a set of homogeneous scientific articles stored as XML files.
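
The abstract does not describe the algorithm itself. As a rough illustration of what assigning a structural role to each element of an XML article could look like, the sketch below labels every tag by the shape of its content (text, child elements, both, or neither) and takes a majority vote per tag. The labels, the function names `content_shape` and `roles_by_tag`, and the sample markup are all hypothetical; this is a toy heuristic, not the method proposed in the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def content_shape(elem):
    """Return a coarse, hypothetical role label based on what the element contains."""
    # Text directly inside the element, either before the first child
    # (elem.text) or after any child (child.tail).
    has_text = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in elem
    )
    has_children = len(elem) > 0
    if has_text and has_children:
        return "mixed"      # text interleaved with child elements
    if has_text:
        return "text-only"  # only character data
    if has_children:
        return "container"  # only child elements
    return "empty"          # no content at all


def roles_by_tag(xml_string):
    """Assign each tag the most frequent label among all of its occurrences."""
    root = ET.fromstring(xml_string)
    votes = {}
    for elem in root.iter():
        votes.setdefault(elem.tag, Counter())[content_shape(elem)] += 1
    return {tag: counts.most_common(1)[0][0] for tag, counts in votes.items()}


# Hypothetical sample markup, not taken from any real article schema.
SAMPLE = """<article>
  <sec><title>Intro</title><p>Some <i>inline</i> text.</p></sec>
  <sec><title>Method</title><p>More text.</p><p>Even <i>more</i> text.</p></sec>
</article>"""

if __name__ == "__main__":
    for tag, role in roles_by_tag(SAMPLE).items():
        print(f"{tag}: {role}")
```

Voting per tag rather than per occurrence reflects the assumption that, in a homogeneous collection, a given element name is used consistently, so occasional outliers (here, a `p` that happens to contain no inline markup) do not change its label.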

Published In

DocEng '13: Proceedings of the 2013 ACM symposium on Document engineering
September 2013
582 pages
ISBN:9781450317894
DOI:10.1145/2494266

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. doco
  2. document components
  3. xml

Qualifiers

  • Short-paper

Conference

DocEng '13
DocEng '13: ACM Symposium on Document Engineering 2013
September 10 - 13, 2013
Florence, Italy

Acceptance Rates

DocEng '13 Paper Acceptance Rate: 16 of 50 submissions, 32%
Overall Acceptance Rate: 194 of 564 submissions, 34%
