DOI: 10.1145/1096601.1096647

Enhancing composite digital documents using XML-based standoff markup

Published: 02 November 2005

Abstract

Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We present a composite document approach wherein an XML-based document representation is linked via a 'shadow tree' of bi-directional pointers to a PDF representation of the same document. Using a two-window viewer, any material selected in the PDF can be related back to the corresponding material in the XML, and vice versa. In this way the treatment of specialist material such as mathematics, music or chemistry (e.g. via 'read aloud' or 'play aloud') can be activated via standard tools working within the XML representation, rather than requiring that application-specific structures be embedded in the PDF itself. The problems of textual recognition and tree pattern matching between the two representations are discussed in detail. Comparisons are drawn between our use of a shadow tree of pointers to map between document representations and the use of a code-replacement shadow tree in technologies such as XBL.
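
To make the idea of a 'shadow tree' of bi-directional pointers concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each shadow node pairs a path into the XML with a page-and-bounding-box region in the PDF; the names `PdfRegion`, `ShadowNode` and `ShadowTree` are hypothetical, as is the choice of ElementTree paths to address the XML side.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class PdfRegion:
    """Hypothetical handle for content in the PDF: a page and a bounding box."""
    page: int
    bbox: tuple  # (x0, y0, x1, y1); units are an assumption of this sketch


@dataclass
class ShadowNode:
    """One standoff link holding a pointer to each representation."""
    xml_path: str
    region: PdfRegion
    children: list = field(default_factory=list)


class ShadowTree:
    """Standoff tree kept outside both the XML and the PDF."""

    def __init__(self):
        self.roots = []
        self._by_xml = {}      # XML path   -> shadow node
        self._by_region = {}   # PDF region -> shadow node

    def link(self, xml_path, region, parent=None):
        """Record that xml_path and region denote the same material."""
        node = ShadowNode(xml_path, region)
        (parent.children if parent else self.roots).append(node)
        self._by_xml[xml_path] = node
        self._by_region[region] = node
        return node

    def pdf_for(self, xml_path):
        """Follow the pointer from the XML side to the PDF side."""
        return self._by_xml[xml_path].region

    def xml_for(self, region):
        """Follow the pointer from the PDF side to the XML side."""
        return self._by_region[region].xml_path


# Toy document standing in for the XML representation.
doc = ET.fromstring("<article><para>Energy:</para><math>E = mc^2</math></article>")

tree = ShadowTree()
root = tree.link(".", PdfRegion(1, (72, 72, 540, 720)))
eq_region = PdfRegion(1, (100, 400, 300, 420))
tree.link("./math", eq_region, parent=root)

# A selection made in the PDF window is resolved to the XML element,
# which XML-side tools could then process.
selected = doc.find(tree.xml_for(eq_region))
print(selected.tag, selected.text)
```

Because the links live in a separate structure, neither the XML nor the PDF has to be modified to carry them, which is the property that standoff markup is meant to provide. The sketch leaves out how the links are discovered, which the abstract identifies as the text recognition and tree pattern matching problems.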

Published In

DocEng '05: Proceedings of the 2005 ACM symposium on Document engineering
November 2005
252 pages
ISBN: 1595932402
DOI: 10.1145/1096601
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MathML
  2. MusicXML
  3. PDF
  4. XBL
  5. XML
  6. composite documents
  7. standoff markup

Qualifiers

  • Article

Conference

DocEng05: ACM Symposium on Document Engineering
November 2-4, 2005
Bristol, United Kingdom

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%
