What makes the differences: benchmarking XML database implementations

Published: 01 February 2005

Abstract

XML is emerging as a major standard for representing data on the World Wide Web. Recently, many XML storage models have been proposed to manage XML data. To assess an XML database's ability to handle XML queries, several benchmarks have also been proposed, including XMark and XMach. However, we found no reported studies using those benchmarks that give users insight into how different storage models affect XML query performance. In this article, we report our first set of results on benchmarking a set of XML database implementations using two XML benchmarks. The selected implementations represent a wide range of approaches, including RDBMS-based systems with document-independent and document-dependent XML-relational schema mapping approaches, and XML native engines based on an Object-Oriented Model and the Document Object Model. Comprehensive experiments were conducted to study the relative performance of the different approaches and the important issues that affect XML query performance, such as path expression query processing and the effectiveness of various partitioning, label-path, and indexing structures.
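
As a rough illustration of what a document-independent XML-relational mapping with a label-path column can look like, the following minimal sketch shreds a small document into a single generic table and answers two path expressions with plain SQL. It is not the schema of any system benchmarked in the article. The table name `node`, the helpers `shred` and `query_path`, and the sample document are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, loosely shaped like an auction site.
SAMPLE = """<site>
  <people>
    <person id="p0"><name>Ann</name></person>
    <person id="p1"><name>Bob</name></person>
  </people>
  <regions><item><name>Lamp</name></item></regions>
</site>"""


def shred(xml_text, conn):
    """Store every element as one row: (id, parent id, label path, text).

    The schema does not depend on the document's DTD, which is what
    makes the mapping document-independent.
    """
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "path TEXT, text TEXT)"
    )
    conn.execute("CREATE INDEX node_path ON node(path)")
    next_id = 0

    def visit(elem, parent_id, parent_path):
        nonlocal next_id
        node_id = next_id          # ids are assigned in document order
        next_id += 1
        path = parent_path + "/" + elem.tag
        text = (elem.text or "").strip()
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent_id, path, text),
        )
        for child in elem:
            visit(child, node_id, path)

    visit(ET.fromstring(xml_text), None, "")


def query_path(conn, path):
    """Evaluate a simple path expression against the label-path column.

    '/a/b' becomes an equality test on the stored path; '//b' becomes a
    suffix match. Predicates and wildcards are not handled, and LIKE
    treats '_' and '%' in tag names as wildcards.
    """
    if path.startswith("//"):
        sql = "SELECT text FROM node WHERE path LIKE ? ORDER BY id"
        arg = "%/" + path[2:]
    else:
        sql = "SELECT text FROM node WHERE path = ? ORDER BY id"
        arg = path
    return [row[0] for row in conn.execute(sql, (arg,))]


conn = sqlite3.connect(":memory:")
shred(SAMPLE, conn)
print(query_path(conn, "/site/people/person/name"))
print(query_path(conn, "//name"))
```

In this sketch an absolute path needs no joins at all because the full label path is materialized per node. A design that stores only parent-child edges would instead need one self-join per path step, which is the kind of trade-off between storage models that the abstract refers to.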

    Published In

    ACM Transactions on Internet Technology, Volume 5, Issue 1
    February 2005
    297 pages
    ISSN: 1533-5399
    EISSN: 1557-6051
    DOI: 10.1145/1052934
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. XML query processing
    2. XML storage model
    3. benchmark
