DOI: 10.1145/872757.872809
Article

Stream processing of XPath queries with predicates

Published: 09 June 2003

Abstract

We consider the problem of evaluating large numbers of XPath filters, each with many predicates, on a stream of XML documents. The solution we propose is to lazily construct a single deterministic pushdown automaton, called the XPush Machine, from the given XPath filters. We describe a number of optimization techniques to make the lazy XPush machine more efficient, both in terms of space and time. The combination of these optimizations results in high, sustained throughput. For example, if the total number of atomic predicates in the filters is up to 200000, then the throughput is at least 0.5 MB/sec; it increases to 4.5 MB/sec when each filter contains a single predicate.
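
The idea of lazy construction mentioned in the abstract can be illustrated with a much simpler machine. The sketch below is not the XPush Machine: it has no stack and no predicates, and the class name `LazyDFA` and the toy automaton are assumptions made purely for illustration. It only shows the general technique of determinizing a nondeterministic automaton on demand, so that a deterministic state is materialized only when the input actually reaches it.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


class LazyDFA:
    """Hypothetical illustration: determinize an NFA on demand.

    A deterministic state is a frozenset of NFA states. A transition is
    computed the first time it is needed and cached afterwards.
    """

    def __init__(
        self,
        nfa: Dict[Tuple[int, str], Set[int]],
        start: int,
        accepting: Set[int],
    ) -> None:
        self.nfa = nfa
        self.start: FrozenSet[int] = frozenset({start})
        self.accepting = accepting
        # Cache of deterministic transitions built so far.
        self.table: Dict[Tuple[FrozenSet[int], str], FrozenSet[int]] = {}

    def step(self, state: FrozenSet[int], symbol: str) -> FrozenSet[int]:
        key = (state, symbol)
        if key not in self.table:
            # First visit: take the union of the NFA moves, then cache it.
            nxt: Set[int] = set()
            for q in state:
                nxt |= self.nfa.get((q, symbol), set())
            self.table[key] = frozenset(nxt)
        return self.table[key]

    def accepts(self, symbols: Iterable[str]) -> bool:
        state = self.start
        for s in symbols:
            state = self.step(state, s)
        return bool(state & self.accepting)


# Toy NFA (an assumption): accepts tag sequences that end in "a" then "b".
toy_nfa = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
machine = LazyDFA(toy_nfa, start=0, accepting={2})
ends_in_ab = machine.accepts(["a", "b"])
ends_in_ba = machine.accepts(["b", "a"])
```

Only the transitions exercised by the two inputs are stored in `machine.table`; the rest of the deterministic automaton is never built. That is the space saving lazy construction aims for when the full automaton would be very large.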

Published In

SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data
June 2003
702 pages
ISBN: 158113634X
DOI: 10.1145/872757
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS03

Acceptance Rates

SIGMOD '03 Paper Acceptance Rate 53 of 342 submissions, 15%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%
