skip to main content
10.1145/956750.956789acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Efficient elastic burst detection in data streams

Published: 24 August 2003 Publication History

Abstract

Burst detection is the activity of finding abnormal aggregates in data streams. Such aggregates are based on sliding windows over data streams. In some applications, we want to monitor many sliding window sizes simultaneously and to report those windows with aggregates significantly different from other periods. We will present a general data structure for detecting interesting aggregates over such elastic windows in near linear time. We present applications of the algorithm for detecting Gamma Ray Bursts in large-scale astrophysical data. Detection of periods with high volumes of trading activities and high stock price volatility is also demonstrated using real time Trade and Quote (TAQ) data from the New York Stock Exchange (NYSE). Our algorithm beats the direct computation approach by several orders of magnitude.

References

[1]
http://www.lanl.gov/milagro/,2003.
[2]
R. Agrawal, C. Faloutsos, and A. N. Swami. Efficient Similarity Search In Sequence Databases. In D. Lomet, editor, Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms (FODO), pages 69--84, Chicago, Illinois, 1993. Springer Verlag.
[3]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3--5, Madison, Wisconsin, USA, pages 55--68. ACM, 2002.
[4]
D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. B. Zdonik. Monitoring streams - a new class of data management applications. In VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, August 20--23, 2002, Hong Kong, China, 2002.
[5]
K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt, pages 111--122, 2000.
[6]
K.-P. Chan and A. W.-C. Fu. Efficient time series matching by wavelets. In Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, pages 126--133, 1999.
[7]
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6--8, 2002, San Francisco, CA, USA. ACM/SIAM, 2002, pages 635--644, 2002.
[8]
R. Atkins et. al. (The Milagro Collaboration). Evidence for TeV emission from GRB 970417a. In Ap. J. Lett. 533, L119, 2000.
[9]
A. J. Smith for the Milagro Collaboration. A search for bursts of tev gamma rays with milagro. In Proceedings of the 27th International Cosmic Ray Conference (ICRC 2001), 07--15 August 2001, Hamburg, Germany, 2001.
[10]
V. Ganti, J. Gehrke, and R. Ramakrishnan. Demon: Data evolution and monitoring. In Proceedings of the 16th International Conference on Data Engineering, San Diego, California, 2000.
[11]
J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. ACM SIGMOD International Conf. on Management of Data, 2001.
[12]
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In VLDB 2001, pages 79--88. Morgan Kaufmann, 2001.
[13]
G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 97--106. ACM Press, 2001.
[14]
H. V. Jagadish, N. Koudas, and S. Muthukrishnan. Mining deviants in a time series database. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, editors, VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7--10, 1999, Edinburgh, Scotland, UK, pages 102--113. Morgan Kaufmann, 1999.
[15]
E. Keogh, S. Lonardi, and B. Y. Chiu. Finding surprising patterns in a time series database in linear time and space. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23--26, 2002. Edmonton, Alberta, Canada, pages 550--556, 2002.
[16]
J. Kleinberg. Bursty and hierarchical structure in streams. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23--26, 2002. Edmonton, Alberta, Canada, pages 91--101, 2002.
[17]
Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In L. M. Haas and A. Tiwary, editors, SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2--4, 1998, Seattle, Washington, USA, pages 448--459, 1998.
[18]
H. Samet. The quadtree and related hierarchical data structures. ACM Computing Surveys, 16(2):187--260, 1984.
[19]
C. Shahabi, X. Tian, and W. Zhao. Tsa-tree: A wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data. In 12th International Conference on Scientific and Statistical Database Management (SSDBM'00), July 26--28, 2000, Berlin, Germany, pages 55--68, 2000.
[20]
J. S. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In SIGMOD 1999, Proceedings ACM SIGMOD International Conference on Management of Data, June 1--3, 1999, Philadephia, Pennsylvania, USA, pages 193--204, 1999.
[21]
M. Wang, T. M. Madhyastha, N. H. Chan, S. Papadimitriou, and C. Faloutsos. Data mining meets performance evaluation: Fast algorithms for modeling bursty traffic. In ICDE 2002, 18th International Conference on Data Engineering, February 26-March 1, 2002, San Jose, California, 2002.
[22]
Y. Zhu and D. Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, August 20--23, 2002, Hong Kong, China, pages 358--369, 2002.

Cited By

View all

Index Terms

  1. Efficient elastic burst detection in data streams

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2003
    736 pages
    ISBN:1581137370
    DOI:10.1145/956750
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 August 2003

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data stream
    2. elastic burst

    Qualifiers

    • Article

    Conference

    KDD03
    Sponsor:

    Acceptance Rates

    KDD '03 Paper Acceptance Rate 46 of 298 submissions, 15%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)49
    • Downloads (Last 6 weeks)7
    Reflects downloads up to 14 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media