DOI:10.1145/1379272.1379286
research-article

Designing an inductive data stream management system: the stream mill experience

Published: 29 March 2008

Abstract

There has been much recent interest in on-line data mining. Existing mining algorithms designed for stored data are either not applicable or not effective on data streams, where real-time response is often needed and data characteristics change frequently. Therefore, researchers have been focusing on designing new and improved algorithms for on-line mining tasks, such as classification, clustering, frequent itemset mining, and pattern matching. Relatively little attention has been paid to designing data stream management systems (DSMSs) that facilitate and integrate the task of mining data streams, i.e., stream systems that provide inductive functionalities analogous to those provided by Weka and MS OLE DB for stored data. In this paper, we propose the notion of an Inductive DSMS: a system that, besides providing a rich library of inter-operable functions to support the whole mining process, also supports the essentials of a DSMS, including optimization of continuous queries, load shedding, synoptic constructs, and non-stop computing. Ease of use and extensibility are additional desiderata for the proposed Inductive DSMS. We first review the many challenges involved in realizing such a system and then present our approach of extending the Stream Mill DSMS toward that goal. Our system features (i) a powerful query language where mining methods are expressed via aggregates for generic streams and arbitrary windows, (ii) a library of fast and light mining algorithms, and (iii) an architecture that makes it easy to customize and extend existing mining methods and introduce new ones.
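
To make feature (i) more concrete, the following is a minimal Python sketch of what "a mining method expressed via an aggregate over a window" could look like. It is an illustration only, not Stream Mill's query language or API: the class name `WindowedFrequentItems`, the method names `iterate`, `expire`, and `terminate`, the count-based window, and the helper `run_continuous` are all assumptions introduced here. The aggregate keeps incremental state, folds in each arriving tuple, undoes the contribution of tuples that leave the window, and can report frequent items at any time.

```python
from collections import Counter, deque
from typing import Deque, Hashable, Iterable, Iterator, List, Tuple


class WindowedFrequentItems:
    """Hypothetical windowed aggregate (not Stream Mill code).

    Tracks item counts over the most recent `size` tuples and reports
    the items whose relative support meets `min_support`.
    """

    def __init__(self, size: int, min_support: float) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.min_support = min_support
        self.window: Deque[Hashable] = deque()
        self.counts: Counter = Counter()

    def iterate(self, item: Hashable) -> None:
        """Fold one arriving tuple into the aggregate state."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.size:
            self.expire(self.window.popleft())

    def expire(self, item: Hashable) -> None:
        """Undo the contribution of a tuple that slid out of the window."""
        self.counts[item] -= 1
        if self.counts[item] == 0:
            del self.counts[item]

    def terminate(self) -> List[Tuple[Hashable, int]]:
        """Return (item, count) pairs meeting the support threshold."""
        threshold = self.min_support * len(self.window)
        frequent = [(i, n) for i, n in self.counts.items() if n >= threshold]
        # Sort by descending count, then by item text, for a stable result.
        return sorted(frequent, key=lambda pair: (-pair[1], str(pair[0])))


def run_continuous(
    stream: Iterable[Hashable], agg: WindowedFrequentItems
) -> Iterator[List[Tuple[Hashable, int]]]:
    """Emit the aggregate's current result after every arriving tuple."""
    for item in stream:
        agg.iterate(item)
        yield agg.terminate()


if __name__ == "__main__":
    events = ["a", "b", "a", "c", "a", "b"]
    agg = WindowedFrequentItems(size=4, min_support=0.5)
    for result in run_continuous(events, agg):
        print(result)
```

Because the state is updated incrementally in `iterate` and `expire`, the cost per tuple does not depend on the window size, which is the property a continuous query over an unbounded stream needs. Swapping in a different mining method would mean supplying a different class with the same three methods.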

Published In

SSPS '08: Proceedings of the 2nd international workshop on Scalable stream processing system
March 2008
99 pages
ISBN:9781595939630
DOI:10.1145/1379272

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data stream mining language
  2. data stream mining system
  3. integration of online mining in a DSMS

Conference

EDBT '08
