DOI: 10.1007/11733492_2
Article

Mining databases and data streams with query languages and rules

Published: 03 October 2005

Abstract

Among data-intensive applications that are beyond the reach of traditional Database Management Systems (DBMS), data mining stands out for two reasons: its practical importance, and the complexity of the research problems that must be solved before the vision of Inductive DBMS can become a reality. In this paper, we first discuss the technical developments that have occurred since the very notion of Inductive DBMS emerged from the seminal papers authored by Imielinski and Mannila a decade ago. The research progress achieved since then can be subdivided into three main problem subareas: (i) language, (ii) optimization, and (iii) representation. We discuss the problems in these three areas and the different approaches to Inductive DBMS that recent technical advances have made possible. Then, we pursue a language-centric solution and introduce simple SQL extensions that have proven very effective at supporting data mining. Finally, we turn our attention to the related problem of supporting data stream mining using Data Stream Management Systems (DSMS), and introduce the notion of Inductive DSMS. In addition to continuous query languages, DSMS provide support for synopses, sampling, load shedding, and other built-in functions that are needed for data stream mining. Moreover, we show that Inductive DSMS can be achieved by generalizing DSMS so that their continuous query languages efficiently support data stream mining applications. Thus, DSMS extended with inductive capabilities will provide a uniquely supportive environment for data stream mining applications.
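The abstract lists sampling among the built-in functions that a DSMS provides for stream mining. The Python sketch below is only an illustration. It shows one standard way to keep a uniform, fixed-size sample of an unbounded stream in bounded memory: reservoir sampling (Vitter's Algorithm R). This technique is swapped in here for illustration, and the abstract does not say the paper uses it. The state is wrapped in a class with `initialize`/`iterate`/`terminate` hooks, loosely modeled on the general shape of a user-defined aggregate. `ReservoirSampleAggregate` and `sample_stream` are hypothetical names, not the API of any DSMS.

```python
import random
from typing import Generic, Iterable, List, Optional, TypeVar

T = TypeVar("T")


class ReservoirSampleAggregate(Generic[T]):
    """Hypothetical aggregate-style interface (not a real DSMS API) that keeps
    a uniform random sample of size k over a stream, using Algorithm R."""

    def __init__(self, k: int, seed: Optional[int] = None) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.rng = random.Random(seed)
        self.initialize()

    def initialize(self) -> None:
        # Reset the state: empty reservoir, no tuples seen yet.
        self.reservoir: List[T] = []
        self.seen = 0

    def iterate(self, item: T) -> None:
        # Called once per arriving tuple. Maintains the invariant that every
        # tuple seen so far is in the reservoir with probability min(1, k/seen).
        self.seen += 1
        if len(self.reservoir) < self.k:
            self.reservoir.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform index in [0, seen)
            if j < self.k:
                self.reservoir[j] = item

    def terminate(self) -> List[T]:
        # Return a copy of the current sample; safe to call repeatedly.
        return list(self.reservoir)


def sample_stream(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    # Feed every tuple of a (finite, for this demo) stream to the aggregate.
    agg = ReservoirSampleAggregate(k, seed)
    for item in stream:
        agg.iterate(item)
    return agg.terminate()
```

For example, `sample_stream(range(1000), k=5, seed=42)` returns five distinct values drawn uniformly from the stream. `terminate` only copies the reservoir. It can therefore be called again and again on a never-ending stream to read the current sample, which is what makes this kind of synopsis usable inside a continuous query.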

References

[1]
Tomasz Imielinski. A database perspective on knowledge discovery. In The First International Conference on Knowledge Discovery and Data Mining (KDD-95), 1995.
[2]
Tomasz Imielinski and Heikki Mannila. A database perspective on knowledge discovery. Communications of the ACM, 39(11):58-64, 1996.
[3]
T. Imielinski and A. Virmani. MSQL: a query language for database mining. Data Mining and Knowledge Discovery, 3:373-408, 1999.
[4]
J. Han, Y. Fu, W. Wang, K. Koperski, and O. R. Zaiane. DMQL: A data mining query language for relational databases. In Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), pages 27-33, Montreal, Canada, June 1996.
[5]
R. Meo, G. Psaila, and S. Ceri. A new SQL-like operator for mining association rules. In VLDB, pages 122-133, Bombay, India, 1996.
[6]
Marco Botta, Jean-François Boulicaut, Cyrille Masson, and Rosa Meo. Query languages supporting descriptive rule mining: A comparative study. In Database Support for Data Mining Applications, pages 24-51, 2004.
[7]
Francesco Bonchi, Fosca Giannotti, Alessio Mazzanti, and Dino Pedreschi. ExAMiner: Optimized level-wise frequent pattern mining with monotone constraints. In ICDM, pages 11-18, 2003.
[8]
Sau Dan Lee and Luc De Raedt. An algebra for inductive query evaluation. In ICDM, pages 147-154, 2003.
[9]
Francesco Bonchi and Claudio Lucchese. Pushing tougher constraints in frequent pattern mining. In PAKDD, pages 114-124, 2005.
[10]
Baptiste Jeudy and Jean-François Boulicaut. Constraint-based discovery and inductive queries: Application to association rule mining. In Pattern Detection and Discovery, pages 110-124, 2002.
[11]
IBM. DB2 Intelligent Miner, http://www-306.ibm.com/software/data/iminer.
[12]
Oracle. Oracle Data Miner Release 10gR2, http://www.oracle.com/technology/products/bi/odm.
[13]
Z. Tang, J. Maclennan, and P.P. Kim. Building data mining solutions with OLE DB for DM and XML for Analysis. SIGMOD Record, 34(2):80-85, 2005.
[14]
Data Mining Group (DMG). Predictive Model Markup Language (PMML), http://sourceforge.net/projects/pmml.
[15]
S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. In SIGMOD, 1998.
[16]
Arno Siebes. Where is the mining in KDID? (invited talk). In Fourth Int. Workshop on Knowledge Discovery in Inductive Databases (KDID 2005), Porto, Portugal, 2005.
[17]
Yan-Nei Law, Haixun Wang, and Carlo Zaniolo. Data models and query language for data streams. In VLDB, pages 492-503, 2004.
[18]
Haixun Wang and Carlo Zaniolo. ATLAS: a native extension of SQL for data mining. In Proceedings of Third SIAM Int. Conference on Data Mining, pages 130-141, 2003.
[19]
Weka 3: Data mining with open source machine learning software in Java, http://www.cs.waikato.ac.nz.
[20]
Theodore Johnson, Laks V. S. Lakshmanan, and Raymond T. Ng. The 3w model and algebra for unified data mining. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, pages 21-32. Morgan Kaufmann, 2000.
[21]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In PODS, 2002.
[22]
G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In SIGKDD, pages 97-106, San Francisco, CA, 2001. ACM Press.
[23]
Haixun Wang, Wei Fan, Philip S. Yu, and Jiawei Han. Mining concept-drifting data streams using ensemble classifiers. In KDD, pages 226-235, 2003.
[24]
Fang Chu, Yizhou Wang, and Carlo Zaniolo. An adaptive learning approach for noisy data streams. In ICDM, pages 351-354, 2004.
[25]
Lukasz Golab and M. Tamer Ozsu. Issues in data stream management. ACM SIGMOD Record, 32(2):5-14, 2003.
[26]
Theodore Johnson, S. Muthukrishnan, and Irina Rozenbaum. Sampling algorithms in a stream operator. In SIGMOD Conference, pages 1-12, 2005.
[27]
D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A new model and architecture for data stream management. VLDB Journal, 12(2):120-139, 2003.
[28]
C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, and O. Spatscheck. Gigascope: High performance network monitoring with an SQL interface. In SIGMOD, page 623. ACM Press, 2002.
[29]
A. Arasu, S. Babu, and J. Widom. CQL: A language for continuous queries over streams and relations. In DBPL, pages 1-19, 2003.
[30]
Mohamed Medhat Gaber, Arkady B. Zaslavsky, and Shonali Krishnaswamy. Mining data streams: a review. SIGMOD Record, 34(2):18-26, 2005.
[31]
Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, and Liadan O'Callaghan. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng., 15(3):515-528, 2003.
[32]
Hannu Toivonen. Sampling large databases for association rules. In T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, and Nandlal L. Sarda, editors, VLDB'96, Proceedings of 22nd International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India, pages 134-145. Morgan Kaufmann, 1996.
[33]
Kagan Tumer and Joydeep Ghosh. Error correlation and error reduction in ensemble classifiers. Connect. Sci., 8(3):385-404, 1996.
[34]
Yan-Nei Law and Carlo Zaniolo. Improving the accuracy of continuous aggregates and mining queries. Submitted for publication, 2005.
[35]
Nesime Tatbul, Ugur Çetintemel, Stanley B. Zdonik, Mitch Cherniack, and Michael Stonebraker. Load shedding in a data stream manager. In VLDB, pages 309-320, 2003.
[36]
Yanif Ahmad, Bradley Berg, Ugur Çetintemel, Mark Humphrey, Jeong-Hyon Hwang, Anjali Jhingran, Anurag Maskey, Olga Papaemmanouil, Alex Rasin, Nesime Tatbul, Wenjuan Xing, Ying Xing, and Stanley B. Zdonik. Distributed operation in the borealis stream processing engine. In SIGMOD Conference, pages 882-884, 2005.
[37]
Stream Mill home page, http://wis.cs.ucla.edu/stream-mill.
[38]
Chang Luo, Hetal Thakkar, Haixun Wang, and Carlo Zaniolo. A native extension of SQL for mining data streams. pages 873-875, 2005.
[39]
Martin Ester, Hans-Peter Kriegel, J. Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD 1996, pages 226-231, 1996.
[40]
Y. Bai, L. Chang, H. Thakkar, X. Zhou, and C. Zaniolo. Efficient support for time series queries in data stream management systems. In K. Shaw, N. Chaudhry, and M. Abdelguerfi, editors, Stream Data Management, Chapter 6. Kluwer Academic Publishers, 2005.
[41]
Xin Zhou, Hetal Thakkar, and Carlo Zaniolo. Unifying the processing of XML streams and relational data streams. In The 22nd International Conference on Data Engineering (ICDE), Atlanta, GA, April 3-7, 2006.
[42]
ZhaoHui Tang, Jamie Maclennan, and Pyungchul (Peter) Kim. Building data mining solutions with OLE DB for DM and XML for Analysis. SIGMOD Record, 34(2):80-85, 2005.
[43]
Clementine, http://www.spss.com/clementine/index.htm.
[44]
F. Giannotti, G. Manco, D. Pedreschi, and F. Turini. Experiences with a logic-based knowledge discovery support environment. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), 1999.
[45]
Fosca Giannotti, Giuseppe Manco, Dino Pedreschi, and Franco Turini. Experiences with a logic-based knowledge discovery support environment. In AI*IA, pages 202-213, 1999.
[46]
Faiz Arni, KayLiang Ong, Shalom Tsur, Haixun Wang, and Carlo Zaniolo. The deductive database system LDL++. TPLP, 3(1):61-94, 2003.

Published In

KDID'05: Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases
October 2005
250 pages
ISBN: 3540332928
Editors: Francesco Bonchi, Jean-François Boulicaut

Publisher

Springer-Verlag

Berlin, Heidelberg
