Article

Mining databases and data streams with query languages and rules

Author:

Carlo Zaniolo

KDID'05: Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases

Pages 24 - 37

https://doi.org/10.1007/11733492_2

Published: 03 October 2005

Abstract

Among data-intensive applications that are beyond the reach of traditional Database Management Systems (DBMS), data mining stands out because of its practical importance and the complexity of the research problems that must be solved before the vision of Inductive DBMS can become a reality. In this paper, we first discuss the technical developments that have occurred since the very notion of Inductive DBMS emerged from the seminal papers by Imielinski and Mannila a decade ago. The research progress achieved since then can be subdivided into three main problem areas: (i) language, (ii) optimization, and (iii) representation. We discuss the problems in these three areas and the different approaches to Inductive DBMS made possible by recent technical advances. Then, we pursue a language-centric solution and introduce simple SQL extensions that have proven very effective at supporting data mining. Finally, we turn our attention to the related problem of supporting data stream mining using Data Stream Management Systems (DSMS) and introduce the notion of Inductive DSMS. In addition to continuous query languages, DSMS provide support for synopses, sampling, load shedding, and other built-in functions that are needed for data stream mining. Moreover, we show that Inductive DSMS can be achieved by generalizing DSMS to ensure that their continuous query languages efficiently support data stream mining applications. Thus, DSMS extended with inductive capabilities will provide a uniquely supportive environment for data stream mining applications.
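The abstract lists sampling among the built-in functions a DSMS provides for data stream mining. As a rough illustration of what such a function has to do (keep a bounded, uniformly random sample of an unbounded stream in a single pass), here is a minimal Python sketch of classic reservoir sampling, often called Algorithm R. This is a standard technique swapped in for illustration only; it is not taken from the paper, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream of unknown length.

    Hypothetical helper for illustration (reservoir sampling, "Algorithm R"):
    the first k items fill the reservoir; the item at 0-based position i
    (i >= k) then replaces a random slot with probability k / (i + 1).
    Memory use is O(k) no matter how long the stream is.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill phase: keep every item until the reservoir is full.
            reservoir.append(item)
        else:
            # randint is inclusive on both ends, so j is uniform over i + 1 values.
            j = rng.randint(0, i)
            if j < k:
                # Happens with probability k / (i + 1): evict slot j.
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 5, random.Random(42))` returns five values drawn uniformly from the million-element stream while holding only five items in memory. A DSMS would more likely expose such logic as a built-in or user-defined aggregate inside its continuous query language than as a standalone function; that framing is an assumption here.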

References

[1] Tomasz Imielinski. A database perspective on knowledge discovery. In The First International Conference on Knowledge Discovery and Data Mining (KDD-95), 1995.

[2] Tomasz Imielinski and Heikki Mannila. A database perspective on knowledge discovery. Communications of the ACM, 39(11):58-64, 1996.

[3] T. Imielinski and A. Virmani. MSQL: A query language for database mining. Data Mining and Knowledge Discovery, 3:373-408, 1999.

[4] J. Han, Y. Fu, W. Wang, K. Koperski, and O. R. Zaiane. DMQL: A data mining query language for relational databases. In Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), pages 27-33, Montreal, Canada, June 1996.

[5] R. Meo, G. Psaila, and S. Ceri. A new SQL-like operator for mining association rules. In VLDB, pages 122-133, Bombay, India, 1996.

[6] Marco Botta, Jean-François Boulicaut, Cyrille Masson, and Rosa Meo. Query languages supporting descriptive rule mining: A comparative study. In Database Support for Data Mining Applications, pages 24-51, 2004.

[7] Francesco Bonchi, Fosca Giannotti, Alessio Mazzanti, and Dino Pedreschi. ExAMiner: Optimized level-wise frequent pattern mining with monotone constraints. In ICDM, pages 11-18, 2003.

[8] Sau Dan Lee and Luc De Raedt. An algebra for inductive query evaluation. In ICDM, pages 147-154, 2003.

[9] Francesco Bonchi and Claudio Lucchese. Pushing tougher constraints in frequent pattern mining. In PAKDD, pages 114-124, 2005.

[10] Baptiste Jeudy and Jean-François Boulicaut. Constraint-based discovery and inductive queries: Application to association rule mining. In Pattern Detection and Discovery, pages 110-124, 2002.

[11] IBM. DB2 Intelligent Miner, http://www-306.ibm.com/software/data/iminer.

[12] ORACLE. Oracle Data Miner Release 10gR2, http://www.oracle.com/technology/products/bi/odm.

[13] Z. Tang, J. Maclennan, and P. P. Kim. Building data mining solutions with OLE DB for DM and XML for Analysis. SIGMOD Record, 34(2):80-85, 2005.

[14] Data Mining Group (DMG). Predictive Model Markup Language (PMML), http://sourceforge.net/projects/pmml.

[15] S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. In SIGMOD, 1998.

[16] Arno Siebes. Where is the mining in KDID? (invited talk). In Fourth Int. Workshop on Knowledge Discovery in Inductive Databases (KDID 2005), Porto, Portugal, 2005.

[17] Yan-Nei Law, Haixun Wang, and Carlo Zaniolo. Data models and query language for data streams. In VLDB, pages 492-503, 2004.

[18] Haixun Wang and Carlo Zaniolo. ATLaS: A native extension of SQL for data mining. In Proceedings of the Third SIAM International Conference on Data Mining, pages 130-141, 2003.

[19] Weka 3: Data mining with open source machine learning software in Java, http://www.cs.waikato.ac.nz.

[20] Theodore Johnson, Laks V. S. Lakshmanan, and Raymond T. Ng. The 3W model and algebra for unified data mining. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, pages 21-32. Morgan Kaufmann, 2000.

[21] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In PODS, 2002.

[22] G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In SIGKDD, pages 97-106, San Francisco, CA, 2001. ACM Press.

[23] Haixun Wang, Wei Fan, Philip S. Yu, and Jiawei Han. Mining concept-drifting data streams using ensemble classifiers. In KDD, pages 226-235, 2003.

[24] Fang Chu, Yizhou Wang, and Carlo Zaniolo. An adaptive learning approach for noisy data streams. In ICDM, pages 351-354, 2004.

[25] Lukasz Golab and M. Tamer Ozsu. Issues in data stream management. ACM SIGMOD Record, 32(2):5-14, 2003.

[26] Theodore Johnson, S. Muthukrishnan, and Irina Rozenbaum. Sampling algorithms in a stream operator. In SIGMOD Conference, pages 1-12, 2005.

[27] D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A new model and architecture for data stream management. VLDB Journal, 12(2):120-139, 2003.

[28] C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, and O. Spatscheck. Gigascope: High performance network monitoring with an SQL interface. In SIGMOD, page 623. ACM Press, 2002.

[29] A. Arasu, S. Babu, and J. Widom. CQL: A language for continuous queries over streams and relations. In DBPL, pages 1-19, 2003.

[30] Mohamed Medhat Gaber, Arkady B. Zaslavsky, and Shonali Krishnaswamy. Mining data streams: A review. SIGMOD Record, 34(2):18-26, 2005.

[31] Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, and Liadan O'Callaghan. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng., 15(3):515-528, 2003.

[32] Hannu Toivonen. Sampling large databases for association rules. In T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, and Nandlal L. Sarda, editors, VLDB'96, Proceedings of 22nd International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India, pages 134-145. Morgan Kaufmann, 1996.

[33] Kagan Tumer and Joydeep Ghosh. Error correlation and error reduction in ensemble classifiers. Connect. Sci., 8(3):385-404, 1996.

[34] Yan-Nei Law and Carlo Zaniolo. Improving the accuracy of continuous aggregates and mining queries. Submitted for publication, 2005.

[35] Nesime Tatbul, Ugur Çetintemel, Stanley B. Zdonik, Mitch Cherniack, and Michael Stonebraker. Load shedding in a data stream manager. In VLDB, pages 309-320, 2003.

[36] Yanif Ahmad, Bradley Berg, Ugur Çetintemel, Mark Humphrey, Jeong-Hyon Hwang, Anjali Jhingran, Anurag Maskey, Olga Papaemmanouil, Alex Rasin, Nesime Tatbul, Wenjuan Xing, Ying Xing, and Stanley B. Zdonik. Distributed operation in the Borealis stream processing engine. In SIGMOD Conference, pages 882-884, 2005.

[37] Stream Mill home, http://wis.cs.ucla.edu/stream-mill.

[38] Chang Luo, Hetal Thakkar, Haixun Wang, and Carlo Zaniolo. A native extension of SQL for mining data streams. Pages 873-875, 2005.

[39] Martin Ester, Hans-Peter Kriegel, J. Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD 1996, pages 226-231, 1996.

[40] Y. Bai, L. Chang, H. Thakkar, X. Zhou, and C. Zaniolo. Efficient support for time series queries in data stream management systems. In N. Chaudhry, K. Shaw, and M. Abdelguerfi, editors, Stream Data Management, Chapter 6. Kluwer Academic Publishers, 2005.

[41] Xin Zhou, Hetal Thakkar, and Carlo Zaniolo. Unifying the processing of XML streams and relational data streams. In The 22nd International Conference on Data Engineering (ICDE), Atlanta, GA, April 3-7, 2006.

[42] ZhaoHui Tang, Jamie Maclennan, and Pyungchul (Peter) Kim. Building data mining solutions with OLE DB for DM and XML for Analysis. SIGMOD Record, 34(2):80-85, 2005.

[43] Clementine, http://www.spss.com/clementine/index.htm.

[44] F. Giannotti, G. Manco, D. Pedreschi, and F. Turini. Experiences with a logic-based knowledge discovery support environment. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), 1999.

[45] Fosca Giannotti, Giuseppe Manco, Dino Pedreschi, and Franco Turini. Experiences with a logic-based knowledge discovery support environment. In AI*IA, pages 202-213, 1999.

[46] Faiz Arni, KayLiang Ong, Shalom Tsur, Haixun Wang, and Carlo Zaniolo. The deductive database system LDL++. TPLP, 3(1):61-94, 2003.

Cited By

Thakkar H, Mozafari B, Zaniolo C, Lee B (2008). Designing an inductive data stream management system. Proceedings of the 2nd International Workshop on Scalable Stream Processing System, pages 79-88. DOI: 10.1145/1379272.1379286. Online publication date: 29-Mar-2008.
https://dl.acm.org/doi/10.1145/1379272.1379286

Liu H, Yu J, Zeleznikow J, Guan Y (2007). A Logic-Based Approach to Mining Inductive Databases. Proceedings of the 7th International Conference on Computational Science, Part I: ICCS 2007, pages 270-277. DOI: 10.1007/978-3-540-72584-8_35. Online publication date: 27-May-2007.
https://dl.acm.org/doi/10.1007/978-3-540-72584-8_35

Calders T, Lakshmanan L, Ng R, Paredaens J (2006). Expressive power of an algebra for data mining. ACM Transactions on Database Systems, 31(4):1169-1214. DOI: 10.1145/1189769.1189770. Online publication date: 1-Dec-2006.
https://dl.acm.org/doi/10.1145/1189769.1189770

Information

Published In

KDID'05: Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases

October 2005

250 pages

ISBN: 3540332928

Editors:
Francesco Bonchi, Pisa KDD Laboratory, ISTI - C.N.R, Area della Ricerca di Pisa, Via Giuseppe Moruzzi 1, Pisa, Italy
Jean-François Boulicaut, INSA-Lyon, LIRIS CNRS UMR5205, Villeurbanne, France

Publisher

Springer-Verlag

Berlin, Heidelberg

Bibliometrics

Total citations: 3
Total downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 18 Jan 2025.
