A growing number of applications, including network traffic monitoring and highway congestion analysis, continuously generate massive data streams. Managing these streams presents many new research challenges, including Quality of Service (QoS) guarantees, windows, and other synopses. Many research projects have therefore focused on building Data Stream Management Systems (DSMSs) to address these challenges [ACC03, ABW03, CCD03]. However, all of these systems are limited to simple continuous queries over data streams; they do not support advanced applications such as data stream mining. Such applications are critical in many real-world scenarios, including web click-stream analysis, market basket data mining, and credit card fraud detection. The importance of data stream mining is further illustrated by research projects that focus on devising fast and light algorithms for online mining [CWY04, JQS03, CZ04, WFY03, EKS98, FOR06, MTZ08]. Beyond devising such algorithms, however, deploying online data stream mining methods presents many difficult challenges. In particular, data stream mining methods must be deployed with all the essentials that DSMSs provide for simpler applications, including QoS, load shedding, and synopses. Thus, in this dissertation we extend a DSMS into an online data mining workbench through the following research advances: (1) our DSMS, Stream Mill, is extended to support more advanced queries, such as online mining and sequence queries, by extending its query language, SQL; (2) a suite of online mining algorithms is integrated into the DSMS to provide advanced mining techniques, such as ensemble-based methods [WFY03, CZ04, FOR06]; and (3) data mining models and workflows are introduced to support specification of the complete mining process.
This improves ease of use, since users can now simply invoke a workflow rather than recreate it themselves. The framework also allows experts to add new mining algorithms. We demonstrate that the resulting data stream mining workbench achieves performance and extensibility that are unmatched even by static mining workbenches.