research-article

From a stream of relational queries to distributed stream processing

Editors: Elisa Bertino, Paolo Atzeni, Kian Lee Tan, Yi Chen, Y. C. Tay

Authors: Henrique Andrade, Kun-Lung Wu

Proceedings of the VLDB Endowment, Volume 3, Issue 1-2

Pages 1394 - 1405

https://doi.org/10.14778/1920841.1921012

Published: 01 September 2010

Abstract

Applications from several domains are now being written to process live data originating from hardware- and software-based streaming sources. Many of these applications rely solely on database and data warehouse technologies, even though they do not need transactional support or ACID properties. In several extreme high-load cases, this approach does not scale to the processing rates these applications demand. In this paper we demonstrate an application acceleration approach in which a regular ODBC-based application is converted into a true streaming application with minimal disruption from a software engineering standpoint. We showcase our approach on three real-world applications and experimentally demonstrate the substantial performance improvements of the accelerated implementations over the original database-oriented implementations.
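The contrast the abstract draws, between an application that repeatedly issues relational queries over accumulated data and a streaming application that processes each tuple as it arrives, can be illustrated with a minimal Python sketch. It assumes a single per-key running-sum query; the table name `events`, the functions `database_style` and `streaming_style`, and the sample records are hypothetical. SQLite (from the standard library) stands in for the ODBC-accessed database, and the sketch does not reproduce the authors' conversion approach or runtime.

```python
import sqlite3
from collections import defaultdict

# Hypothetical input: (key, value) records arriving one at a time,
# e.g., (caller_id, call_duration). Illustrative data only.
RECORDS = [("a", 3), ("b", 5), ("a", 4), ("c", 1), ("b", 2)]


def database_style(records):
    """Store every record, then re-run an aggregate query after each arrival,
    as an application issuing a stream of relational queries might do."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (key TEXT, value INTEGER)")
    snapshots = []
    for key, value in records:
        conn.execute("INSERT INTO events VALUES (?, ?)", (key, value))
        # Each query scans the whole table, so its cost grows with the history.
        rows = conn.execute(
            "SELECT key, SUM(value) FROM events GROUP BY key ORDER BY key"
        ).fetchall()
        snapshots.append(dict(rows))
    conn.close()
    return snapshots


def streaming_style(records):
    """Maintain the same per-key sums incrementally, without storing
    the raw history."""
    sums = defaultdict(int)
    snapshots = []
    for key, value in records:
        sums[key] += value  # O(1) state update per arriving tuple
        # The snapshot copy exists only so the two versions can be compared.
        snapshots.append(dict(sorted(sums.items())))
    return snapshots


if __name__ == "__main__":
    db = database_style(RECORDS)
    st = streaming_style(RECORDS)
    assert db == st
    print(st[-1])  # {'a': 7, 'b': 7, 'c': 1}
```

Both functions produce identical per-key results after every arrival. The difference is that the database-style loop stores and rescans the full history on each query, while the streaming loop keeps only the running state, which illustrates one source of the speedup a streaming conversion can offer.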



Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs


Information

Published In


Proceedings of the VLDB Endowment Volume 3, Issue 1-2

September 2010

1658 pages

ISSN:2150-8097


Publisher

VLDB Endowment


Cited By

Bordin M, Griebler D, Mencagli G, Geyer C, Fernandes L (2020). DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems. IEEE Access 8, 222900-222917. Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.3043948

Hirzel M (2019). Continuous Queries. Encyclopedia of Big Data Technologies, 513-518. Online publication date: 20-Feb-2019.
https://doi.org/10.1007/978-3-319-77525-8_305

HoseinyFarahabady M, Bastani S, Taheri J, Zomaya A, Tari Z, Khan S, Watson L, Sosonkina M, Thacker W, Weinbub J (2018). Toward designing a dynamic CPU cap manager for timely dataflow platforms. Proceedings of the High Performance Computing Symposium, 1-11. Online publication date: 15-Apr-2018.
https://dl.acm.org/doi/10.5555/3213069.3213075

Hirzel M, Baudart G, Bonifati A, Della Valle E, Sakr S, Akrivi Vlachou A (2018). Stream Processing Languages in the Big Data Era. ACM SIGMOD Record 47(2), 29-40. Online publication date: 11-Dec-2018.
https://dl.acm.org/doi/10.1145/3299887.3299892

Hoque S, Miranskyy A (2018). Online and Offline Analysis of Streaming Data. 2018 IEEE International Conference on Software Architecture Companion (ICSA-C), 68-71. Online publication date: Apr-2018.
https://doi.org/10.1109/ICSA-C.2018.00026

Hirzel M (2018). Continuous Queries. Encyclopedia of Big Data Technologies, 1-6. Online publication date: 26-May-2018.
https://doi.org/10.1007/978-3-319-63962-8_305-1

Trofimov A (2018). Consistency Maintenance in Distributed Analytical Stream Processing. New Trends in Databases and Information Systems, 413-422. Online publication date: 31-Aug-2018.
https://doi.org/10.1007/978-3-030-00063-9_38

Vieira M (2016). Complex Motion Pattern Queries in Spatio-Temporal Databases. Geospatial Research, 269-292. Online publication date: 2016.
https://doi.org/10.4018/978-1-4666-9845-1.ch011

Vieira M (2016). Complex Motion Pattern Queries in Spatio-Temporal Databases. Handbook of Research on Innovative Database Query Processing Techniques, 250-274. Online publication date: 2016.
https://doi.org/10.4018/978-1-4666-8767-7.ch009

Soulé R, Hirzel M, Gedik B, Grimm R (2016). River. Software 46(7), 891-929. Online publication date: 1-Jul-2016.
https://dl.acm.org/doi/10.1002/spe.2338
