From a stream of relational queries to distributed stream processing

Published: 01 September 2010

Abstract

Applications from several domains are now being written to process live data originating from hardware- and software-based streaming sources. Many of these applications have been written relying solely on database and data warehouse technologies, even though they need neither transactional support nor ACID properties. In several extreme high-load cases, this approach does not scale to the processing speeds that these applications demand. In this paper we demonstrate an application acceleration approach whereby a regular ODBC-based application is converted into a true streaming application with minimal disruption from a software engineering standpoint. We showcase our approach on three real-world applications. We experimentally demonstrate the substantial performance improvements that can be observed when contrasting the accelerated implementation with the original database-oriented implementation.

    Published In

    Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
    September 2010
    1658 pages

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 September 2010
    Published in PVLDB Volume 3, Issue 1-2

    Author Tags

    1. ODBC
    2. continuous processing
    3. streaming processing

    Qualifiers

    • Research-article
