DOI: 10.1145/2666158.2666174
Research article

Optimization of Data-intensive Flows: Is it Needed? Is it Solved?

Published: 07 November 2014

Abstract

Modern data analysis increasingly employs data-intensive flows to process very large volumes of data. As these flows become more complex and operate in highly dynamic environments, we argue that they call for automated, cost-based optimization rather than reliance on efficient designs hand-crafted by human experts. We further demonstrate that the current state of the art in flow optimization needs to be extended, and we propose a promising direction for optimizing flows at the logical level, more specifically for deciding the order in which flow tasks execute.
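To make the task-ordering problem concrete, here is a minimal sketch under a simple cost model that is an assumption for illustration, not the model used in the paper. Each task has a hypothetical per-tuple `cost` and `selectivity` (the fraction of tuples it passes on), tasks run as one pipeline, and some pairs of tasks must keep their relative order (precedence constraints). Under this model, the cost of an order is the sum of each task's cost weighted by the fraction of tuples that survive the tasks placed before it. The functions `respects_precedence`, `pipeline_cost` and `best_order` are hypothetical names. The search is brute force: it tries every permutation and keeps the cheapest valid one, so it is only practical for very small flows.

```python
from dataclasses import dataclass
from itertools import permutations
from math import inf


@dataclass(frozen=True)
class Task:
    name: str
    cost: float         # hypothetical processing cost per input tuple
    selectivity: float  # hypothetical fraction of input tuples passed downstream


def respects_precedence(order, constraints):
    """Return True if every (before, after) pair keeps its required relative order."""
    position = {task.name: i for i, task in enumerate(order)}
    return all(position[before] < position[after] for before, after in constraints)


def pipeline_cost(order):
    """Cost of running tasks in this order, normalized to one input tuple.

    Each task pays its per-tuple cost on the fraction of tuples that
    survived all tasks placed before it.
    """
    total, surviving = 0.0, 1.0
    for task in order:
        total += surviving * task.cost
        surviving *= task.selectivity
    return total


def best_order(tasks, constraints):
    """Exhaustively search all valid task orders and return the cheapest one.

    Runs in O(n!) time, so it is only practical for very small flows.
    """
    best, best_cost = None, inf
    for order in permutations(tasks):
        if not respects_precedence(order, constraints):
            continue
        cost = pipeline_cost(order)
        if cost < best_cost:
            best, best_cost = order, cost
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical four-task ETL-style flow; the numbers are illustrative only.
    tasks = [
        Task("extract", cost=1.0, selectivity=1.0),
        Task("clean", cost=4.0, selectivity=0.9),
        Task("filter", cost=1.0, selectivity=0.1),
        Task("lookup", cost=5.0, selectivity=1.0),
    ]
    # extract must run first; lookup needs the cleaned keys.
    constraints = [("extract", "clean"), ("extract", "filter"),
                   ("extract", "lookup"), ("clean", "lookup")]
    order, cost = best_order(tasks, constraints)
    print([t.name for t in order], f"{cost:.2f}")
    # prints: ['extract', 'filter', 'clean', 'lookup'] 2.85
```

In this toy flow the search moves the selective `filter` ahead of the expensive `clean` and `lookup` tasks, giving a normalized cost of 2.85 versus 6.35 for the order that cleans first. A practical optimizer would replace the exhaustive loop with a pruned search or heuristic, and would need measured costs and selectivities rather than the made-up values used here.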

    Published In

    DOLAP '14: Proceedings of the 17th International Workshop on Data Warehousing and OLAP
    November 2014
    110 pages
    ISBN:9781450309998
    DOI:10.1145/2666158
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data flow optimization
    2. task reordering

    Conference

    CIKM '14

    Acceptance Rates

    DOLAP '14 Paper Acceptance Rate 8 of 22 submissions, 36%;
    Overall Acceptance Rate 29 of 79 submissions, 37%
