research-article

Dhalion: Self-Regulating Stream Processing in Heron

Published: 01 August 2017

Abstract

In recent years, there has been an explosion of large-scale real-time analytics needs, and a plethora of streaming systems have been developed to support such applications. These systems are able to continue stream processing even when faced with hardware and software failures. However, they do not address some crucial challenges facing their operators: the manual, time-consuming and error-prone task of tuning various configuration knobs to achieve service level objectives (SLOs), and the maintenance of SLOs in the face of sudden, unpredictable load variation and hardware or software performance degradation.
In this paper, we introduce the notion of self-regulating streaming systems and the key properties that they must satisfy. We then present the design and evaluation of Dhalion, a system that provides self-regulation capabilities to underlying streaming systems. We describe our implementation of the Dhalion framework on top of Twitter Heron, as well as a number of policies that automatically reconfigure Heron topologies to meet throughput SLOs, scaling resource consumption up and down as needed. We experimentally evaluate our Dhalion policies in a cloud environment and demonstrate their effectiveness. We are in the process of open-sourcing our Dhalion policies as part of the Heron project.



Published In

Proceedings of the VLDB Endowment, Volume 10, Issue 12
August 2017
427 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 10, Issue 12
