DOI: 10.1145/3609437.3609443
Research article, Internetware Conference Proceedings

EFTuner: A Bi-Objective Configuration Parameter Auto-Tuning Method Towards Energy-Efficient Big Data Processing

Published: 05 October 2023

Abstract

Energy efficiency now severely constrains the sustainable operation and development of big data services. In this paper, we propose EFTuner, a bi-objective configuration-parameter auto-tuning method for energy-efficient big data processing. Following the sampling-modeling-searching workflow, EFTuner first leverages Latin Hypercube Sampling to collect configuration samples under multiple dataset input sizes, and then builds separate datasize-aware prediction models for performance and energy consumption with Stochastic Gradient Boosted Regression Trees. In addition, to avoid meaningless variation in the evolutionary process of the original NSGA-II, EFTuner explores the Pareto-optimal configurations with a novel parameter-importance-based mutation operation. Experiments conducted on a local 3-node Spark cluster with three different applications verify the advantages of EFTuner over the baselines.
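The sampling-modeling-searching workflow described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the three configuration parameters, their ranges, the dataset sizes, and the synthetic runtime/energy measurements below are all hypothetical placeholders, and the importance-biased mutation is one plausible reading of "parameter-importance-based mutation" rather than EFTuner's actual operator.

```python
# Sketch of the sampling-modeling-searching workflow from the abstract.
# Parameter names, ranges, and measurements are illustrative assumptions.
import numpy as np
from scipy.stats import qmc
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# 1. Latin Hypercube Sampling over a hypothetical 3-parameter space
#    (e.g. executor cores, executor memory in MB, shuffle partitions).
sampler = qmc.LatinHypercube(d=3, seed=0)
unit = sampler.random(n=50)                  # 50 points in the unit cube
lo, hi = [1, 1024, 8], [8, 8192, 400]
configs = qmc.scale(unit, lo, hi)            # rescale to parameter ranges

# Append the dataset input size as an extra feature so the model is
# "datasize-aware", as the abstract describes.
datasize = rng.choice([1, 5, 10], size=50).reshape(-1, 1)  # GB, illustrative
X = np.hstack([configs, datasize])

# Stand-in measurements; in practice these would come from running the
# Spark application under each sampled configuration.
runtime = X @ np.array([2.0, 0.001, 0.05, 3.0]) + rng.normal(0, 1, 50)
energy = X @ np.array([5.0, 0.002, 0.01, 8.0]) + rng.normal(0, 2, 50)

# 2. One Stochastic Gradient Boosted Regression Tree per objective
#    (subsample < 1.0 is what makes the boosting "stochastic").
perf_model = GradientBoostingRegressor(subsample=0.8, random_state=0).fit(X, runtime)
energy_model = GradientBoostingRegressor(subsample=0.8, random_state=0).fit(X, energy)

# 3. Importance-biased mutation (sketch): perturb a configuration
#    parameter chosen in proportion to its learned importance, instead
#    of mutating all parameters uniformly as vanilla NSGA-II does.
importance = perf_model.feature_importances_[:3]   # config params only
p_mut = importance / importance.sum()

def mutate(config, scale=0.1):
    child = config.copy()
    i = rng.choice(3, p=p_mut)                     # favor influential params
    child[i] = np.clip(child[i] + rng.normal(0, scale * (hi[i] - lo[i])),
                       lo[i], hi[i])
    return child

print(perf_model.predict(X[:1]), energy_model.predict(X[:1]))
```

A full search step would feed `mutate` into an NSGA-II loop (e.g. via pymoo) and read the non-dominated front of predicted (runtime, energy) pairs off the two surrogate models.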



    Published In

    Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
    August 2023
    332 pages
    ISBN:9798400708947
    DOI:10.1145/3609437

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Bi-Objective Optimization
    2. Big Data Processing
    3. Configuration Parameters
    4. Energy Efficiency

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Science Foundation of China
    • University Collaborative Innovation Project of Anhui Province

    Conference

    Internetware 2023

    Acceptance Rates

    Overall Acceptance Rate 55 of 111 submissions, 50%
