
Mini-batching with Fused Training and Testing for Data Streams Processing on the Edge

Research article · Published: 02 July 2024 · DOI: 10.1145/3649153.3649188

Abstract

Edge Computing (EC) has emerged as a solution to reduce the energy demand and greenhouse gas emissions of digital technologies. By bridging the gap between cloud computing services and end users, EC provides the low latency, mobility, and location awareness that delay-sensitive applications require. Machine learning (ML) methods have been applied in EC for data classification and information processing, and ensemble learners have often proven to yield high predictive performance on data stream classification problems. Mini-batching is a technique proposed to improve cache reuse when bagging ensembles classify online data streams on multi-core architectures, which speeds up the application and reduces energy consumption. However, the original mini-batching strategy yields limited cache reuse and hinders the accuracy of the ensembles, i.e., their capacity to detect behavior changes in data streams. In this paper, we improve mini-batching by fusing the continuous training and testing loops used to classify data streams. We evaluate the new strategy by comparing its performance and energy efficiency with the original mini-batching on data stream classification with six ensemble algorithms and four benchmark datasets. We also compare both mini-batching strategies with two hardware-based strategies supported by the commodity multi-core processors commonly used in EC. Results show that mini-batching strategies significantly reduce energy consumption in 95% of the experiments. The original mini-batching improved energy efficiency by 96% on average and 169% in the best case; our new strategy improved it by 136% on average and 456% in the best case. These strategies also support better control of the balance between performance, energy efficiency, and accuracy.
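To make the fused strategy concrete, the sketch below illustrates the idea in Python. It is a minimal illustration, not the authors' implementation (the original mini-batching work builds on ensembles from the Java-based MOA framework): the learner interface with predict(x) and partial_fit(x, y, weight) is a hypothetical assumption for this sketch, and the Poisson(1) resampling follows standard online bagging (Oza and Russell). The point of the fusion is that each ensemble member processes the entire mini-batch in a single pass, testing and then training on each instance while that member's model state is still hot in cache, rather than running separate test and train loops over the batch.

```python
import math
import random


def poisson(lam=1.0):
    # Knuth's algorithm for drawing a Poisson(lam) sample; online bagging
    # weights each training instance with a Poisson(1) draw.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1


class FusedMiniBatchBagging:
    # Sketch of mini-batching with fused testing and training loops.
    # `learners` are hypothetical incremental classifiers exposing
    # predict(x) and partial_fit(x, y, weight); this interface is an
    # assumption for illustration, not an API from the paper.
    def __init__(self, learners):
        self.learners = learners

    def process_batch(self, batch):
        # batch: list of (x, y) pairs buffered from the stream.
        votes = [{} for _ in batch]
        for learner in self.learners:
            # Fused loop: while this learner's state is cache-hot, it both
            # votes on and trains on every instance of the mini-batch,
            # instead of separate test and train passes over the batch.
            for i, (x, y) in enumerate(batch):
                pred = learner.predict(x)  # test first (prequential order)
                votes[i][pred] = votes[i].get(pred, 0) + 1
                w = poisson(1.0)           # online-bagging resampling weight
                if w > 0:
                    learner.partial_fit(x, y, weight=w)
        # Majority vote across ensemble members for each instance.
        return [max(v, key=v.get) for v in votes]
```

Under this reading, each learner's state is brought into cache once per mini-batch instead of once for testing and once for training, which is a plausible source of the additional cache reuse, and hence the energy and performance gains, reported above.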



    Published In

    CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
    May 2024
    345 pages
ISBN: 9798400705977
DOI: 10.1145/3649153

Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Computing methodologies
    2. Parallel algorithms
    3. Parallel computing methodologies
    4. Shared memory algorithms


    Funding Sources

    • FAPESP


    Acceptance Rates

CF '24 Paper Acceptance Rate: 33 of 105 submissions (31%)
    Overall Acceptance Rate: 273 of 785 submissions (35%)
