research-article

An efficient hardware supported and parallelization architecture for intelligent systems to overcome speculative overheads

Published: 29 December 2022

Abstract

Over the last few decades, technological advances have enabled intelligent and autonomous systems that rely on complex computations, which are both time-consuming and CPU-intensive. As a consequence, parallel processing systems are gaining popularity as a way to improve overall computer performance. Ideally, programmers would be able to use the available hardware resources efficiently through parallelization. Automatic parallelization of sequential code would allow multithreaded execution without extra supervision; however, a wide range of software dependencies prevents this from being feasible. This work proposes an architectural framework for speculative parallelization, together with efficient memory analysis and computational algorithms for code generation, that can provide optimal performance. Furthermore, hardware support for the proposed framework is presented in the form of a runtime library, which can be used to recover misspeculated results during execution and so minimize the overhead of speculative parallelism. The implementation uses the Low-Level Virtual Machine (LLVM) compiler infrastructure and is tested on numerous benchmarks, making it highly scalable in terms of programming languages and architectures. Our experimental results show significant potential for speedup. Without hardware support, the proposed architecture achieves an overall function geomean speedup of approximately 5.2×; with hardware support, the proposed architectural framework and algorithm achieve a geomean speedup of approximately 7.0× on the given benchmark, which is written in C/C++.
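
To make the idea of speculation and misspeculation recovery concrete, the following is a minimal software-only sketch in Python. It is not the architecture, algorithm, or runtime library proposed in the paper, which operates at the LLVM level with hardware support. It assumes a toy loop of the form `data[writes[i]] = data[reads[i]] + 1`, runs each batch of iterations optimistically against a snapshot, and re-executes any iteration whose input was overwritten by an earlier iteration in the same batch. All names (`sequential`, `speculative`, `batch`, `workers`) are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(data, reads, writes):
    """Reference loop: data[writes[i]] = data[reads[i]] + 1, in program order."""
    out = list(data)
    for r, w in zip(reads, writes):
        out[w] = out[r] + 1
    return out


def speculative(data, reads, writes, batch=4, workers=4):
    """Run the same loop batch by batch, assuming iterations in a batch are independent.

    Returns the final array and the number of misspeculated iterations.
    """
    out = list(data)
    misspeculations = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(reads), batch):
            idx = range(start, min(start + batch, len(reads)))
            # Every iteration of the batch speculates against the same snapshot.
            snapshot = tuple(out)
            results = list(pool.map(lambda i: snapshot[reads[i]] + 1, idx))

            # Commit in original iteration order, validating each read.
            written = set()
            for i, value in zip(idx, results):
                if reads[i] in written:
                    # An earlier iteration in this batch wrote the location we
                    # read, so the speculative value is stale: re-execute it.
                    misspeculations += 1
                    value = out[reads[i]] + 1
                out[writes[i]] = value
                written.add(writes[i])
    return out, misspeculations
```

Committing in iteration order keeps the result identical to the sequential loop, while the returned misspeculation count gives a rough feel for the overhead that recovery adds. A loop whose batches carry no read-after-write dependence commits every speculative value without re-execution.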

Published In

International Journal of Intelligent Systems, Volume 37, Issue 12
December 2022
2488 pages
ISSN: 0884-8173
DOI: 10.1002/int.v37.12

Publisher

John Wiley and Sons Ltd.

United Kingdom

Author Tags

  1. automatic parallelization
  2. batchwise transaction
  3. hardware transactional memory
  4. intelligent system
  5. LLVM compiler infrastructure
  6. speculative parallelization

Qualifiers

  • Research-article
