
Fine-grained Benchmark Subsetting for System Selection

Published: 15 February 2014
DOI: 10.1145/2581122.2544144

Abstract

System selection aims at finding the best architecture for a set of programs and workloads. It traditionally requires running long benchmarks. We propose a method to reduce the cost of system selection. We break benchmarks down into elementary fragments of source code, called codelets. We then identify two causes of redundancy: first, similar codelets; second, codelets called repeatedly. The key idea is to minimize redundancy inside the benchmark suite to speed it up: for each group of similar codelets, only one representative is kept, and for codelets that are called repeatedly and whose performance does not vary across calls, the number of invocations is reduced. Given an initial benchmark suite, our method produces a set of reduced benchmarks that can be used in place of the original one for system selection.
We evaluate our method on the NAS SER benchmarks, producing a reduced benchmark suite that is on average 30 times faster than the original suite, and up to 44 times faster. The reduced suite predicts the execution time on three target architectures with a median error between 3.9% and 8%.
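The abstract outlines the subsetting pipeline but not its implementation. The following is a minimal sketch of the idea, assuming codelets are summarized as numeric feature vectors, grouped by Ward-linkage hierarchical clustering, and that "stable" means a low coefficient of variation across invocation times. The Codelet class, the cluster count, the stability threshold, and predict_suite_time are illustrative assumptions, not the paper's actual tooling.

```python
# Illustrative sketch of codelet-based benchmark subsetting.
# All names and thresholds here are hypothetical stand-ins for the
# paper's procedure, chosen only to make the idea concrete.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


class Codelet:
    def __init__(self, name, features, invocation_times):
        self.name = name
        self.features = np.asarray(features, dtype=float)
        self.times = np.asarray(invocation_times, dtype=float)

    def total_time(self):
        return self.times.sum()

    def is_stable(self, cv_threshold=0.05):
        # Performance is "stable" if invocation times barely vary.
        return self.times.std() / self.times.mean() < cv_threshold


def reduce_suite(codelets, n_clusters=4):
    """Group similar codelets and keep one representative per group."""
    X = np.stack([c.features for c in codelets])
    labels = fcluster(linkage(X, method="ward"),
                      n_clusters, criterion="maxclust")
    reduced = []
    for k in set(labels):
        group = [c for c, l in zip(codelets, labels) if l == k]
        centroid = np.mean([c.features for c in group], axis=0)
        # Representative: the codelet closest to the cluster centroid.
        rep = min(group, key=lambda c: np.linalg.norm(c.features - centroid))
        # Weight: total runtime the group stands for, relative to the
        # representative alone, so weight * rep_time ~ group time.
        weight = sum(c.total_time() for c in group) / rep.total_time()
        reduced.append((rep, weight))
    return reduced


def predict_suite_time(reduced, n_sample_invocations=3):
    """Estimate full-suite time from the reduced set.

    Stable codelets are timed for only a few invocations and
    extrapolated; unstable ones keep all their invocations.
    """
    total = 0.0
    for rep, weight in reduced:
        if rep.is_stable():
            sample = rep.times[:n_sample_invocations]
            rep_time = sample.mean() * len(rep.times)
        else:
            rep_time = rep.total_time()
        total += weight * rep_time
    return total
```

Under these assumptions, the predicted full-suite time is the sum over clusters of the representative's (possibly extrapolated) time scaled by the cluster's share of the original runtime; the two reductions from the abstract correspond to keeping one representative per cluster and to sampling invocations of stable codelets.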



Published In

CGO '14: Proceedings of the Annual IEEE/ACM International Symposium on Code Generation and Optimization
February 2014, 328 pages
ISBN: 9781450326704
DOI: 10.1145/2581122

Publisher

Association for Computing Machinery, New York, NY, United States



Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CGO '14

Acceptance Rates

CGO '14 paper acceptance rate: 29 of 100 submissions (29%)
Overall acceptance rate: 312 of 1,061 submissions (29%)
