
Fine-grained Benchmark Subsetting for System Selection

Published: 15 February 2014
DOI: 10.1145/2581122.2544144

Abstract

System selection aims at finding the best architecture for a set of programs and workloads. It traditionally requires running long benchmarks. We propose a method to reduce the cost of system selection. We break benchmarks down into elementary fragments of source code, called codelets. We then identify two causes of redundancy: first, similar codelets; second, codelets called repeatedly. The key idea is to minimize redundancy inside the benchmark suite to speed it up: for each group of similar codelets, only one representative is kept, and for codelets that are called repeatedly and whose performance does not vary across calls, the number of invocations is reduced. Given an initial benchmark suite, our method produces a set of reduced benchmarks that can be used in place of the original one for system selection.
We evaluate our method on the NAS SER benchmarks, producing a reduced benchmark suite that is on average 30 times faster than the original suite, and up to 44 times faster. The reduced suite predicts the execution time on three target architectures with a median error between 3.9% and 8%.
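The abstract outlines the subsetting pipeline but not its implementation. The following is a minimal sketch of the idea, assuming codelets are summarized as numeric feature vectors, grouped by Ward-linkage hierarchical clustering, and that "stable" means a low coefficient of variation across invocation times. The Codelet class, the cluster count, the stability threshold, and predict_suite_time are illustrative assumptions, not the paper's actual tooling.

```python
# Illustrative sketch of codelet-based benchmark subsetting.
# All names and thresholds here are hypothetical stand-ins for the
# paper's procedure, chosen only to make the idea concrete.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


class Codelet:
    def __init__(self, name, features, invocation_times):
        self.name = name
        self.features = np.asarray(features, dtype=float)
        self.times = np.asarray(invocation_times, dtype=float)

    def total_time(self):
        return self.times.sum()

    def is_stable(self, cv_threshold=0.05):
        # Performance is "stable" if invocation times barely vary.
        return self.times.std() / self.times.mean() < cv_threshold


def reduce_suite(codelets, n_clusters=4):
    """Group similar codelets and keep one representative per group."""
    X = np.stack([c.features for c in codelets])
    labels = fcluster(linkage(X, method="ward"),
                      n_clusters, criterion="maxclust")
    reduced = []
    for k in set(labels):
        group = [c for c, l in zip(codelets, labels) if l == k]
        centroid = np.mean([c.features for c in group], axis=0)
        # Representative: the codelet closest to the cluster centroid.
        rep = min(group, key=lambda c: np.linalg.norm(c.features - centroid))
        # Weight: total runtime the group stands for, relative to the
        # representative alone, so weight * rep_time ~ group time.
        weight = sum(c.total_time() for c in group) / rep.total_time()
        reduced.append((rep, weight))
    return reduced


def predict_suite_time(reduced, n_sample_invocations=3):
    """Estimate full-suite time from the reduced set.

    Stable codelets are timed for only a few invocations and
    extrapolated; unstable ones keep all their invocations.
    """
    total = 0.0
    for rep, weight in reduced:
        if rep.is_stable():
            sample = rep.times[:n_sample_invocations]
            rep_time = sample.mean() * len(rep.times)
        else:
            rep_time = rep.total_time()
        total += weight * rep_time
    return total
```

Under these assumptions, the predicted full-suite time is the sum over clusters of the representative's (possibly extrapolated) time scaled by the cluster's share of the original runtime; the two reductions from the abstract correspond to keeping one representative per cluster and to sampling invocations of stable codelets.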



Published In

CGO '14: Proceedings of the Annual IEEE/ACM International Symposium on Code Generation and Optimization
February 2014, 328 pages
ISBN: 9781450326704
DOI: 10.1145/2581122

Publisher

Association for Computing Machinery, New York, NY, United States



Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CGO '14

Acceptance Rates

CGO '14 paper acceptance rate: 29 of 100 submissions (29%)
Overall acceptance rate: 312 of 1,061 submissions (29%)
