skip to main content
research-article
Open access

Clustering-Based Selection for the Exploration of Compiler Optimization Sequences

Published: 28 March 2016 Publication History

Abstract

A large number of compiler optimizations are nowadays available to users. These optimizations interact with each other and with the input code in several and complex ways. The sequence of application of optimization passes can have a significant impact on the performance achieved. The effect of the optimizations is both platform and application dependent. The exhaustive exploration of all viable sequences of compiler optimizations for a given code fragment is not feasible. As this exploration is a complex and time-consuming task, several researchers have focused on Design Space Exploration (DSE) strategies both to select optimization sequences to improve the performance of each function of the application and to reduce the exploration time. In this article, we present a DSE scheme based on a clustering approach for grouping functions with similarities and exploration of a reduced search space resulting from the combination of optimizations previously suggested for the functions in each group. The identification of similarities between functions uses a data mining method that is applied to a symbolic code representation. The data mining process combines three algorithms to generate clusters: the Normalized Compression Distance, the Neighbor Joining, and a new ambiguity-based clustering algorithm. Our experiments for evaluating the effectiveness of the proposed approach address the exploration of optimization sequences in the context of the ReflectC compiler, considering 49 compilation passes while targeting a Xilinx MicroBlaze processor, and aiming at performance improvements for 51 functions and four applications. Experimental results reveal that the use of our clustering-based DSE approach achieves a significant reduction in the total exploration time of the search space (20× over a Genetic Algorithm approach) at the same time that considerable performance speedups (41% over the baseline) were obtained using the optimized codes. Additional experiments were performed considering the LLVM compiler, considering 124 compilation passes, and targeting a LEON3 processor. The results show that our approach achieved geometric mean speedups of 1.49 ×, 1.32 ×, and 1.24 × for the best 10, 20, and 30 functions, respectively, and a global improvement of 7% over the performance obtained when compiling with -O2.

References

[1]
ACE. 2012. CoSy Compiler Development System. Retrieved from http://www.ace.nl/compiler/cosy.html.
[2]
Aeroflex. 2003. TSIM2 ERC32/LEON Simulator. Retrieved from http://www.gaisler.com/index.php/products/simulators/tsim.
[3]
Aeroflex. 2005. LEON3 Processor. Retrieved from http://www.gaisler.com/index.php/products/processors/leon3.
[4]
F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. F. P. O'Boyle, J. Thomson, M. Toussaint, and C. K. I. Williams. 2006. Using machine learning to focus iterative optimization. In Proc. of the Int. Symp. on Code Generation and Optimization (CGO’06). 295--305.
[5]
L. Almagor, K. D. Cooper, A. Grosul, T. J. Harvey, S. W. Reeves, D. Subramanian, L. Torczon, and T. Waterman. 2004. Finding effective compilation sequences. In Proc. of the ACM Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES’04), Vol. 39. 231--239.
[6]
A. H. Ashouri, G. Mariani, G. Palermo, and C. Silvano. 2014. A Bayesian network approach for compiler auto-tuning for embedded processors. In IEEE 12th Symp. on Embedded Systems for Real-time Multimedia (ESTIMedia’14). 90--97.
[7]
J. M. P. Cardoso, P. C. Diniz, J. G. F. Coutinho, and Z. M. Petrov (Eds.). 2013. Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach. Springer.
[8]
J. M. P. Cardoso, T. Carvalho, J. G. F. Coutinho, W. Luk, R. Nobre, P. Diniz, and Z. M. Petrov. 2012. LARA: An aspect-oriented programming language for embedded systems. In Proc. of the 11th ACM Int. Conf. on Aspect-Oriented Software Development (AOSD’12). 179--190.
[9]
J. Cavazos and M. F. P. O’Boyle. 2006. Method-specific dynamic compilation using logistic regression. In Proc. of the ACM Int. Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’06). 229--240.
[10]
Y. Chen, S. Fang, Y. Huang, L. Eeckhout, G. Fursin, O. Temam, and C. Wu. 2012. Deconstructing iterative optimization. ACM Trans. Archit. Code Optim. 9, 3 (2012), 21:1--21:30.
[11]
A. R. Cilibrasi and A. P. Vitanyi. 2005. Clustering by compression. In Proc. of the IEEE Trans. Information Theory (AOSD’12), Vol. 51. 1523--1545.
[12]
A. R. Cilibrasi, A. L. Cruz, S. de Rooij, and M. Keijzer. 2008. CompLearn Toolkit. Retrieved from http://www.complearn.org/.
[13]
K. D. Cooper, P. J. Schielke, and D. Subramanian. 1999. Optimizing for reduced code space using genetic algorithms. In Proc. of the ACM Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES’99). 1--9.
[14]
K. D. Cooper, A. Grosul, T. J. Harvey, S. Reeves, D. Subramanian, L. Torczon, and T. Waterman. 2006. Exploring the structure of the space of compilation sequences using randomized search algorithms. J. Supercomput. 36, 2 (2006), 135--151.
[15]
J. Felsenstein. 2003. Inferring Phylogenies. Sinauer Associates.
[16]
G. Fursin, Y. Kashnikov, A. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, and M. O’Boyle. 2011. Milepost GCC: Machine learning enabled self-tuning compiler. Int. J. Parallel Program. 39, 3 (2011), 296--327.
[17]
D. E. Goldberg. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman.
[18]
M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. 2001. MiBench: A free, commercially representative embedded benchmark suite. In Proc. of the IEEE Int. Workshop of the Workload Characterization (WWC’01). 3--14.
[19]
M. Haneda, P. M. W. Knijnenburg, and H. A. G. Wijshoff. 2005. Optimizing general purpose compiler optimization. In Proc. of the 2nd Conf. on Computing Frontiers (CF’05). 180--188.
[20]
K. Hoste and L. Eeckhout. 2008. Cole: Compiler optimization level exploration. In Proc. of the 6th Int. Symp. on Code Generation and Optimization (CGO’08). 165--174.
[21]
Q. Huang, R. Lian, A. Canis, J. Choi, R. Xi, S. Brown, and J. Anderson. 2013. The effect of compiler optimizations on high-level synthesis for FPGAs. In Proc. of the IEEE Int. Symp. on Field-Programmable Custom Computing Machines (FCCM’13). 89--96.
[22]
M. R. Jantz and P. A. Kulkarni. 2013. Exploiting phase inter-dependencies for faster iterative compiler optimization phase order searches. In Proc. of the Conf. on Compilers, Archit. and Synthesis for Embedded Systems (CASES’13). 7:1--7:10.
[23]
P. A. Kulkarni, M. R. Jantz, and D. B. Whalley. 2010. Improving both the performance benefits and speed of optimization phase sequence searches. In Proc. of the ACM Conf. on Lang., Comp., and Tools for Emb. Syst. (LCTES’10). 95--104.
[24]
P. A. Kulkarni, D. B. Whalley, and G. S. Tyson. 2007. Evaluating heuristic optimization phase order search algorithms. In Proc. of the IEEE Int. Symp. on Code Generation and Optimization (CGO’07). 157--169.
[25]
S. Kulkarni and J. Cavazos. 2012. Mitigating the compiler optimization phase-ordering problem using machine learning. In Proc. of the ACM Conf. on Object Oriented Programming Systems Lang. and Applications (OOPSLA’12). 147--162.
[26]
P. A. Kulkarni, D. B. Whalley, G. S. Tyson, and J. W. Davidson. 2009. Practical exhaustive optimization phase order exploration and evaluation. ACM Trans. Architect. Code Optim. 6, 1 (April 2009), 1:1--1:36.
[27]
C. Lattner and V. Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In Proc. of the Int. Symp. on Code Generation and Optimization (CGO’’04). 75.
[28]
C. G. Lee. 2002. UTDSP benchmark suite. (2002). http://www.eecg.toronto.edu/∼corinna/DSP/infrastructure/UTDSP.tar.gz.
[29]
M. Li and P. M. B. Vitanyi. 1997. An Introduction to Kolmogorov Complexity and Its Applications (2nd ed.). Springer-Verlag.
[30]
L. G. A. Martins, R. Nobre, A. C. B. Delbem, E. Marques, and J. M. P. Cardoso. 2014a. A clustering-based approach for exploring sequences of compiler optimizations. In Proc. of the IEEE Congress on Evolutionary Computation (CEC’14). 1--8.
[31]
L. G. A. Martins, R. Nobre, A. C. B. Delbem, E. Marques, and J. M. P. Cardoso. 2014b. Exploration of compiler optimization sequences using clustering-based selection. In Proc. of the ACM Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES’14). 63--72.
[32]
M. Newman. 2010. Networks: An Introduction. Oxford University Press.
[33]
Z. Pan and R. Eigenmann. 2008. PEAK: A fast and effective performance tuning system via compiler optimization orchestration. ACM Trans. Program. Lang. Syst. 30, 3 (2008), 1--17.
[34]
S. Purini and L. Jain. 2013. Finding good optimization sequences covering program space. ACM Trans. Architect. Code Optim. 9, 4 (January 2013), 56:1--56:23.
[35]
C. K. Roy, J. R. Cordy, and R. Koschke. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Sci. Comput. Program. 74, 7 (May 2009), 470--495.
[36]
A. Sanches and J. M. P. Cardoso. 2010. On identifying patterns in code repositories to assist the generation of hardware templates. In Proc. of the 20th Int. Conf. on Field Programmable Logic and Applications (FPL’10). 267--270.
[37]
A. Sanches, J. M. P. Cardoso, and A. C. B. Delbem. 2011. Identifying merge-beneficial software kernels for hardware implementation. In Proc. of the Int. conf. on Reconfigurable Computing and FPGAs (ReConFig’11). 74--79.
[38]
R. Sanchez, J. Amaral, D. Szafron, M. Pirvu, and M. Stoodley. 2011. Using machines to learn method-specific compilation strategies. In Proc. of the Int. Symp. on Code Generation and Optimization (CGO’11). 257--266.
[39]
Texas Instruments. 2003a. TMS320C64x DSP Library: Programmer’s Reference. (2003).
[40]
Texas Instruments. 2003b. TMS320C64x Image/Video Processing Library. (2003).
[41]
M. P. J. van der Loo. 2014. The stringdist package for approximate string matching. The R Journal 6 (2014), 111--122.
[42]
T. Wheeler and J. Kececioglu. 2007. Multiple alignment by aligning alignments. In Proc. of the 15th ISCB Conf. on Intelligent Systems for Molecular Biology, Bioinformatics, Vol. 23. i559--i568.
[43]
E. Zitzler, M. Laumanns, and L. Thiele. 2001. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. Computer Engineering and Networks Lab Technical Report TR-200. Swiss Federal Institute of Technology, Cambridge, MA.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 1
April 2016
347 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2899032
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 March 2016
Accepted: 01 January 2016
Revised: 01 December 2015
Received: 01 May 2015
Published in TACO Volume 13, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Clustering
  2. design space exploration
  3. phase-ordering problem

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • FCT (Portuguese Science Foundation)
  • FCT project
  • CAPES (process: 0352/13-6)
  • ACE Associated Compiler Experts bv, The Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)212
  • Downloads (Last 6 weeks)28
Reflects downloads up to 26 Dec 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media