Article

JuliusC: a practical approach for the analysis of divide-and-conquer algorithms

Published: 22 September 2004

Abstract

The development of divide and conquer (D&C) algorithms for matrix computations has led to the widespread use of high-performance scientific applications and libraries. In turn, D&C algorithms can be implemented using loop nests or recursion. Recursion is extremely appealing because it is an intuitive means for the deployment of top-down techniques, which exploit data locality and parallelism naturally. However, recursion has been considered impractical for high-performance codes, mostly because of the inherent overhead of the division process into small subproblems.
In this work, we develop techniques to model the behavior of recursive algorithms in a way suitable for use by a compiler in estimating and reducing the overheads of the division process. We describe these techniques and JuliusC, a (lite) C compiler that we developed to exploit them. JuliusC partially unfolds the application call graph and extracts the relations among function calls. As a final result, it produces a directed acyclic graph (DAG) that models the function calls concisely. The approach combines compile-time and run-time analysis, and both have negligible complexity.
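The idea of modeling many recursive calls with a compact DAG can be illustrated with a minimal sketch. This is not JuliusC's algorithm: it assumes, purely for illustration, a two-way D&C recursion whose calls are fully characterized by an integer problem size, so that calls with the same size can share one node. The function names `unfold_call_dag` and `count_recursive_calls` and the `leaf_size` parameter are hypothetical.

```python
def unfold_call_dag(n, leaf_size=1):
    """Map each distinct problem size to the sizes of the subproblems it calls.

    Assumption (not from the paper): a call is identified by its size alone.
    """
    dag = {}

    def visit(size):
        if size in dag:          # this size is already modeled: reuse the node
            return
        if size <= leaf_size:    # leaf computation, no further division
            dag[size] = []
            return
        lo, hi = size // 2, size - size // 2
        dag[size] = sorted({lo, hi})
        for child in dag[size]:
            visit(child)

    visit(n)
    return dag


def count_recursive_calls(n, leaf_size=1):
    """Count every call the plain recursion would make, without sharing."""
    if n <= leaf_size:
        return 1
    lo, hi = n // 2, n - n // 2
    return (1
            + count_recursive_calls(lo, leaf_size)
            + count_recursive_calls(hi, leaf_size))


if __name__ == "__main__":
    dag = unfold_call_dag(1000)
    print(len(dag), "DAG nodes versus",
          count_recursive_calls(1000), "recursive calls")
```

Under these assumptions, the number of DAG nodes grows roughly with the recursion depth while the number of calls grows with the problem size, which is the kind of gap a compiler could exploit when estimating division overhead.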
We illustrate the applicability of our approach by studying six test cases. We present the analysis results and show how our (optimizing) compiler can use them to increase the efficiency of the division process by a factor ranging from 14 to 20 million for our codes.

    Published In

    LCPC'04: Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
    September 2004
    484 pages
    ISBN: 354028009X
    Editors: Rudolf Eigenmann, Zhiyuan Li, Samuel P. Midkiff

    Sponsors

    • International Business Machines Corporation
    • National Science Foundation, USA

    Publisher

    Springer-Verlag

    Berlin, Heidelberg
