research-article
Open access

Using machine learning to partition streaming programs

Published: 16 September 2013

Abstract

Stream-based parallel languages are a popular way to express parallelism in modern applications. The efficient mapping of streaming parallelism to today's multicore systems is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned offline. Using the predictor we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90× speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77× performance improvement. By porting our approach to an 8-core platform, we are able to obtain 1.8× improvement over the StreamIt default scheme, demonstrating the portability of our approach.
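The abstract describes two steps: predict the ideal partition structure from knowledge learned offline, then search candidate partitions and pick one without executing any code. The minimal Python sketch below illustrates that idea. It assumes a k-nearest-neighbour predictor and Euclidean distance over hypothetical numeric feature vectors. The feature values, the candidate names, `TrainingExample`, `predict_ideal_partition`, `select_partition`, and the choice of k are all illustrative assumptions, not the paper's actual features or model.

```python
import math
from dataclasses import dataclass

# Hypothetical fixed-length feature vector (assumption: the abstract does not list the real features).
Vector = tuple[float, ...]


@dataclass
class TrainingExample:
    program_features: Vector         # static features of a training program (hypothetical)
    best_partition_features: Vector  # features of its best partition, found offline (hypothetical)


def predict_ideal_partition(training: list[TrainingExample],
                            program_features: Vector, k: int = 1) -> Vector:
    """Assumed k-nearest-neighbour predictor: average the best-partition
    features of the k training programs closest to the new program."""
    nearest = sorted(training,
                     key=lambda ex: math.dist(ex.program_features, program_features))[:k]
    dims = len(nearest[0].best_partition_features)
    return tuple(sum(ex.best_partition_features[i] for ex in nearest) / len(nearest)
                 for i in range(dims))


def select_partition(candidates: dict[str, Vector], predicted: Vector) -> str:
    """Pick the candidate partition whose features lie closest to the
    prediction. Only features are compared; no candidate is compiled or run."""
    return min(candidates, key=lambda name: math.dist(candidates[name], predicted))


if __name__ == "__main__":
    training = [
        TrainingExample((10.0, 0.2), (4.0, 1.0)),
        TrainingExample((50.0, 0.8), (8.0, 3.0)),
    ]
    predicted = predict_ideal_partition(training, (12.0, 0.3), k=1)
    candidates = {"fuse_all": (1.0, 0.0), "balanced": (4.0, 1.2), "fission_heavy": (9.0, 3.5)}
    print(predicted, select_partition(candidates, predicted))
```

With k=1 the new program's nearest training program is the first one, so the predicted structure is (4.0, 1.0) and the "balanced" candidate is selected. A real predictor would also need to normalise features so that no single dimension dominates the distance.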


Published In

ACM Transactions on Architecture and Code Optimization, Volume 10, Issue 3
September 2013
310 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2509420
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 September 2013
Accepted: 01 January 2013
Revised: 01 September 2012
Received: 01 September 2011
Published in TACO Volume 10, Issue 3


Author Tags

  1. Compiler optimization
  2. machine learning
  3. multicore
  4. partitioning streaming parallelism

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 96
  • Downloads (last 6 weeks): 11
Reflects downloads up to 28 Jan 2025
