DOI:10.1145/2380403.2380440
research-article

Integrating software caches with scratch pad memory

Published: 07 October 2012

Abstract

Software cache refers to cache functionality emulated in software on a compiler-controlled Scratch Pad Memory (SPM). Such structures are useful when standard SPM allocation strategies cannot be used because memory reference patterns in the source code are hard to analyze. SPM data allocation strategies generally rely on compile-time inference of spatial and temporal reuse; the typical flow copies a block (tile) of array data into the SPM, processes it there, and finally copies it back. However, when array index functions are complicated by conditionals, complex expressions, or dependence on run-time data, the SPM compiler has to fall back on expensive DMA transfers of individual words, leading to poor performance. Software caches (SWC) can substantially improve performance in such cases: their access times are longer than those of direct SPM access, but, like hardware caches, they exploit spatial and temporal locality discovered at run time. We present the first automated compiler data allocation strategy that considers the presence of a software cache in SPM space and decides which arrays should be accessed through it, and when. An array may be accessed differently in different parts of a program; our algorithm analyzes such uses and considers accessing an array through the SWC selectively, only where it is efficient, based on a cost model of the overheads involved in SPM/SWC transitions. We implemented our technique in an LLVM-based framework and evaluated it with several applications on a Cell-based machine. Our technique achieves up to 82% overall performance improvement over a conventional SPM mapping algorithm and up to 27% over a typical SWC-enhanced implementation.
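To make "cache functionality emulated in software on the SPM" concrete, the Python model below is a minimal sketch under stated assumptions; it is not the paper's implementation and not the Cell SDK's software-cache API. It assumes a direct-mapped, write-back organization in which main memory is a Python list and each "DMA" is a line-sized slice copy into an SPM-resident line buffer. The names `SoftwareCache`, `read`, `write`, `flush`, and the cache geometry are hypothetical. Every access pays for a software tag check (which is why SWC accesses are slower than direct SPM accesses), but repeated accesses to a cached line need no new transfer.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    hits: int = 0
    misses: int = 0
    dma_transfers: int = 0  # line-sized transfers between main memory and SPM


class SoftwareCache:
    """Hypothetical direct-mapped, write-back software cache held in a simulated SPM.

    Main memory is a Python list; each simulated 'DMA' copies one cache line.
    """

    def __init__(self, main_memory, num_lines=4, line_words=8):
        self.mem = main_memory
        self.num_lines = num_lines
        self.line_words = line_words
        # SPM-resident state: line buffers plus tag and dirty metadata.
        self.data = [[0] * line_words for _ in range(num_lines)]
        self.tags = [None] * num_lines
        self.dirty = [False] * num_lines
        self.stats = Stats()

    def _locate(self, addr):
        line_addr = addr // self.line_words
        index = line_addr % self.num_lines  # SPM slot this line maps to
        offset = addr % self.line_words     # word within the line
        return line_addr, index, offset

    def _dma_put(self, index):
        # Simulated DMA: write one SPM line back to main memory.
        base = self.tags[index] * self.line_words
        self.mem[base:base + self.line_words] = self.data[index]
        self.stats.dma_transfers += 1

    def _dma_get(self, line_addr, index):
        # Simulated DMA: copy one line from main memory into the SPM slot.
        base = line_addr * self.line_words
        self.data[index] = self.mem[base:base + self.line_words]
        self.stats.dma_transfers += 1

    def _ensure(self, addr):
        # Software tag check executed on every access.
        line_addr, index, offset = self._locate(addr)
        if self.tags[index] == line_addr:
            self.stats.hits += 1
        else:
            self.stats.misses += 1
            if self.tags[index] is not None and self.dirty[index]:
                self._dma_put(index)  # write back the evicted dirty line
            self._dma_get(line_addr, index)
            self.tags[index] = line_addr
            self.dirty[index] = False
        return index, offset

    def read(self, addr):
        index, offset = self._ensure(addr)
        return self.data[index][offset]

    def write(self, addr, value):
        index, offset = self._ensure(addr)
        self.data[index][offset] = value
        self.dirty[index] = True

    def flush(self):
        """Write back dirty lines and invalidate all lines (an SWC-to-SPM transition cost)."""
        for i in range(self.num_lines):
            if self.tags[i] is not None and self.dirty[i]:
                self._dma_put(i)
            self.tags[i] = None
            self.dirty[i] = False


if __name__ == "__main__":
    mem = list(range(64))
    swc = SoftwareCache(mem, num_lines=4, line_words=8)
    # Data-dependent indices that a compile-time tiling pass could not analyze.
    total = sum(swc.read(i) for i in [3, 5, 2, 40, 41, 7, 42])
    print(total, swc.stats)
```

With four 8-word lines, reading the indices 3, 5, 2, 40, 41, 7, 42 takes two line transfers (two misses, five hits), whereas fetching each word with its own DMA would take seven. The `flush` method stands in for the kind of SWC/SPM transition overhead the abstract's cost model accounts for; how the paper actually performs transitions is not described here.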

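The abstract describes deciding, for each part of a program, whether an array is accessed through the SWC or directly through SPM, weighed against the overhead of SPM/SWC transitions. One simple way to frame that trade-off, shown here as an illustrative assumption rather than the paper's algorithm or cost model, is a two-state dynamic program over a sequence of program regions. The function `choose_modes`, the per-region costs, and the switch costs are all hypothetical inputs that would have to come from some cost estimate.

```python
from typing import Dict, List, Tuple

MODES = ("SPM", "SWC")


def choose_modes(region_costs: List[Dict[str, float]],
                 switch_cost: Dict[Tuple[str, str], float]) -> Tuple[float, List[str]]:
    """Pick one access mode per region for a single array, minimizing total cost.

    Hypothetical cost model:
      region_costs[r][m]  -- estimated cost of region r with the array in mode m
      switch_cost[(a, b)] -- overhead of moving from mode a to mode b between
                             consecutive regions (e.g. flushing the SWC)
    """
    if not region_costs:
        return 0.0, []
    # best[m] = (cost, plan) of the cheapest plan so far that ends in mode m.
    best = {m: (region_costs[0][m], [m]) for m in MODES}
    for costs in region_costs[1:]:
        new_best = {}
        for m in MODES:
            candidates = []
            for prev in MODES:
                prev_cost, prev_plan = best[prev]
                trans = 0.0 if prev == m else switch_cost[(prev, m)]
                candidates.append((prev_cost + trans + costs[m], prev_plan + [m]))
            new_best[m] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


if __name__ == "__main__":
    # Hypothetical regions: affine loop, irregular loop, affine loop.
    regions = [{"SPM": 10, "SWC": 30}, {"SPM": 200, "SWC": 60}, {"SPM": 10, "SWC": 30}]
    print(choose_modes(regions, {("SPM", "SWC"): 5, ("SWC", "SPM"): 5}))
    print(choose_modes(regions, {("SPM", "SWC"): 50, ("SWC", "SPM"): 50}))
```

With these made-up numbers, cheap transitions (5) favor the selective plan SPM, SWC, SPM at a total of 90, while expensive transitions (50) make it cheaper to keep the array in the SWC throughout (120). This mirrors, in miniature, the abstract's point that routing an array through the SWC only where it pays off depends on transition overheads.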
Information

Published In

CASES '12: Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
October 2012
230 pages
ISBN:9781450314244
DOI:10.1145/2380403
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dma
  2. memory allocation
  3. scratch pad memory
  4. software cache

Conference

ESWEEK'12: Eighth Embedded System Week
October 7 - 12, 2012
Tampere, Finland

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 28 Jan 2025