Document Open Access Logo

When Are Cache-Oblivious Algorithms Cache Adaptive? A Case Study of Matrix Multiplication and Sorting

Authors Arghya Bhattacharya, Abiyaz Chowdhury, Helen Xu, Rathish Das, Rezaul A. Chowdhury, Rob Johnson, Rishab Nithyanand, Michael A. Bender



PDF
Thumbnail PDF

File

LIPIcs.ESA.2022.16.pdf
  • Filesize: 0.61 MB
  • 17 pages

Document Identifiers

Author Details

Arghya Bhattacharya
  • Stony Brook University, NY, USA
Abiyaz Chowdhury
  • Stony Brook University, NY, USA
Helen Xu
  • Lawrence Berkeley National Laboratory, CA, USA
Rathish Das
  • University of Waterloo, Canada
Rezaul A. Chowdhury
  • Stony Brook University, NY, USA
Rob Johnson
  • VMware Research, Palo Alto, CA, USA
Rishab Nithyanand
  • The University of Iowa, Iowa City, IA, USA
Michael A. Bender
  • Stony Brook University, NY, USA

Cite AsGet BibTex

Arghya Bhattacharya, Abiyaz Chowdhury, Helen Xu, Rathish Das, Rezaul A. Chowdhury, Rob Johnson, Rishab Nithyanand, and Michael A. Bender. When Are Cache-Oblivious Algorithms Cache Adaptive? A Case Study of Matrix Multiplication and Sorting. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ESA.2022.16

Abstract

Cache-adaptive algorithms are a class of algorithms that achieve optimal utilization of dynamically changing memory. These memory fluctuations are the norm in today’s multi-threaded shared-memory machines and time-sharing caches. Bender et al. [Bender et al., 2014] proved that many cache-oblivious algorithms are optimally cache-adaptive, but that some cache-oblivious algorithms can be relatively far from optimally cache-adaptive on worst-case memory fluctuations. This worst-case gap between cache obliviousness and cache adaptivity depends on a highly-structured, adversarial memory profile. Existing cache-adaptive analysis does not predict the relative performance of cache-oblivious and cache-adaptive algorithms on non-adversarial profiles. Does the worst-case gap appear in practice, or is it an artifact of an unrealistically powerful adversary? This paper sheds light on the question of whether cache-oblivious algorithms can effectively adapt to realistically fluctuating memory sizes; the paper focuses on matrix multiplication and sorting. The two matrix-multiplication algorithms in this paper are canonical examples of "(a, b, c)-regular" cache-oblivious algorithms, which underlie much of the existing theory on cache-adaptivity. Both algorithms have the same asymptotic I/O performance when the memory size remains fixed, but one is optimally cache-adaptive, and the other is not. In our experiments, we generate both adversarial and non-adversarial memory workloads. The performance gap between the algorithms for matrix multiplication grows with problem size (up to 3.8×) on the adversarial profiles, but the gap does not grow with problem size (stays at 2×) on non-adversarial profiles. The sorting algorithms in this paper are not "(a, b, c)-regular," but they have been well-studied in the classical external-memory model when the memory size does not fluctuate. The relative performance of a non-oblivious (cache-aware) sorting algorithm degrades with the problem size: it incurs up to 6 × the number of disk I/Os compared to an oblivious adaptive algorithm on both adversarial and non-adversarial profiles. To summarize, in all our experiments, the cache-oblivious matrix-multiplication and sorting algorithms that we tested empirically adapt well to memory fluctuations. We conjecture that cache-obliviousness will empirically help achieve adaptivity for other problems with similar structures.

Subject Classification

ACM Subject Classification
  • Theory of computation → Shared memory algorithms
Keywords
  • Cache-adaptive algorithms
  • cache-oblivious algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alok Aggarwal and Jeffrey S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116-1127, September 1988. Google Scholar
  2. Rakesh D. Barve. Algorithmic Techniques To Overcome The I/O Bottleneck. PhD thesis, Duke University, 1998. Google Scholar
  3. Rakesh D. Barve and Jeffrey Scott Vitter. A theoretical framework for memory-adaptive algorithms. In Proc. 40th Annual Symposium on the Foundations of Computer Science (FOCS), pages 273-284, New York City, NY, 1999. IEEE. Google Scholar
  4. Michael A. Bender, Gerth Stølting Brodal, Rolf Fagerberg, Dongdong Ge, Simai He, Haodong Hu, John Iacono, and Alejandro López-Ortiz. The cost of cache-oblivious searching. In Proceedings of the 44th Annual Symposium on Foundations of Computer Science (FOCS), pages 271-280, Cambridge, MA, 2003. IEEE. Google Scholar
  5. Michael A. Bender, Gerth Stølting Brodal, Rolf Fagerberg, Dongdong Ge, Simai He, Haodong Hu, John Iacono, and Alejandro López-Ortiz. The cost of cache-oblivious searching. Algorithmica, 61(2):463-505, 2011. Google Scholar
  6. Michael A. Bender, Rezaul A. Chowdhury, Rathish Das, Rob Johnson, William Kuszmaul, Andrea Lincoln, Quanquan C Liu, Jayson Lynch, and Helen Xu. Closing the gap between cache-oblivious and cache-adaptive analysis. In Proc. 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 63-73, virtual event, USA, 2020. ACM. Google Scholar
  7. Michael A. Bender, Erik D. Demaine, Roozbeh Ebrahimi, Jeremy T. Fineman, Rob Johnson, Andrea Lincoln, Jayson Lynch, and Samuel McCauley. Cache-adaptive analysis. In Proc. 28rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 135-144, Asilomar State Beach, CA, 2016. ACM. Google Scholar
  8. Michael A. Bender, Roozbeh Ebrahimi, Jeremy T. Fineman, Golnaz Ghasemiesfeh, Rob Johnson, and Samuel McCauley. Cache-adaptive algorithms. In Proc. 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 958-971, Portland, Oregon, 2014. ACM-SIAM. Google Scholar
  9. Andreas Björklund, Rasmus Pagh, Virginia Vassilevska Williams, and Uri Zwick. Listing triangles. In Proc. of 41st International Colloquium on Automata, Languages, and Programming, pages 223-234, Copenhagen, Denmark, 2014. Springer. Google Scholar
  10. Gerth Stølting Brodal and Rolf Fagerberg. Cache oblivious distribution sweeping. In Proc. 29th International Colloquium on Automata, Languages and Programming (ICALP), pages 426-438, Malaga, Spain, 2002. Springer-Verlag. Google Scholar
  11. Gerth Stølting Brodal, Rolf Fagerberg, and Kristoffer Vinther. Engineering a cache-oblivious sorting algorithm. ACM Journal of Experimental Algorithmics, 12:1-23, 2007. Google Scholar
  12. Kurt P Brown, Michael James Carey, and Miron Livny. Managing memory to meet multiclass workload response time goals. In Proc. of the 19th International Conference on Very Large Data Bases (VLDB), pages 328-328, Dublin, Ireland, 1993. IEEE. Google Scholar
  13. Rezaul A. Chowdhury, Pramod Ganapathi, Jesmin Jahan Tithi, Yuan Tang, Charles A. Bachmeier, Bradley C Kuszmaul, Charles E. Leiserson, and Armando Solar Lezama. Autogen: Automatic discovery of efficient recursive divide-&-conquer algorithms for solving dynamic programming problems. In Proc. of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 1-12, Barcelona, Spain, 2016. ACM. Google Scholar
  14. Rezaul A. Chowdhury, Pramod Ganapathi, Stephen Tschudi, Jesmin Jahan Tithi, Charles Bachmeier, Charles E. Leiserson, Armando Solar-Lezama, Bradley C. Kuszmaul, and Yuan Tang. Autogen: Automatic discovery of efficient recursive divide-&-conquer algorithms for solving dynamic programming problems. ACM Transactions on Parallel Computing, 4(1):1-12, 2017. Google Scholar
  15. Rezaul Alam Chowdhury and Vijaya Ramachandran. The cache-oblivious gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation. Theory of Computing Systems, 47(4):878-919, 2010. Google Scholar
  16. T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction To Algorithms. MIT Press, Cambridge, Massachusetts, 2001. Google Scholar
  17. Transaction Processing Performance Council. Tpc-c benchmark. Technical Report 5.10.1, TPC, 2009. URL: http://www.tpc.org/tpcc/.
  18. Peter J. Denning. Thrashing: its causes and prevention. In Proceedings of the December 9-11, 1968, fall joint computer conference, part I, AFIPS '68, pages 915-922, San Francisco, CA, 1968. AFIPS. Google Scholar
  19. Peter J. Denning. Working sets past and present. IEEE Transactions on Software Engineering, 6(1):64-84, 1980. Google Scholar
  20. Dave Dice, Virendra J. Marathe, and Nir Shavit. Brief announcement: Persistent unfairness arising from cache residency imbalance. In Proc. of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 82-83, Prague, Czech Republic, 2014. ACM. Google Scholar
  21. Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In Proc. of the 40th Annual Symposium on the Foundations of Computer Science (FOCS), pages 285-298, New York City, NY, 1999. IEEE. Google Scholar
  22. Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. ACM Transactions on Algorithms, 8(1):4, 2012. Google Scholar
  23. Masaru Kitsuregawa, Hidehiko Tanaka, and Tohru Moto-Oka. Application of hash to data base machine and its architecture. New Generation Computing, 1:63-74, 1983. Google Scholar
  24. Richard T Mills, Andreas Stathopoulos, and Dimitrios S Nikolopoulos. Adapting to memory pressure from within scientific applications on multiprogrammed cows. In Proc. 8th International Parallel and Distributed Processing Symposium (IPDPS), pages 71-88, Santa Fe, New Mexico, 2004. IEEE. Google Scholar
  25. Richard T Mills, Chuan Yue, Andreas Stathopoulos, and Dimitrios S Nikolopoulos. Runtime and programming support for memory adaptation in scientific applications via local disk and remote memory. Journal of Grid Computing, 5:213-234, 2007. Google Scholar
  26. Richard Tran Mills. Dynamic adaptation to CPU and memory load in scientific applications. PhD thesis, The College of William and Mary, 2004. Google Scholar
  27. Jesper Holm Olsen, Søren Skov, and Frederik Rønn. Cpp implementations of a cache-oblivious sorting method. Private communication, June 2003. Google Scholar
  28. J. Ousterhout. Scheduling techniques for concurrent systems. In Proc. of the 3rd IEEE International Conference on Distributed Computing Systems (ICDCS), pages 22-30, Miami, Florida, 1982. IEEE. Google Scholar
  29. HweeHwa Pang, Michael J. Carey, and Miron Livny. Memory-adaptive external sorting. In Proc. 19th International Conference on Very Large Data Bases (VLDB), pages 618-629, Dublin, Ireland, 1993. Morgan Kaufmann. Google Scholar
  30. HweeHwa Pang, Michael J Carey, and Miron Livny. Partially preemptible hash joins. In Proc. 5th ACM SIGMOD International Conference on Management of Data (COMAD), pages 59-68, Washington, DC, 1993. ACM. Google Scholar
  31. H. Prokop. Cache oblivious algorithms. Master’s thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, June 1999. Google Scholar
  32. Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis: An attempt to explain the behavior of algorithms in practice. Communications of the ACM, 52(10):76-84, 2009. Google Scholar
  33. Virginia Vassilevska Williams and Ryan Williams. Subcubic equivalences between path, matrix and triangle problems. In Proc. of the 51st Annual Symposium on Foundations of Computer Science (FOCS), pages 645-654, Las Vegas, NV, USA, 2010. IEEE. Google Scholar
  34. Weiye Zhang and Per-Äke Larson. A memory-adaptive sort (MASORT) for database systems. In Proc. 6th International Conference of the Centre for Advanced Studies on Collaborative research (CASCON), pages 41-54, Toronto, Ontario, Canada, 1996. IBM Press. Google Scholar
  35. Weiye Zhang and Per-Äke Larson. Dynamic memory adjustment for external mergesort. In Proc. 23rd International Conference on Very Large Data Bases (VLDB), pages 376-385, Athens, Greece, 1997. Morgan Kaufmann. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail