- research-article, June 2019
Optimizing tensor contractions for embedded devices with racetrack memory scratch-pads
LCTES 2019: Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, Pages 5–18, https://doi.org/10.1145/3316482.3326351
Tensor contraction is a fundamental operation in many algorithms, with applications ranging from quantum chemistry and fluid dynamics to image processing and machine learning. The performance of tensor computations critically depends on ...
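For orientation, the sketch below shows the plain operation this paper optimizes: a contraction that sums over one shared index of two tensors. It is a minimal Python illustration with made-up shapes; it does not reflect the paper's racetrack-memory scratch-pad layout or loop ordering.

    # Illustrative only: C[i,j,l] = sum_k A[i,j,k] * B[k,l], first as explicit loops,
    # then checked against numpy's einsum. Shapes are arbitrary examples.
    import numpy as np

    def contract(A, B):
        # Contract the last axis of A with the first axis of B.
        I, J, K = A.shape
        K2, L = B.shape
        assert K == K2, "contracted dimensions must match"
        C = np.zeros((I, J, L))
        for i in range(I):
            for j in range(J):
                for l in range(L):
                    for k in range(K):
                        C[i, j, l] += A[i, j, k] * B[k, l]
        return C

    A = np.random.rand(4, 3, 5)
    B = np.random.rand(5, 2)
    assert np.allclose(contract(A, B), np.einsum('ijk,kl->ijl', A, B))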
- research-article, June 2019
Stream-based memory access specialization for general purpose processors
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, Pages 736–749, https://doi.org/10.1145/3307650.3322229
Because of severe limitations in technology scaling, architects have innovated in specializing general purpose processors for computation primitives (e.g. vector instructions, loop accelerators). The general principle is exposing rich semantics to the ...
- extended-abstract, June 2019
Proactive Caching for Low Access-Delay Services under Uncertain Predictions
SIGMETRICS '19: Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems, Pages 89–90, https://doi.org/10.1145/3309697.3331471
Traffic from delay-sensitive services has become a dominant share of overall network traffic. Proactive caching with the aid of predictive information has been proposed as a promising method to enhance delay performance. In this paper, we analytically ...
- research-article, June 2019
Towards a Smart, Internet-Scale Cache Service for Data Intensive Scientific Applications
ScienceCloud '19: Proceedings of the 10th Workshop on Scientific Cloud Computing, Pages 11–18, https://doi.org/10.1145/3322795.3331464
Data and services provided by shared facilities, such as large-scale observing facilities, have become important enablers of scientific insights and discoveries across many science and engineering disciplines. Ensuring satisfactory quality of service ...
- research-article, March 2019
Proactive Caching for Low Access-Delay Services under Uncertain Predictions
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 3, Issue 1, Article No.: 2, Pages 1–46, https://doi.org/10.1145/3322205.3311073
Traffic from delay-sensitive services has become a dominant share of overall network traffic. Proactive caching with the aid of predictive information has been proposed as a promising method to enhance the delay performance, which is one of the principal ...
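As a rough illustration of the setting (not this paper's analytical model or policies), the sketch below combines a reactive LRU cache with a proactive path that prefetches items a possibly unreliable predictor announces ahead of time. All names and the toy predictor are hypothetical.

    # Illustrative only: a toy proactive cache layered on LRU. Correct predictions
    # turn future requests into hits; wrong predictions just waste capacity.
    from collections import OrderedDict

    class ProactiveLRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.store = OrderedDict()  # key -> value, most recently used at the end

        def _insert(self, key, value):
            self.store[key] = value
            self.store.move_to_end(key)
            while len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used

        def prefetch(self, key, fetch):
            # Proactive path: load a predicted item before it is requested.
            if key not in self.store:
                self._insert(key, fetch(key))

        def get(self, key, fetch):
            # Reactive path: hit if present (prefetched or cached), else fetch on demand.
            if key in self.store:
                self.store.move_to_end(key)
                return self.store[key], True
            value = fetch(key)
            self._insert(key, value)
            return value, False

    # Tiny demo with a predictor that is only partly right.
    origin = lambda k: f"object-{k}"
    cache = ProactiveLRUCache(capacity=2)
    cache.prefetch("a", origin)          # correct prediction
    cache.prefetch("x", origin)          # wasted prefetch
    print(cache.get("a", origin))        # ('object-a', True): hit thanks to prefetch
    print(cache.get("b", origin))        # ('object-b', False): unpredicted, fetched on demand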
- research-article, January 2019
Making content caching policies 'smart' using the DeepCache framework
ACM SIGCOMM Computer Communication Review (SIGCOMM CCR), Volume 48, Issue 5, Pages 64–69, https://doi.org/10.1145/3310165.3310174
In this paper, we present DeepCache, a novel framework for content caching which can significantly boost cache performance. Our framework is based on powerful deep recurrent neural network models. It comprises two main components: i) Object ...
- research-article, January 2019
Performance evaluation of main-memory hash joins on KNL
International Journal of Computational Science and Engineering (IJCSE), Volume 20, Issue 4, Pages 425–438, https://doi.org/10.1504/ijcse.2019.104443
New hardware features have propelled the design and analysis of main-memory hash joins. In previous studies, memory access has always been the primary bottleneck for hash join algorithms. However, there are relatively few studies devoted to bottlenecks ...
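For context, the sketch below is a textbook build/probe hash join in Python; it shows the kind of memory access pattern such studies profile, not this paper's KNL-specific partitioned or vectorized variants.

    # Illustrative only: build a hash table on one relation, probe it with the other.
    from collections import defaultdict

    def hash_join(build_rel, probe_rel, build_key, probe_key):
        # Build phase: hash the (usually smaller) relation on its join key.
        table = defaultdict(list)
        for row in build_rel:
            table[row[build_key]].append(row)
        # Probe phase: stream the other relation and look up matches.
        for row in probe_rel:
            for match in table.get(row[probe_key], ()):
                yield {**match, **row}

    customers = [{"cust": 7, "name": "Ada"}, {"cust": 9, "name": "Lin"}]
    orders = [{"order_id": 1, "cust": 7}, {"order_id": 2, "cust": 9}]
    print(list(hash_join(customers, orders, "cust", "cust")))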
- research-article, December 2018
AVPP: Address-first Value-next Predictor with Value Prefetching for Improving the Efficiency of Load Value Prediction
ACM Transactions on Architecture and Code Optimization (TACO), Volume 15, Issue 4, Article No.: 49, Pages 1–30, https://doi.org/10.1145/3239567
Value prediction improves instruction-level parallelism in superscalar processors by breaking true data dependencies. Although this technique can significantly improve overall performance, most state-of-the-art value prediction approaches require ...
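As background, the sketch below is a software model of a simple last-value load predictor, the kind of baseline that value-prediction work such as AVPP improves on. AVPP's address-first, value-next tables and its value prefetching are not modeled; the table size and PC are arbitrary examples.

    # Illustrative only: predict that a load at a given PC returns the same value as last time.
    class LastValuePredictor:
        def __init__(self, entries=1024):
            self.entries = entries
            self.table = {}          # PC (mod entries) -> last observed load value
            self.correct = 0
            self.total = 0

        def predict(self, pc):
            return self.table.get(pc % self.entries)

        def update(self, pc, actual_value):
            # Called at commit: score the prediction, then train the table.
            self.total += 1
            if self.table.get(pc % self.entries) == actual_value:
                self.correct += 1
            self.table[pc % self.entries] = actual_value

    pred = LastValuePredictor()
    for value in [5, 5, 5, 6, 6]:        # a load at PC 0x40 returning these values
        _ = pred.predict(0x40)
        pred.update(0x40, value)
    print(f"correct predictions: {pred.correct}/{pred.total}")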
- research-article, October 2018
Exploiting locality in graph analytics through hardware-accelerated traversal scheduling
MICRO-51: Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1–14, https://doi.org/10.1109/MICRO.2018.00010
Graph processing is increasingly bottlenecked by main memory accesses. On-chip caches are of little help because the irregular structure of graphs causes seemingly random memory references. However, most real-world graphs offer significant potential ...
- poster, October 2018
SSD QoS Improvements through Machine Learning
SoCC '18: Proceedings of the ACM Symposium on Cloud Computing, Page 511, https://doi.org/10.1145/3267809.3275453
The recent deceleration of Moore's law calls for new approaches to resource optimization. Machine learning has been applied to a wide variety of problems across multiple domains; however, the space of machine learning research for storage ...
- research-article, September 2018
Empirically assessing opportunities for prefetching and caching in mobile apps
ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Pages 554–564, https://doi.org/10.1145/3238147.3238215
Network latency in mobile software has a large impact on user experience, with potentially severe economic consequences. Prefetching and caching have been shown effective in reducing the latencies in browser-based systems. However, those techniques ...
- research-article, August 2018
LAPPS: Locality-Aware Productive Prefetching Support for PGAS
ACM Transactions on Architecture and Code Optimization (TACO), Volume 15, Issue 3, Article No.: 28, Pages 1–26, https://doi.org/10.1145/3233299
Prefetching is a well-known technique to mitigate scalability challenges in the Partitioned Global Address Space (PGAS) model. It has been studied as either an automated compiler optimization or a manual programmer optimization. Using the PGAS locality ...
- research-article, August 2018
DeepCache: A Deep Learning Based Framework For Content Caching
NetAI'18: Proceedings of the 2018 Workshop on Network Meets AI & ML, Pages 48–53, https://doi.org/10.1145/3229543.3229555
In this paper, we present DeepCache, a novel framework for content caching which can significantly boost cache performance. Our framework is based on powerful deep recurrent neural network models. It comprises two main components: i) Object ...
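As a rough illustration of the general idea (not DeepCache's actual architecture), the sketch below couples a stubbed popularity forecast to an unmodified LRU cache by issuing synthetic warm-up requests for objects predicted to become popular. The object IDs and the stub predictor are hypothetical stand-ins for a learned recurrent model.

    # Illustrative only: warm an LRU cache with synthetic requests for predicted-popular
    # objects, then serve a real trace.
    from collections import OrderedDict

    def lru_request(cache, capacity, obj):
        hit = obj in cache
        if hit:
            cache.move_to_end(obj)
        else:
            cache[obj] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
        return hit

    def predicted_popular():
        # Stand-in for a learned popularity model: pretend these objects will trend next.
        return ["video_42", "video_17"]   # hypothetical object IDs

    cache, capacity = OrderedDict(), 3
    for obj in predicted_popular():       # synthetic warm-up requests
        lru_request(cache, capacity, obj)
    real_trace = ["video_42", "video_99", "video_17"]
    hits = sum(lru_request(cache, capacity, obj) for obj in real_trace)
    print(f"hits on real trace: {hits}/{len(real_trace)}")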
- demonstration, June 2018
Low-latency delivery of news-based video content
MMSys '18: Proceedings of the 9th ACM Multimedia Systems Conference, Pages 537–540, https://doi.org/10.1145/3204949.3208110
Nowadays, news-based websites and portals provide significant amounts of multimedia content to accompany news stories and articles. Within this context, HTTP Adaptive Streaming is generally used to deliver video over the best-effort Internet, allowing ...
- research-article, June 2018
Rethinking Belady's algorithm to accommodate prefetching
ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture, Pages 110–123, https://doi.org/10.1109/ISCA.2018.00020
This paper shows that in the presence of data prefetchers, cache replacement policies are faced with a large unexplored design space. In particular, we observe that while Belady's MIN algorithm minimizes the total number of cache misses, including ...
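For reference, the sketch below implements offline Belady/MIN replacement on a fully known trace, evicting the resident block whose next use lies farthest in the future. The paper's argument about how prefetched blocks change this picture is not modeled here.

    # Illustrative only: count misses under Belady's MIN policy for a known trace.
    def belady_min(trace, capacity):
        cache, misses = set(), 0
        for i, block in enumerate(trace):
            if block in cache:
                continue
            misses += 1
            if len(cache) < capacity:
                cache.add(block)
                continue
            # Evict the resident block reused farthest in the future (or never again).
            def next_use(b):
                for j in range(i + 1, len(trace)):
                    if trace[j] == b:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
            cache.add(block)
        return misses

    trace = ["A", "B", "C", "A", "B", "D", "A", "C"]
    print(belady_min(trace, capacity=2))  # minimum possible misses for this trace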
- research-article, June 2018
Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies
ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture, Pages 96–109, https://doi.org/10.1109/ISCA.2018.00019
On-die caches are a popular method to help hide the main memory latency. However, it is difficult to build large caches without substantially increasing their access latency, which in turn hurts performance. To overcome this difficulty, on-die caches ...
- research-article, June 2018
Division of labor: a more effective approach to prefetching
ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture, Pages 83–95, https://doi.org/10.1109/ISCA.2018.00018
Prefetching is a central component in most microarchitectures. Many different algorithms have been proposed with varying degrees of complexity and effectiveness. There are inherent tradeoffs among various metrics, especially when we try to exploit both ...
- research-article, May 2018
Improving energy efficiency of database clusters through prefetching and caching
CCGrid '18: Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Pages 388–391, https://doi.org/10.1109/CCGRID.2018.00065
The goal of this study is to optimize the energy efficiency of database clusters through prefetching and caching strategies. We design a workload-skewness scheme to collectively manage a set of hot and cold nodes in a database cluster system. The prefetching ...
- research-article, March 2018
Minnow: Lightweight Offload Engines for Worklist Management and Worklist-Directed Prefetching
ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 593–607, https://doi.org/10.1145/3173162.3173197
The importance of irregular applications such as graph analytics is rapidly growing with the rise of Big Data. However, parallel graph workloads tend to perform poorly on general-purpose chip multiprocessors (CMPs) due to poor cache locality, low ...
Also Published in:
ACM SIGPLAN Notices: Volume 53, Issue 2
- research-article, March 2018
An Event-Triggered Programmable Prefetcher for Irregular Workloads
ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 578–592, https://doi.org/10.1145/3173162.3173189
Many modern workloads compute on large amounts of data, often with irregular memory accesses. Current architectures perform poorly for these workloads, as existing prefetching techniques cannot capture the memory access patterns; these applications end ...
Also Published in:
ACM SIGPLAN Notices: Volume 53, Issue 2