- research-article, December 2013
Evaluator-executor transformation for efficient pipelining of loops with conditionals
ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 4, Article No.: 62, Pages 1–23
https://doi.org/10.1145/2541228.2555317
Control divergence poses many problems in parallelizing loops. While predicated execution is commonly used to convert control dependence into data dependence, it often incurs high overhead because it allocates resources equally for both branches of a ...
- research-article, June 2013
Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation
ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture, Pages 356–367
https://doi.org/10.1145/2485922.2485953
Current GPUs maintain high programmability by abstracting the SIMD nature of the hardware as independent concurrent threads of control with hardware responsible for generating predicate masks to utilize the SIMD hardware for different flows of control. ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 41 Issue 3
- poster, September 2012
Branch and data herding: reducing control and memory divergence for error-tolerant GPU applications
PACT '12: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, Pages 427–428
https://doi.org/10.1145/2370816.2370879
Control and memory divergence between threads in the same execution bundle, or warp, can significantly throttle the performance of GPU applications. We exploit the observation that many GPU applications exhibit error tolerance to propose branch and data ...