Dark silicon and the end of multicore scaling

H Esmaeilzadeh, E Blem, R St. Amant… - Proceedings of the 38th …, 2011 - dl.acm.org
Proceedings of the 38th Annual International Symposium on Computer Architecture, 2011
Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9x average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
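The closing figures in the abstract can be checked with simple arithmetic. The sketch below assumes that "doubled performance per generation" over "the next five technology generations" means a target of 2^5 = 32x. The names (`target_speedup`, `GENERATIONS`, `PROJECTED_SPEEDUP`) are hypothetical. Reading the "nearly 24-fold gap" as the difference between target and projection, rather than their ratio, is an interpretation that happens to match the quoted number; the abstract itself does not say which is meant.

```python
# Sanity check of the abstract's headline numbers.
# Assumption: five generations of doubling gives the performance target.

GENERATIONS = 5          # "next five technology generations" (from the abstract)
PROJECTED_SPEEDUP = 7.9  # average speedup through 2024 (from the abstract)


def target_speedup(generations: int, per_generation: float = 2.0) -> float:
    """Cumulative speedup if performance multiplies by a fixed factor each generation."""
    return per_generation ** generations


target = target_speedup(GENERATIONS)         # 2**5 = 32.0
gap_difference = target - PROJECTED_SPEEDUP  # shortfall measured as a difference
gap_ratio = target / PROJECTED_SPEEDUP       # shortfall measured as a ratio

print(f"target speedup:        {target:.1f}x")
print(f"projected speedup:     {PROJECTED_SPEEDUP:.1f}x")
print(f"gap (target - actual): {gap_difference:.1f}")
print(f"gap (target / actual): {gap_ratio:.2f}")
```

Under these assumptions the difference is about 24.1 and the ratio is about 4.05, so the projected speedup reaches roughly a quarter of the doubling target.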
ACM Digital Library