research-article

DSAGEN: synthesizing programmable spatial accelerators

Authors:

Zhengrong Wang,

Tony Nowatzki

ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture

Pages 268 - 281

https://doi.org/10.1109/ISCA45697.2020.00032

Published: 23 September 2020

Abstract

Domain-specific hardware accelerators can provide orders of magnitude speedup and energy efficiency over general-purpose processors. However, they require extensive manual effort in hardware design and software stack development. Automated ASIC generation (e.g., HLS) can be insufficient, because the hardware becomes inflexible. An ideal accelerator generation framework would be automatable, enable deep specialization to the domain, and maintain a uniform programming interface.

Our insight is that many prior accelerator architectures can be approximated by composing a small number of hardware primitives, specifically those from spatial architectures. With careful design, a compiler can understand how to use available primitives, with modular and composable transformations, to take advantage of the features of a given program. This suggests a paradigm where accelerators can be generated by searching within such a rich accelerator design space, guided by the affinity of input programs for hardware primitives and their interactions.
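To make the search paradigm concrete, the following is a minimal sketch under stated assumptions, not the DSAGEN algorithm. The primitive names, areas, affinity factors, the multiplicative performance model, and the exhaustive enumeration are all hypothetical and chosen only for illustration; the abstract does not specify how the actual search or performance estimation works.

```python
from itertools import combinations
from math import prod

# Hypothetical hardware primitives and their areas in mm^2 (made-up numbers).
PRIMITIVE_AREA = {
    "pe_array": 1.0,
    "switch_net": 0.5,
    "stream_mem": 2.0,
    "sync_buf": 0.3,
}
BASE_AREA = 1.0  # hypothetical fixed cost shared by every design

# Hypothetical affinity table: speedup a workload gains from a primitive.
AFFINITY = {
    "dense_gemm": {"pe_array": 4.0, "stream_mem": 1.2},
    "sparse_spmv": {"pe_array": 2.0, "sync_buf": 2.0, "switch_net": 1.5},
}


def estimate_perf(design, workload):
    """Toy model: multiply the speedups of the primitives in the design."""
    return prod(AFFINITY[workload].get(p, 1.0) for p in design)


def objective(design):
    """Squared geometric-mean performance divided by area."""
    area = BASE_AREA + sum(PRIMITIVE_AREA[p] for p in design)
    perfs = [estimate_perf(design, w) for w in AFFINITY]
    geomean = prod(perfs) ** (1.0 / len(perfs))
    return geomean ** 2 / area


def search():
    """Exhaustively score every subset of primitives and keep the best."""
    prims = sorted(PRIMITIVE_AREA)
    candidates = [
        frozenset(c)
        for r in range(len(prims) + 1)
        for c in combinations(prims, r)
    ]
    return max(candidates, key=objective)


best = search()
print(sorted(best), round(objective(best), 3))
```

In this toy setting the large `stream_mem` primitive is left out because its area cost outweighs the small speedup it brings to one workload. A real design space is far too large to enumerate, so a practical tool would need a guided or iterative search instead.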

We use this approach to develop the DSAGEN framework, which automates the hardware/software co-design process for reconfigurable accelerators. For several existing accelerators, our evaluation demonstrates that the compiler can achieve 89% of the performance of manually tuned versions. For automated design space exploration, we target multiple sets of workloads which prior accelerators were designed for; the generated hardware achieves a mean of 1.3× perf²/mm² over prior programmable accelerators.
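The perf²/mm² figure of merit rewards performance quadratically and penalizes area linearly. The short sketch below shows how such a ratio could be computed for one design against a baseline; the function names and all numbers are hypothetical and are not taken from the paper's evaluation.

```python
def perf2_per_mm2(perf, area_mm2):
    """Figure of merit: performance squared divided by area in mm^2."""
    return perf * perf / area_mm2


def relative_gain(new, base):
    """Ratio of the figure of merit of `new` over `base`.

    Each argument is a (performance, area_mm2) pair.
    """
    return perf2_per_mm2(*new) / perf2_per_mm2(*base)


# Hypothetical numbers: 10% faster and 10% smaller than the baseline.
print(round(relative_gain((1.1, 0.9), (1.0, 1.0)), 3))
# Hypothetical numbers: 10% slower but half the area.
print(round(relative_gain((0.9, 0.5), (1.0, 1.0)), 3))
```

Under this metric a design that is somewhat slower can still score higher than the baseline if it is much smaller, which is why the metric suits area-constrained comparisons.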


Index Terms

Software and its engineering
  Software notations and tools


Published In

ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture

May 2020

1152 pages

ISBN: 9781728146614

General Chairs: José Martínez (Cornell University), José Duato (Universitat Politècnica de València)

Program Chair: Lieven Eeckhout (Ghent University)

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IEEE

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ISCA '20

Sponsor:

SIGARCH

ISCA '20: The 47th Annual International Symposium on Computer Architecture

May 30 - June 3, 2020

Virtual Event

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
