DOI: 10.1109/ISCA45697.2020.00032
research-article

DSAGEN: synthesizing programmable spatial accelerators

Published: 23 September 2020

Abstract

Domain-specific hardware accelerators can provide orders-of-magnitude speedup and energy efficiency over general-purpose processors. However, they require extensive manual effort in hardware design and software stack development. Automated ASIC generation (e.g., HLS) can be insufficient, because the resulting hardware is inflexible. An ideal accelerator generation framework would be automatable, enable deep specialization to the domain, and maintain a uniform programming interface.
Our insight is that many prior accelerator architectures can be approximated by composing a small number of hardware primitives, specifically those from spatial architectures. With careful design, a compiler can understand how to use the available primitives, with modular and composable transformations, to take advantage of the features of a given program. This suggests a paradigm where accelerators can be generated by searching within such a rich accelerator design space, guided by the affinity of input programs for hardware primitives and their interactions.
We use this approach to develop the DSAGEN framework, which automates the hardware/software co-design process for reconfigurable accelerators. For several existing accelerators, our evaluation demonstrates that the compiler can achieve 89% of the performance of manually tuned versions. For automated design space exploration, we target multiple sets of workloads that prior accelerators were designed for; the generated hardware achieves a mean of 1.3× perf²/mm² over prior programmable accelerators.
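
To make the perf²/mm² objective concrete, the following is a minimal, purely illustrative sketch of design space exploration under that metric. It is not DSAGEN's algorithm: the `Design` parameters, the area and performance models, and all constants are hypothetical assumptions, and the search is a plain exhaustive enumeration rather than the compiler-guided search the abstract describes.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Design:
    """A toy accelerator design point (both fields are hypothetical)."""
    pes: int       # number of processing elements
    switches: int  # number of network switches


def area_mm2(d: Design) -> float:
    # Toy area model with made-up per-component costs in mm^2.
    return 0.02 * d.pes + 0.01 * d.switches


def performance(d: Design, parallelism: int) -> int:
    # Toy performance model: throughput is capped by the PE count, by the
    # workload's available parallelism, and by routing capacity.
    return min(d.pes, parallelism, 2 * d.switches)


def objective(d: Design, workloads: list[int]) -> float:
    # Mean perf^2 / mm^2 across the workload set.
    total = sum(performance(d, w) ** 2 for w in workloads)
    return total / (len(workloads) * area_mm2(d))


def explore(workloads, pe_options, switch_options) -> Design:
    # Exhaustive search over the (small) toy design space.
    candidates = (Design(p, s) for p, s in product(pe_options, switch_options))
    return max(candidates, key=lambda d: objective(d, workloads))


if __name__ == "__main__":
    workloads = [4, 8]  # hypothetical per-workload parallelism
    best = explore(workloads, [4, 8, 16], [2, 4, 8])
    print(best, objective(best, workloads))
```

Because performance is squared in the objective, the search in this toy model favors the design that just saturates the most parallel workload instead of the smallest or the largest configuration.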


    Published In

    ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture
    May 2020
    1152 pages
    ISBN:9781728146614

    Sponsors

    In-Cooperation

    • IEEE

    Publisher

    IEEE Press

    Qualifiers

    • Research-article

    Conference

    ISCA '20
    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%
