- research-article, November 2024
RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules
- Zaifeng Pan,
- Zhen Zheng,
- Feng Zhang,
- Bing Xie,
- Ruofan Wu,
- Shaden Smith,
- Chuanjie Liu,
- Olatunji Ruwase,
- Xiaoyong Du,
- Yufei Ding
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 41, Pages 1–15
https://doi.org/10.1109/SC41406.2024.00047
Industrial recommendation models typically involve numerous feature fields. The embedding computation workloads are heterogeneous across these fields, thus requiring varied optimal code schedules. While existing solutions apply basic fusion optimization ...
- research-article, July 2024
OPER: optimality-guided embedding table parallelization for large-scale recommendation model
USENIX ATC'24: Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, Article No.: 41, Pages 667–682
The deployment of Deep Learning Recommendation Models (DLRMs) involves the parallelization of extra-large embedding tables (EMTs) on multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained ...
- research-article, April 2024
OnePerc: A Randomness-aware Compiler for Photonic Quantum Computing
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 738–754
https://doi.org/10.1145/3620666.3651372
The photonic platform holds great promise for quantum computing. Nevertheless, the intrinsic probabilistic characteristic of its native fusion operations introduces substantial randomness into the computing process, posing significant challenges to ...
- research-article, April 2024
EVT: Accelerating Deep Learning Training with Epilogue Visitor Tree
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 301–316
https://doi.org/10.1145/3620666.3651369
As deep learning models become increasingly complex, the deep learning compilers are critical for enhancing the system efficiency and unlocking hidden optimization opportunities. Although excellent speedups have been achieved in inference workloads, ...
- research-article, April 2024
RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 964–979
https://doi.org/10.1145/3620665.3640406
Ensuring high-quality recommendations for newly onboarded users requires the continuous retraining of Deep Learning Recommendation Models (DLRMs) with freshly generated data. To serve the online DLRM retraining, existing solutions use hundreds of CPU ...
- research-article, April 2024
MECH: Multi-Entry Communication Highway for Superconducting Quantum Chiplets
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 699–714
https://doi.org/10.1145/3620665.3640377
Chiplet architecture is an emerging architecture for quantum computing that could significantly increase qubit resources with its great scalability and modularity. However, as the computing scale increases, communication between qubits would become a ...
ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, Pages 450–464
https://doi.org/10.1145/3617232.3624852
Zero knowledge Neural Networks draw increasing attention for guaranteeing computation integrity and privacy of neural networks (NNs) based on zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK) security scheme. However, the ...
- research-article, November 2023
QASMTrans: A QASM Quantum Transpiler Framework for NISQ Devices
- Fei Hua,
- Meng Wang,
- Gushu Li,
- Bo Peng,
- Chenxu Liu,
- Muqing Zheng,
- Samuel Stein,
- Yufei Ding,
- Eddy Z. Zhang,
- Travis Humble,
- Ang Li
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1468–1477
https://doi.org/10.1145/3624062.3624222
The success of a quantum algorithm hinges on the ability to orchestrate a successful application induction. Detrimental overheads in mapping general quantum circuits to physically implementable routines can be the deciding factor between a successful ...
- research-article, December 2023
RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 338–352
https://doi.org/10.1145/3613424.3623775
This paper proposes RM-STC, a novel GPU tensor core architecture designed for sparse Deep Neural Networks (DNNs) with two key innovations: (1) native support for both training and inference and (2) high efficiency for all sparsity degrees. To achieve the ...
- research-article, December 2023
QuComm: Optimizing Collective Communication for Distributed Quantum Computing
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 479–493
https://doi.org/10.1145/3613424.3614253
Distributed quantum computing (DQC) is a scalable way to build a large-scale quantum computing system. Previous compilers for DQC focus on either qubit-to-qubit inter-node gates or qubit-to-node nonlocal circuit blocks, missing opportunities of ...
- research-article, July 2023
MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 3, Article No.: 40, Pages 1–26
https://doi.org/10.1145/3603113
With the growing number of data-intensive workloads, GPU, which is the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed 3D-stacking near-bank ...
- research-article, June 2023
ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 58, Pages 1–14
https://doi.org/10.1145/3579371.3589093
With the rapid growth of classification scale in deep learning systems, the final classification layer becomes extreme classification with a memory footprint exceeding the main memory capacity of the CPU or GPU. The emerging in-storage-computing ...
- research-article, June 2023
OneQ: A Compilation Framework for Photonic One-Way Quantum Computation
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 12, Pages 1–14
https://doi.org/10.1145/3579371.3589047
In this paper, we propose OneQ, the first optimizing compilation framework for one-way quantum computation towards realistic photonic quantum architectures. Unlike previous compilation efforts for solid-state qubit technologies, our innovative ...
- research-article, June 2023
Q-BEEP: Quantum Bayesian Error Mitigation Employing Poisson Modeling over the Hamming Spectrum
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 8, Pages 1–13
https://doi.org/10.1145/3579371.3589043
Quantum computing technology has grown rapidly in recent years, with new technologies being explored, error rates being reduced, and quantum processors' qubit capacity growing. However, near-term quantum algorithms are still unable to be induced ...
- research-article, June 2023
A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 19, Issue 5s, Article No.: 172, Pages 1–17
https://doi.org/10.1145/3587936
Deep neural networks (DNNs) are widely used for computer vision tasks. However, it has been shown that deep models are vulnerable to adversarial attacks: that is, their performances drop when imperceptible perturbations are made to the original inputs, ...
SPG: Structure-Private Graph Database via SqueezePIR
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 7, Pages 1615–1628
https://doi.org/10.14778/3587136.3587138
Many relational data in our daily life are represented as graphs, making graph application an important workload. Because of the large scale of graph datasets, moving graph data to the cloud becomes a popular option. To keep the confidential and private ...
Dynamic N:M Fine-Grained Structured Sparse Attention Mechanism
PPoPP '23: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 369–379
https://doi.org/10.1145/3572848.3577500
Transformers are becoming the mainstream solutions for various tasks like NLP and Computer vision. Despite their success, the high complexity of the attention mechanism hinders them from being applied to latency-sensitive tasks. One opportunity to ...
- research-article, November 2022
Biologically inspired dynamic thresholds for spiking neural networks
NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, Article No.: 441, Pages 6090–6103
The dynamic membrane potential threshold, as one of the essential properties of a biological neuron, is a spontaneous regulation mechanism that maintains neuronal homeostasis, i.e., the constant overall spiking firing rate of a neuron. As such, the ...
- research-article, November 2022
EL-Rec: efficient large-scale recommendation model training via tensor-train embedding table
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 70, Pages 1–14
Deep Learning Recommendation Models (DLRMs) play an important role in various application domains. However, existing DLRM training systems require a large number of GPUs due to the memory-intensive embedding tables. To this end, we propose EL-Rec, an ...
- research-article, November 2022
LightSeq2: accelerated training for transformer-based models on GPUs
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 38, Pages 1–14
Transformer-based neural models are used in many AI applications. Training these models is expensive, as it takes huge GPU resources and long duration. It is challenging because typical data like sentences have variable lengths, and Transformer's ...