Author: Ding, Yufei : Search

59 Results for: Author: Ding, Yufei

Searched The ACM Guide to Computing Literature (3,842,466 records)

Showing 1 - 20 of 59 Results

research-article
November 2024
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Results Replicated / v1.1
RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 41, Pages 1–15, https://doi.org/10.1109/SC41406.2024.00047

Industrial recommendation models typically involve numerous feature fields. The embedding computation workloads are heterogeneous across these fields, thus requiring varied optimal code schedules. While existing solutions apply basic fusion optimization ...
Total Citations: 0
Total Downloads: 107
Last 12 Months: 107
Last 6 weeks: 31
Supplementary Material: recflex-_enabling_feature_heterogeneity-aware_optimization_for_deep_recommendation_models_with_flexible_schedules (1080p).mp4
research-article
July 2024
OPER: optimality-guided embedding table parallelization for large-scale recommendation model
USENIX ATC'24: Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, Article No.: 41, Pages 667–682

The deployment of Deep Learning Recommendation Models (DLRMs) involves the parallelization of extra-large embedding tables (EMTs) on multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained ...
Total Citations: 0
research-article
Open Access
April 2024
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Results Reproduced / v1.1
OnePerc: A Randomness-aware Compiler for Photonic Quantum Computing
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 738–754, https://doi.org/10.1145/3620666.3651372

The photonic platform holds great promise for quantum computing. Nevertheless, the intrinsic probabilistic characteristic of its native fusion operations introduces substantial randomness into the computing process, posing significant challenges to ...
Total Citations: 0
Total Downloads: 521
Last 12 Months: 521
Last 6 weeks: 44
research-article
Open Access
April 2024
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Results Reproduced / v1.1
EVT: Accelerating Deep Learning Training with Epilogue Visitor Tree
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 301–316, https://doi.org/10.1145/3620666.3651369

As deep learning models become increasingly complex, the deep learning compilers are critical for enhancing the system efficiency and unlocking hidden optimization opportunities. Although excellent speedups have been achieved in inference workloads, ...
Total Citations: 1
Total Downloads: 2,521
Last 12 Months: 2,521
Last 6 weeks: 216
research-article
Open Access
April 2024
Results Reproduced / v1.1
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing
- Zheng Wang,
- Yuke Wang,
- Jiaqi Deng,
- Da Zheng,
- Ang Li,
- Yufei Ding
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 964–979, https://doi.org/10.1145/3620665.3640406

Ensuring high-quality recommendations for newly onboarded users requires the continuous retraining of Deep Learning Recommendation Models (DLRMs) with freshly generated data. To serve the online DLRM retraining, existing solutions use hundreds of CPU ...
Total Citations: 2
Total Downloads: 1,209
Last 12 Months: 1,209
Last 6 weeks: 153
research-article
Open Access
April 2024
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Results Reproduced / v1.1
MECH: Multi-Entry Communication Highway for Superconducting Quantum Chiplets
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 699–714, https://doi.org/10.1145/3620665.3640377

Chiplet architecture is an emerging architecture for quantum computing that could significantly increase qubit resources with its great scalability and modularity. However, as the computing scale increases, communication between qubits would become a ...
Total Citations: 2
Total Downloads: 911
Last 12 Months: 911
Last 6 weeks: 58
research-article
Open Access
April 2024
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, Pages 450–464, https://doi.org/10.1145/3617232.3624852

Zero knowledge Neural Networks draw increasing attention for guaranteeing computation integrity and privacy of neural networks (NNs) based on zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK) security scheme. However, the ...
Total Citations: 0
Total Downloads: 909
Last 12 Months: 909
Last 6 weeks: 127
research-article
November 2023
QASMTrans: A QASM Quantum Transpiler Framework for NISQ Devices
- Fei Hua,
- Meng Wang,
- Gushu Li,
- Bo Peng,
- Chenxu Liu,
- Muqing Zheng,
- Samuel Stein,
- Yufei Ding,
- Eddy Z. Zhang,
- Travis Humble,
- Ang Li
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1468–1477, https://doi.org/10.1145/3624062.3624222

The success of a quantum algorithm hinges on the ability to orchestrate a successful application induction. Detrimental overheads in mapping general quantum circuits to physically implementable routines can be the deciding factor between a successful ...
Total Citations: 4
Total Downloads: 182
Last 12 Months: 143
Last 6 weeks: 5
Supplementary Material: qasmtrans-_a_qasm_quantum_transpiler_framework_for_nisq_devices (1080p).mp4
research-article
Open Access
December 2023
RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 338–352, https://doi.org/10.1145/3613424.3623775

This paper proposes RM-STC, a novel GPU tensor core architecture designed for sparse Deep Neural Networks (DNNs) with two key innovations: (1) native support for both training and inference and (2) high efficiency for all sparsity degrees. To achieve the ...
Total Citations: 3
Total Downloads: 1,539
Last 12 Months: 1,388
Last 6 weeks: 106
research-article
Open Access
December 2023
QuComm: Optimizing Collective Communication for Distributed Quantum Computing
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 479–493, https://doi.org/10.1145/3613424.3614253

Distributed quantum computing (DQC) is a scalable way to build a large-scale quantum computing system. Previous compilers for DQC focus on either qubit-to-qubit inter-node gates or qubit-to-node nonlocal circuit blocks, missing opportunities of ...
Total Citations: 2
Total Downloads: 1,210
Last 12 Months: 1,121
Last 6 weeks: 83
research-article
Open Access
July 2023
MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 3, Article No.: 40, Pages 1–26, https://doi.org/10.1145/3603113

With the growing number of data-intensive workloads, GPU, which is the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed 3D-stacking near-bank ...
Total Citations: 2
Total Downloads: 3,288
Last 12 Months: 1,431
Last 6 weeks: 120
research-article
Open Access
June 2023
ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification
- Siqi Li,
- Fengbin Tu,
- Liu Liu,
- Jilan Lin,
- Zheng Wang,
- Yangwook Kang,
- Yufei Ding,
- Yuan Xie
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 58, Pages 1–14, https://doi.org/10.1145/3579371.3589093

With the rapid growth of classification scale in deep learning systems, the final classification layer becomes extreme classification with a memory footprint exceeding the main memory capacity of the CPU or GPU. The emerging in-storage-computing ...
Total Citations: 9
Total Downloads: 2,279
Last 12 Months: 1,199
Last 6 weeks: 95
research-article
Open Access
June 2023
OneQ: A Compilation Framework for Photonic One-Way Quantum Computation
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 12, Pages 1–14, https://doi.org/10.1145/3579371.3589047

In this paper, we propose OneQ, the first optimizing compilation framework for one-way quantum computation towards realistic photonic quantum architectures. Unlike previous compilation efforts for solid-state qubit technologies, our innovative ...
Total Citations: 6
Total Downloads: 1,320
Last 12 Months: 678
Last 6 weeks: 58
research-article
Public Access
June 2023
Q-BEEP: Quantum Bayesian Error Mitigation Employing Poisson Modeling over the Hamming Spectrum
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 8, Pages 1–13, https://doi.org/10.1145/3579371.3589043

Quantum computing technology has grown rapidly in recent years, with new technologies being explored, error rates being reduced, and quantum processors' qubit capacity growing. However, near-term quantum algorithms are still unable to be induced ...
Total Citations: 3
Total Downloads: 573
Last 12 Months: 309
Last 6 weeks: 34
research-article
June 2023
A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks
- Yang Wang,
- Bo Dong,
- Ke Xu,
- Haiyin Piao,
- Yufei Ding,
- Baocai Yin,
- Xin Yang
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 19, Issue 5s, Article No.: 172, Pages 1–17, https://doi.org/10.1145/3587936

Deep neural networks (DNNs) are widely used for computer vision tasks. However, it has been shown that deep models are vulnerable to adversarial attacks—that is, their performances drop when imperceptible perturbations are made to the original inputs, ...
Total Citations: 13
Total Downloads: 526
Last 12 Months: 124
Last 6 weeks: 12
research-article
March 2023
Artifacts Available / v1.1
SPG: Structure-Private Graph Database via SqueezePIR
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 7, Pages 1615–1628, https://doi.org/10.14778/3587136.3587138

Many relational data in our daily life are represented as graphs, making graph application an important workload. Because of the large scale of graph datasets, moving graph data to the cloud becomes a popular option. To keep the confidential and private ...
Total Citations: 2
Total Downloads: 213
Last 12 Months: 91
Last 6 weeks: 0
research-article
Open Access
February 2023
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Dynamic N:M Fine-Grained Structured Sparse Attention Mechanism
PPoPP '23: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 369–379, https://doi.org/10.1145/3572848.3577500

Transformers are becoming the mainstream solutions for various tasks like NLP and Computer vision. Despite their success, the high complexity of the attention mechanism hinders them from being applied to latency-sensitive tasks. One opportunity to ...
Total Citations: 5
Total Downloads: 2,113
Last 12 Months: 1,199
Last 6 weeks: 109
Supplementary Material: p369-chen-supp.pdf
research-article
November 2022
Biologically inspired dynamic thresholds for spiking neural networks
NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, Article No.: 441, Pages 6090–6103

The dynamic membrane potential threshold, as one of the essential properties of a biological neuron, is a spontaneous regulation mechanism that maintains neuronal homeostasis, i.e., the constant overall spiking firing rate of a neuron. As such, the ...
Total Citations: 0
Supplementary Material: Additional material
research-article
November 2022
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Results Reproduced / v1.1
EL-Rec: efficient large-scale recommendation model training via tensor-train embedding table
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 70, Pages 1–14

Deep learning Recommendation Models (DLRMs) play an important role in various application domains. However, existing DLRM training systems require a large number of GPUs due to the memory-intensive embedding tables. To this end, we propose EL-Rec, an ...
Total Citations: 0
Total Downloads: 318
Last 12 Months: 131
Last 6 weeks: 2
Supplementary Material: SC22_Presentation_Wang_Zheng.mp4
research-article
November 2022
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Results Reproduced / v1.1
LightSeq2: accelerated training for transformer-based models on GPUs
- Xiaohui Wang,
- Yang Wei,
- Ying Xiong,
- Guyue Huang,
- Xian Qian,
- Yufei Ding,
- Mingxuan Wang,
- Lei Li
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 38, Pages 1–14

Transformer-based neural models are used in many AI applications. Training these models is expensive, as it takes huge GPU resources and long duration. It is challenging because typical data like sentences have variable lengths, and Transformer's ...
Total Citations: 0
Total Downloads: 235
Last 12 Months: 70
Last 6 weeks: 4
Supplementary Material: SC22_Presentation_Wang_Xai.mp4
