DOI: 10.1145/3445814.3446763
research-article

RecSSD: near data processing for solid state drive based recommendation inference

Published: 17 April 2021

Abstract

Neural personalized recommendation models are used across a wide variety of datacenter applications, including search, social media, and entertainment. State-of-the-art models comprise large embedding tables with billions of parameters, which require large memory capacities. Unfortunately, large and fast DRAM-based memories incur high infrastructure costs. Conventional SSD-based storage solutions offer an order of magnitude more capacity, but their worse read latency and bandwidth degrade inference performance. RecSSD is a near-data-processing-based SSD memory system customized for neural recommendation inference. Across eight industry-representative models, it reduces end-to-end model inference latency by 2× compared to using commercial off-the-shelf (COTS) SSDs.
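To make the memory access pattern concrete, embedding-table lookups in models of this kind are commonly expressed as a gather followed by a sum reduction, as in Caffe2's SparseLengthsSum operator. The following is a minimal pure-Python sketch of that operation, not RecSSD's implementation; the function name `sparse_lengths_sum` and the tiny table are illustrative assumptions.

```python
from typing import List, Sequence


def sparse_lengths_sum(
    table: Sequence[Sequence[float]],
    indices: Sequence[int],
    lengths: Sequence[int],
) -> List[List[float]]:
    """Sum-pool embedding rows per segment (illustrative sketch only).

    `indices` lists row ids for all segments back to back, and
    `lengths[i]` says how many of those ids belong to output i.
    """
    if sum(lengths) != len(indices):
        raise ValueError("sum(lengths) must equal len(indices)")
    dim = len(table[0])
    pooled_rows: List[List[float]] = []
    pos = 0
    for n in lengths:
        pooled = [0.0] * dim
        # Each id is an independent, effectively random row read.
        for row_id in indices[pos:pos + n]:
            row = table[row_id]
            for j in range(dim):
                pooled[j] += row[j]
        pooled_rows.append(pooled)
        pos += n
    return pooled_rows


# Hypothetical 3-row, 2-dimensional table; real tables hold far more rows.
table = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
print(sparse_lengths_sum(table, indices=[0, 2, 1], lengths=[2, 1]))
# prints [[101.0, 202.0], [10.0, 20.0]]
```

Each output vector needs several scattered row reads but only a small reduced result, which is one way to see why read latency matters for this workload and why performing the reduction close to the data can be attractive.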

Published In

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN: 9781450383172
DOI: 10.1145/3445814
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. near data processing
  2. neural networks
  3. solid state drives

Qualifiers

  • Research-article

Conference

ASPLOS '21

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 417
  • Downloads (Last 6 weeks): 32
Reflects downloads up to 14 Sep 2024

Cited By

  • (2024) RecTS: A Temporal-Aware Memory System Optimization for Training Deep Learning Recommendation Models. Proceedings of the 17th ACM International Systems and Storage Conference (104-117). DOI: 10.1145/3688351.3689155. Online publication date: 16-Sep-2024
  • (2024) ReadGuard: Integrated SSD Management for Priority-Aware Read Performance Differentiation. ACM Transactions on Storage 20:4 (1-39). DOI: 10.1145/3676884. Online publication date: 25-Jul-2024
  • (2024) ISP Agent: A Generalized In-storage-processing Workload Offloading Framework by Providing Multiple Optimization Opportunities. ACM Transactions on Architecture and Code Optimization 21:1 (1-24). DOI: 10.1145/3632951. Online publication date: 19-Jan-2024
  • (2024) Near-Memory Computing With Compressed Embedding Table for Personalized Recommendation. IEEE Transactions on Emerging Topics in Computing 12:3 (938-951). DOI: 10.1109/TETC.2023.3345870. Online publication date: Jul-2024
  • (2024) RecPIM: Efficient In-Memory Processing for Personalized Recommendation Inference Using Near-Bank Architecture. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:10 (2854-2867). DOI: 10.1109/TCAD.2024.3386117. Online publication date: Oct-2024
  • (2024) NDRec: A Near-Data Processing System for Training Large-Scale Recommendation Models. IEEE Transactions on Computers 73:5 (1248-1261). DOI: 10.1109/TC.2024.3365939. Online publication date: May-2024
  • (2024) Flagger: Cooperative Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (915-930). DOI: 10.1109/ISCA59077.2024.00071. Online publication date: 29-Jun-2024
  • (2024) MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (818-833). DOI: 10.1109/ISCA59077.2024.00064. Online publication date: 29-Jun-2024
  • (2024) ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (410-423). DOI: 10.1109/ISCA59077.2024.00038. Online publication date: 29-Jun-2024
  • (2024) NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (368-381). DOI: 10.1109/ISCA59077.2024.00035. Online publication date: 29-Jun-2024