
Efficient Distributed Sparse Relative Similarity Learning

Online AM: 16 January 2025

Abstract

Learning a good similarity measure for large-scale, high-dimensional data is a crucial task in machine learning applications, yet it poses a significant challenge. Distributed minibatch stochastic gradient descent (SGD) is an efficient optimization method for large-scale distributed training, allowing speedup linear in the number of workers. However, communication efficiency in distributed SGD requires a sufficiently large minibatch size, which presents two distinct challenges. First, a large minibatch size leads to high memory usage and computational complexity during parallel training of high-dimensional models. Second, an overly large batch size degrades the convergence rate. To overcome these challenges, we propose an efficient distributed sparse relative similarity learning framework, EDSRSL, which integrates two strategies: local minibatch SGD and sparse relative similarity learning. By reducing the number of synchronizations through delayed (local) updates while maintaining a large batch size, we address the issue of high communication cost. Additionally, we incorporate sparse model learning into the training process, significantly reducing computational cost. This paper also provides theoretical proof that the convergence rate does not decrease significantly as the batch size increases. Experiments on six high-dimensional real-world datasets demonstrate the efficacy and efficiency of the proposed algorithms, with a communication cost reduction of up to 90.89% and a maximum wall-time speedup of 5.66× over baseline methods.
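The two strategies the abstract combines, local minibatch SGD with delayed synchronization and sparse model learning, can be sketched as follows. This is an illustrative sketch only, not the authors' EDSRSL: the names `local_sgd_sparse` and `grad_fn` and all hyperparameter values are hypothetical, and the sparsity step is modeled here as a standard L1 soft-thresholding (proximal) operator.

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal step for an L1 penalty: shrinks entries toward zero,
    producing the sparse model that keeps per-update cost low."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def local_sgd_sparse(grad_fn, dim, n_workers=4, rounds=10, local_steps=8,
                     lr=0.1, lam=0.01, seed=0):
    """Local minibatch SGD with periodic averaging and a sparsifying
    proximal step. Each worker runs `local_steps` updates between
    synchronizations, so models are communicated once per round
    rather than once per gradient step."""
    rng = np.random.default_rng(seed)
    w_global = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for _ in range(n_workers):
            w = w_global.copy()
            for _ in range(local_steps):
                g = grad_fn(w, rng)              # stochastic minibatch gradient
                w = soft_threshold(w - lr * g, lr * lam)
            local_models.append(w)
        w_global = np.mean(local_models, axis=0)  # one synchronization per round
    return w_global
```

In this sketch, communication drops from `rounds * local_steps` synchronizations to `rounds`, which mirrors the paper's synchronization-delay idea; the actual relative-similarity objective and convergence analysis are, of course, in the paper itself.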


Published In

ACM Transactions on Knowledge Discovery from Data (Just Accepted)
EISSN: 1556-472X
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 16 January 2025
Accepted: 06 January 2025
Revised: 30 December 2024
Received: 29 December 2023


Author Tags

  1. Distributed SGD
  2. synchronization
  3. relative similarity learning
  4. high-dimensionality
  5. sparse learning

Qualifiers

  • Research-article
