DOI: 10.1145/3580305.3599499

Serverless Federated AUPRC Optimization for Multi-Party Collaborative Imbalanced Data Mining

Published: 04 August 2023

Abstract

To address big data challenges, serverless multi-party collaborative training has recently attracted attention in the data mining community, since it removes the server-node bottleneck and thereby cuts communication costs. However, traditional serverless multi-party collaborative training algorithms were mainly designed for balanced data mining tasks and optimize accuracy-oriented objectives (e.g., cross-entropy). In many real-world applications the data distribution is skewed, and classifiers trained to improve accuracy perform poorly on imbalanced data because the model can be significantly biased toward the majority class. The Area Under the Precision-Recall Curve (AUPRC) was therefore introduced as a more effective metric. Although multiple single-machine methods have been designed to train models for AUPRC maximization, algorithms for multi-party collaborative training have never been studied. The change from the single-machine to the multi-party setting poses critical challenges. For example, existing single-machine AUPRC maximization algorithms maintain an inner state for each local data point, so they are not applicable to large-scale multi-party collaborative training because of this per-data-point dependence.
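To make the imbalance issue concrete, here is a minimal sketch (not from the paper) of how a trivial majority-class predictor can look excellent under accuracy yet collapse under AUPRC. It uses scikit-learn's average_precision_score as a standard AUPRC estimate; the 1% positive rate is an illustrative choice:

```python
# Illustrative only: accuracy vs. AUPRC on a highly skewed label distribution.
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

rng = np.random.default_rng(0)

# Skewed labels: roughly 1% positives (the minority class we care about).
y_true = (rng.random(10_000) < 0.01).astype(int)

# A degenerate "classifier" that always predicts the majority class
# and assigns every example the same score.
y_pred = np.zeros_like(y_true)
scores = np.zeros(len(y_true), dtype=float)

print(accuracy_score(y_true, y_pred))           # ~0.99: accuracy looks great
print(average_precision_score(y_true, scores))  # ~0.01: AUPRC exposes the failure
```

Under such skew, accuracy is dominated by the majority class, while average precision (a standard estimator of AUPRC) stays near the positive-class prevalence for an uninformative scorer.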
To address this challenge, in this paper we reformulate serverless multi-party collaborative AUPRC maximization as a conditional stochastic optimization problem and propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to optimize the AUPRC directly. We then apply a variance-reduction technique and propose the ServerLess biAsed sTochastic gradiEnt with Momentum-based variance reduction (SLATE-M) algorithm, which improves the convergence rate and matches the best theoretical convergence result achieved by single-machine online methods. To the best of our knowledge, this is the first work to solve the multi-party collaborative AUPRC maximization problem. Finally, extensive experiments show the advantages of directly optimizing the AUPRC with distributed learning methods and verify the efficiency of our new algorithms (i.e., SLATE and SLATE-M).
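For context, a conditional stochastic optimization problem (the template the paper reformulates AUPRC maximization into, in the sense of Hu et al., 2020) has the general form

```latex
\min_{\mathbf{w}} \; F(\mathbf{w})
  \;=\; \mathbb{E}_{\xi}\Big[\, f_{\xi}\big(\mathbb{E}_{\eta \mid \xi}\,[\, g_{\eta}(\mathbf{w}; \xi)\,]\big) \Big],
```

where the inner expectation is conditioned on the outer sample. Replacing that inner expectation with a mini-batch estimate yields a biased gradient estimator, which is why "biased" appears in the algorithm names. As a further illustration, the sketch below combines the two ingredients named in the abstract: a STORM-style momentum-based variance-reduced gradient estimator (Cutkosky and Orabona, 2019) and serverless gossip averaging over a ring of workers. It is not the paper's SLATE or SLATE-M; the quadratic loss, ring mixing matrix, and step sizes are all illustrative assumptions:

```python
# Minimal sketch (not the paper's SLATE/SLATE-M): decentralized training with
# a momentum-based variance-reduced (STORM-style) gradient estimator and
# gossip averaging over a ring topology (no central server).
import numpy as np

rng = np.random.default_rng(1)
n_workers, dim = 4, 5
w_star = rng.standard_normal(dim)  # shared optimum of the toy problem

def stoch_grad(w, noise_scale=0.1):
    """Stochastic gradient of 0.5 * ||w - w_star||^2 plus sampling noise."""
    return (w - w_star) + noise_scale * rng.standard_normal(dim)

# Doubly stochastic mixing matrix for a ring: each worker averages with
# its two neighbors only, which is what makes the scheme serverless.
W = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    W[i, i] = 0.5
    W[i, (i - 1) % n_workers] = 0.25
    W[i, (i + 1) % n_workers] = 0.25

w = rng.standard_normal((n_workers, dim))                    # local models
d = np.stack([stoch_grad(w[i]) for i in range(n_workers)])   # local estimators
eta, beta = 0.1, 0.8

for t in range(200):
    w_new = W @ (w - eta * d)  # local step, then gossip with neighbors
    for i in range(n_workers):
        # STORM-style update: fresh gradient plus a momentum correction.
        # (True STORM evaluates both gradients on the same sample; the
        # independent noise here is a simplification.)
        g_new = stoch_grad(w_new[i])
        g_old = stoch_grad(w[i])
        d[i] = g_new + (1 - beta) * (d[i] - g_old)
    w = w_new

print(np.linalg.norm(w.mean(axis=0) - w_star))  # small, noise-limited error
```

The serverless aspect lives entirely in W: each worker exchanges models only with its ring neighbors, so no single node ever aggregates all updates.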

Supplementary Material

MP4 File (promo.mp4)
AUPRC Optimization, Paper: rtfp1155


Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN:9798400701030
DOI:10.1145/3580305

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. auprc
  2. federated learning
  3. imbalanced data
  4. serverless federated learning
  5. stochastic optimization

Qualifiers

  • Research-article

Funding Sources

  • NSF IIS
  • DBI
  • CNS
  • CCF

Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
