DOI: 10.1145/3034786.3034795

Private Incremental Regression

Published: 09 May 2017

Abstract

Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. A variety of offline machine learning tasks are known to be feasible under differential privacy, where generic constructions exist that, given a large enough input sample, perform tasks such as PAC learning, Empirical Risk Minimization (ERM), regression, etc. In this paper, we investigate the problem of private machine learning, where, as is common in practice, the data is not given at once, but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental regression, where the general goal is to always maintain a good empirical risk minimizer for the history observed under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on a simple idea of invoking the private batch ERM procedure at some regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of ≈ √d, where d is the dimensionality of the data. While from the results of Bassily et al. [2] this bound is tight in the worst case, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems. Our second mechanism, which achieves this, is based on the idea of projecting the data to a lower-dimensional space using random projections, and then adding privacy noise in this low-dimensional space. The mechanism overcomes the issues of adaptivity inherent with the use of random projections in online streams, and uses recent developments in high-dimensional estimation to achieve an excess empirical risk bound of ≈ T^{1/3} W^{2/3}, where T is the length of the stream and W is the sum of the Gaussian widths of the input domain and the constraint set that we optimize over.
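
To make the baseline transformation concrete, here is a minimal Python sketch. It assumes output perturbation (in the spirit of Chaudhuri et al. [8]) as the private batch ERM subroutine and a simple even split of the privacy budget across invocations via basic composition; the function names, the noise calibration, and the `interval` parameter are illustrative placeholders, not taken from the paper.

```python
import numpy as np

def private_batch_erm(X, y, eps, lam=1.0):
    # Output-perturbation-style private ridge regression, standing in for
    # any private batch ERM mechanism. Assumes each row of X and each
    # label in y has norm at most 1 (a standard boundedness assumption).
    d = X.shape[1]
    theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    sensitivity = 2.0 / (lam * len(y))  # illustrative L2-sensitivity bound
    return theta + np.random.laplace(scale=sensitivity / eps, size=d)

def incremental_erm(stream, T, eps, interval):
    # Baseline transformation: re-run the private batch mechanism on the
    # full history every `interval` steps, splitting the privacy budget
    # evenly across the ~T/interval invocations (basic composition), and
    # answer with the most recent model in between.
    eps_per_call = eps / max(1, T // interval)
    X, y, theta = [], [], None
    for t, (x_t, y_t) in enumerate(stream, start=1):
        X.append(x_t)
        y.append(y_t)
        if t % interval == 0:
            theta = private_batch_erm(np.array(X), np.array(y), eps_per_call)
        yield theta  # current private model (None before the first retraining)
```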

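A similar sketch illustrates the noisy-gradient idea behind the first regression mechanism. For squared loss, the empirical gradient at θ is (XᵀX)θ − Xᵀy, so it suffices to maintain the sufficient statistics privately. The version below adds fresh Gaussian noise at every release purely for illustration, whereas the paper constructs a noisy incremental gradient function with tree-based aggregation in the continual-observation model [6, 15] so the error grows only polylogarithmically in T. The step size `eta`, noise scale `sigma`, projection radius, and helper `project_l2` are all hypothetical choices for this sketch.

```python
import numpy as np

def project_l2(theta, radius):
    # Euclidean projection onto the L2 ball of the given radius, standing
    # in for projection onto a general convex constraint set.
    nrm = np.linalg.norm(theta)
    return theta if nrm <= radius else theta * (radius / nrm)

def private_incremental_regression(stream, d, eps, T, eta=0.1, radius=1.0):
    # For squared loss, grad(theta) = (X^T X) theta - X^T y, so noisy
    # versions of the running sufficient statistics give a noisy gradient
    # function that can be evaluated at any theta.
    A = np.zeros((d, d))      # running X^T X
    b = np.zeros(d)           # running X^T y
    theta = np.zeros(d)
    sigma = np.sqrt(T) / eps  # placeholder noise scale, not the paper's calibration
    for x_t, y_t in stream:
        A += np.outer(x_t, x_t)
        b += y_t * x_t
        # Fresh noise per release (naive accounting); the paper's mechanism
        # instead shares noise across releases via binary-tree aggregation.
        A_noisy = A + np.random.normal(scale=sigma, size=(d, d))
        b_noisy = b + np.random.normal(scale=sigma, size=d)
        grad = A_noisy @ theta - b_noisy
        theta = project_l2(theta - eta * grad, radius)
        yield theta  # private model after observing the latest point
```
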
References

[1]
F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with Sparsity-inducing Penalties. Foundations and Trends in Machine Learning, 2012.
[2]
R. Bassily, A. Smith, and A. Thakurta. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. In FOCS. IEEE, 2014.
[3]
J. Blocki, A. Blum, A. Datta, and O. Sheffet. The Johnson-Lindenstrauss Transform itself Preserves Differential Privacy. In FOCS. IEEE, 2012.
[4]
A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical Privacy: The SuLQ Framework. In PODS. ACM, 2005.
[5]
J. Bourgain, S. Dirksen, and J. Nelson. Toward a Unified Theory of Sparse Dimensionality Reduction in Euclidean Space. In STOC. ACM, 2015.
[6]
T.-H. H. Chan, E. Shi, and D. Song. Private and Continual Release of Statistics. TISSEC, 14(3), 2011.
[7]
K. Chaudhuri and C. Monteleoni. Privacy-preserving Logistic Regression. In NIPS, 2009.
[8]
K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially Private Empirical Risk Minimization. JMLR, 12, 2011.
[9]
K. L. Clarkson. Coresets, Sparse Greedy Approximation, and the Frank-Wolfe Algorithm. TALG, 6(4), 2010.
[10]
I. Dinur and K. Nissim. Revealing Information while Preserving Privacy. In PODS. ACM, 2003.
[11]
J. C. Duchi, M. Jordan, and M. J. Wainwright. Local Privacy and Statistical Minimax Rates. In FOCS. IEEE, 2013.
[12]
J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Privacy Aware Learning. JACM, 61(6), 2014.
[13]
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our Data, Ourselves: Privacy via Distributed Noise Generation. In EUROCRYPT. Springer, 2006.
[14]
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In TCC. Springer, 2006.
[15]
C. Dwork, M. Naor, T. Pitassi, and G. N. Rothblum. Differential Privacy under Continual Observation. In STOC. ACM, 2010.
[16]
C. Dwork, M. Naor, O. Reingold, and G. Rothblum. Pure Differential Privacy for Rectangle Queries via Private Partitions. In ASIACRYPT, 2015.
[17]
C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3--4), 2014.
[18]
C. Dwork, G. N. Rothblum, and S. Vadhan. Boosting and Differential Privacy. In FOCS. IEEE, 2010.
[19]
M. M. Fard, Y. Grinberg, J. Pineau, and D. Precup. Compressed Least-Squares Regression on Sparse Spaces. In AAAI, 2012.
[20]
Y. Gordon. On Milman's Inequality and Random Subspaces which Escape through a Mesh in R^n. Springer, 1988.
[21]
N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review, 53(2), 2011.
[22]
E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic Regret Algorithms for Online Convex Optimization. In COLT. Springer, 2006.
[23]
J. Hsu, Z. Huang, A. Roth, T. Roughgarden, and Z. S. Wu. Private Matchings and Allocations. In STOC. ACM, 2014.
[24]
P. Jain, P. Kothari, and A. Thakurta. Differentially Private Online Learning. In COLT, 2012.
[25]
P. Jain and A. Thakurta. Differentially Private Learning with Kernels. In ICML, 2013.
[26]
P. Jain and A. G. Thakurta. (Near) Dimension Independent Risk bounds for Differentially Private Learning. In ICML, 2014.
[27]
A. Kabán. New Bounds on Compressive Linear Least Squares Regression. In AISTATS, 2014.
[28]
S. Kasiviswanathan and H. Jin. Efficient Private Empirical Risk Minimization for High-dimensional Learning. In ICML, 2016.
[29]
S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What Can We Learn Privately? In FOCS. IEEE, 2008.
[30]
S. P. Kasiviswanathan, M. Rudelson, and A. Smith. The Power of Linear Reconstruction Attacks. In SODA. SIAM, 2013.
[31]
S. P. Kasiviswanathan and A. Smith. On the 'Semantics' of Differential Privacy: A Bayesian Formulation. Journal of Privacy and Confidentiality, 6(1), 2014.
[32]
K. Kenthapadi, A. Korolova, I. Mironov, and N. Mishra. Privacy via the Johnson-Lindenstrauss Transform. Journal of Privacy and Confidentiality, 5(1), 2013.
[33]
D. Kifer, A. Smith, and A. Thakurta. Private Convex Empirical Risk Minimization and High-dimensional Regression. In COLT, 2012.
[34]
M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes, volume 23. Springer Science & Business Media, 2013.
[35]
O. Maillard and R. Munos. Compressed Least-squares Regression. In NIPS, 2009.
[36]
N. Mishra and A. Thakurta. (Nearly) Optimal Differentially Private Stochastic Multi-Arm Bandits. In UAI, 2015.
[37]
J. Nelson and H. L. Nguyên. OSNAP: Faster Numerical Linear Algebra Algorithms via Sparser Subspace Embeddings. In FOCS. IEEE, 2013.
[38]
G. E. Pfander. Sampling Theory, a Renaissance. Springer, 2015.
[39]
B. I. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft. Learning in a Large Function Space: Privacy-preserving Mechanisms for SVM Learning. arXiv:0911.5708, 2009.
[40]
S. Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4(2), 2011.
[41]
S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Learnability, Stability and Uniform Convergence. The Journal of Machine Learning Research, 11, 2010.
[42]
O. Sheffet. Private Approximations of the 2nd-Moment Matrix Using Existing Techniques in Linear Regression. arXiv:1507.00056, 2015.
[43]
S. Song, K. Chaudhuri, and A. D. Sarwate. Stochastic Gradient Descent with Differentially Private Updates. In GlobalSIP. IEEE, 2013.
[44]
K. Talwar, A. Thakurta, and L. Zhang. Nearly Optimal Private Lasso. In NIPS, 2015.
[45]
K. Talwar, A. Thakurta, and L. Zhang. Private Empirical Risk Minimization Beyond the Worst Case: The Effect of the Constraint Set Geometry. arXiv:1411.5417, 2015.
[46]
A. G. Thakurta and A. Smith. Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso. In COLT, 2013.
[47]
A. G. Thakurta and A. Smith. (Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings. In NIPS, 2013.
[48]
R. Tibshirani. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 1996.
[49]
J. Ullman. Private Multiplicative Weights beyond Linear Queries. In PODS. ACM, 2015.
[50]
J. Upadhyay. Randomness Efficient Fast-Johnson-Lindenstrauss Transform with Applications in Differential Privacy and Compressed Sensing. arXiv:1410.2470, 2014.
[51]
V. Vapnik. The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.
[52]
R. Vershynin. Estimation in High Dimensions: A Geometric Perspective. arXiv:1405.5103, 2014.
[53]
O. Williams and F. McSherry. Probabilistic Inference and Differential Privacy. In NIPS, 2010.
[54]
WolframAlpha. https://www.wolframalpha.com/examples/EquationSolving.html, 2016.
[55]
D. P. Woodruff. Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends in Theoretical Computer Science, 10, 2014.
[56]
S. Zhou, K. Ligett, and L. Wasserman. Differential Privacy with Compression. In ISIT. IEEE, 2009.

    Published In

    PODS '17: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
    May 2017
    458 pages
    ISBN:9781450341981
    DOI:10.1145/3034786

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. differential privacy
    2. empirical risk minimization
    3. incremental machine learning
    4. linear regression

    Qualifiers

    • Research-article

    Funding Sources

    • NSF Grant

    Conference

    SIGMOD/PODS'17

    Acceptance Rates

    PODS '17 Paper Acceptance Rate: 29 of 101 submissions (29%)
    Overall Acceptance Rate: 642 of 2,707 submissions (24%)
