DOI: 10.1145/2939672.2939769

Annealed Sparsity via Adaptive and Dynamic Shrinking

Published: 13 August 2016

Abstract

Sparse learning has received a tremendous amount of interest in high-dimensional data analysis due to its model interpretability and low computational cost. Among the various techniques, adaptive l1-regularization is an effective framework for improving the convergence behaviour of the LASSO by varying the strength of regularization across features. At the same time, this adaptive structure makes it powerful in modelling grouped sparsity patterns as well, which is particularly useful in high-dimensional multi-task problems. However, choosing an appropriate global regularization weight remains an open problem. In this paper, inspired by the annealing technique in materials science, we propose to achieve "annealed sparsity" by designing a dynamic shrinking scheme that simultaneously optimizes the regularization weights and the model coefficients in sparse (multi-task) learning. The dynamic structure of our algorithm is twofold. Feature-wise (spatially), the regularization weights are updated interactively with the model coefficients, allowing us to improve the global regularization structure. Iteration-wise (temporally), this interaction is coupled with gradually boosted l1-regularization, obtained by adjusting an equality norm-constraint, which achieves an annealing effect that further improves model selection. This yields interesting shrinking behaviour along the whole solution path. Our method competes favorably with state-of-the-art methods in sparse (multi-task) learning. We also apply it to expression quantitative trait loci (eQTL) analysis, where it gives useful biological insights in a human cancer (melanoma) study.
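
To make the idea of adaptive, gradually tightened l1-regularization concrete, the sketch below shows a generic iteratively reweighted lasso in which per-feature weights are recomputed from the current coefficients and the global penalty is strengthened across outer stages. This is only an illustrative, single-task penalized variant under assumed parameter choices, not the paper's constrained multi-task formulation; the function names, the weight rule 1/(|beta| + eps), and the penalty schedule are all hypothetical.

```python
# Minimal sketch (assumptions noted above): iteratively reweighted l1
# regression with a gradually increased global penalty, solved by ISTA.
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft-thresholding operator: prox of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def reweighted_lasso_annealed(X, y, lam_schedule, n_inner=200, eps=1e-3):
    """Solve a sequence of weighted-l1 least-squares problems.

    At each outer stage the global penalty `lam` is taken from `lam_schedule`
    (gradually increased, mimicking an annealing-like effect), and the
    per-feature weights are recomputed from the current coefficients so that
    features with small coefficients are penalized more heavily.
    """
    n, p = X.shape
    # Warm start from a lightly ridge-regularized fit so the first set of
    # adaptive weights is informative rather than all-equal.
    beta = np.linalg.solve(X.T @ X + 1e-3 * np.eye(p), X.T @ y)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    for lam in lam_schedule:
        w = 1.0 / (np.abs(beta) + eps)         # adaptive, feature-wise weights
        for _ in range(n_inner):               # ISTA on the weighted-l1 problem
            grad = X.T @ (X @ beta - y)
            beta = soft_threshold(beta - step * grad, step * lam * w)
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, k = 100, 50, 5
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[:k] = rng.uniform(1.0, 3.0, size=k)
    y = X @ true_beta + 0.1 * rng.standard_normal(n)
    # Gradually strengthen the penalty across outer stages.
    beta_hat = reweighted_lasso_annealed(X, y, lam_schedule=[0.5, 1.0, 2.0, 4.0])
    print("selected features:", np.flatnonzero(np.abs(beta_hat) > 1e-6))
```

In contrast to the fixed schedule used above, the paper's method updates the regularization weights interactively with the model coefficients and boosts the overall l1-regularization by adjusting an equality norm-constraint, so the annealing behaviour emerges from the optimization itself.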

Supplementary Material

MP4 File (kdd2016_zhang_annealed_sparsity_01-acm.mp4)

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive lasso
  2. annealing
  3. compressive sensing
  4. dynamic shrinking
  5. feature selection
  6. lasso
  7. multitask learning
  8. regularization path
  9. sparse regression

Qualifiers

  • Research-article

Conference

KDD '16

Acceptance Rates

KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)
