
A Survey on Bayesian Nonparametric Learning

Published: 25 January 2019

Abstract

Bayesian learning has long played a significant role in machine learning thanks to its particular ability to embrace uncertainty, encode prior knowledge, and endow interpretability. On the back of Bayesian learning's great success, Bayesian nonparametric learning (BNL) has emerged as a force for further advances in this field due to its greater modelling flexibility and representational power. Instead of working with the fixed-dimensional probability distributions of parametric Bayesian learning, BNL creates a new "game" with infinite-dimensional stochastic processes. BNL has long been recognised as a research subject in statistics, and, to date, several state-of-the-art pilot studies have demonstrated that BNL has a great deal of potential for solving real-world machine-learning tasks. Despite these promising results, however, BNL has not yet created a large wave in the machine-learning community. Esotericism may account for this: the books and surveys on BNL written by statisticians tend to be dense with theory and proofs, each certainly meaningful, but liable to scare away new researchers, especially those with computer-science backgrounds. Hence, the aim of this article is to provide a plain-spoken, yet comprehensive, theoretical survey of BNL in terms that researchers in the machine-learning community can understand. It is hoped this survey will serve as a starting point for understanding and exploiting the benefits of BNL in current scholarly endeavours. To achieve this goal, we have collated the extant studies in this field and aligned them with the steps of a standard BNL procedure: selecting an appropriate stochastic process, manipulating it to suit the task, and executing the model inference algorithm. At each step, past efforts have been thoroughly summarised and discussed. In addition, we have reviewed the common methods for implementing BNL in various machine-learning tasks, along with its diverse applications in the real world, as examples to motivate future studies.
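To make the contrast with fixed-dimensional Bayesian models concrete, the sketch below simulates the Chinese restaurant process, a well-known predictive construction of the Dirichlet process: the number of clusters is not fixed in advance but grows with the data, which is the hallmark of the "infinite-dimensional" flexibility the abstract describes. This is a minimal illustration only; the function name and parameter choices are our own, not drawn from the survey.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate seating under a CRP with concentration parameter alpha.

    Customer i joins an occupied table k with probability
    count[k] / (i + alpha) and opens a new table with probability
    alpha / (i + alpha), so the number of tables (clusters) is
    unbounded a priori and determined by the data.
    """
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers seated at table k
    assignments = []  # table index chosen by each customer
    for i in range(n_customers):
        r = rng.random() * (i + alpha)
        cum = 0.0
        for k, count in enumerate(tables):
            cum += count
            if r < cum:
                tables[k] += 1
                assignments.append(k)
                break
        else:
            # r fell in the remaining alpha mass: open a new table
            tables.append(1)
            assignments.append(len(tables) - 1)
    return assignments, len(tables)


assignments, n_clusters = chinese_restaurant_process(100, alpha=1.0)
print(f"{n_clusters} clusters emerged from 100 observations")
```

Larger values of `alpha` tend to produce more clusters; in a full BNL mixture model this seating scheme would be combined with per-table parameters and an inference algorithm such as Gibbs sampling or variational inference, as surveyed in the article.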

References

[1]
Amr Ahmed, Linagjie Hong, and Alexander J. Smola. 2013. Nested Chinese restaurant franchise processes: Applications to user tracking and document modeling. In Proceedings of the 30th International Conference on International Conference on Machine Learning (ICML’13). JMLR.org, III--1426--III--1434.
[2]
David J. Aldous. 1985. Exchangeability and Related Topics. Springer.
[3]
Christophe Andrieu, Nando de Freitas, Arnaud Doucet, and Michael I. Jordan. 2003. An introduction to MCMC for machine learning. Mach. Learn. 50, 1 (2003), 5--43.
[4]
Cédric Archambeau, Balaji Lakshminarayanan, and Guillaume Bouchard. 2015. Latent IBP compound Dirichlet allocation. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 321--333.
[5]
Behnam Babagholami M., Seyed M. Roostaiyan, Ali Zarghami, and Mahdieh S. Baghshah. 2014. Multi-modal distance metric learning: A Bayesian non-parametric approach. In Proceedings of the 13th European Conference on Computer Vision Workshops (ECCV’14). 63--77.
[6]
David M. Blei, Perry R. Cook, and Matthew Hoffman. 2010. Bayesian nonparametric matrix factorization for recorded music. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 439--446.
[7]
David M. Blei and Peter I. Frazier. 2011. Distance dependent Chinese restaurant processes. J. Mach. Learn. Res. 12, 8 (2011), 2461--2488.
[8]
David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. 2010. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57, 2 (2010), 7.
[9]
David M. Blei and Michael I. Jordan. 2006. Variational inference for Dirichlet process mixtures. Bayes. Anal. 1, 1 (2006), 121--143.
[10]
David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. 2017. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 112, 518 (2017), 859--877.
[11]
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 1 (2003), 993--1022.
[12]
Phil Blunsom and Trevor Cohn. 2011. A hierarchical Pitman-Yor process HMM for unsupervised part of speech induction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (ACL’11). 865--874.
[13]
Tamara Broderick, Michael I. Jordan, and Jim Pitman. 2012. Beta processes, stick-breaking and power laws. Bayes. Anal. 7, 2 (2012), 439--476.
[14]
Tamara Broderick, Lester Mackey, John Paisley, and Michael I. Jordan. 2015. Combinatorial clustering and the beta negative binomial process. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 290--306.
[15]
Michael Bryant and Erik B. Sudderth. 2012. Truly nonparametric online variational inference for hierarchical Dirichlet processes. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS’12). 2699--2707.
[16]
Trevor Campbell, Julian Straub, John W. Fisher III, and Jonathan P. How. 2015. Streaming, distributed variational inference for Bayesian nonparametrics. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS’15). 280--288.
[17]
Kevin R. Canini and Thomas L. Griffiths. 2011. A nonparametric Bayesian model of multi-level category learning. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI’11). 307--312.
[18]
Kevin R. Canini, Mikhail M. Shashkov, and Thomas L. Griffiths. 2010. Modeling transfer learning in human categorization with the hierarchical Dirichlet process. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 151--158.
[19]
Lawrence Carin, David M. Blei, and John W. Paisley. 2011. Variational inference for stick-breaking beta process priors. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 889--896.
[20]
Francois Caron, Manuel Davy, and Arnaud Doucet. 2007. Generalized polya urn for time-varying Dirichlet process mixtures. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI’07). 33--40.
[21]
FranÇois Caron, Manuel Davy, Arnaud Doucet, Emmanuel Duflos, and Philippe Vanheeghe. 2008. Bayesian inference for linear dynamic models with Dirichlet process mixtures. IEEE Trans. Sign. Process. 56, 1 (2008), 71--84.
[22]
FranÇois Caron and Emily B. Fox. 2017. Sparse graphs using exchangeable random measures. J. Roy. Stat. Soc. B 79, 5 (2017), 1295--1366.
[23]
Jason Chang and John W. Fisher III. 2013. Parallel sampling of DP mixture models using sub-cluster splits. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS’13). 620--628.
[24]
Sotirios P. Chatzis, Dimitrios Korkinof, and Yiannis Demiris. 2012. A nonparametric Bayesian approach toward robot learning by demonstration. Robot. Auton. Syst. 60, 6 (2012), 789--802.
[25]
Sotirios P. Chatzis and Gabriel Tsechpenakis. 2010. The infinite hidden markov random field model. IEEE Trans. Neur. Netw. 21, 6 (2010), 1004--1014.
[26]
Bo Chen, Gungor Polatkan, Guillermo Sapiro, Lawrence Carin, and David B. Dunson. 2011. The hierarchical beta process for convolutional factor analysis and deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 361--368.
[27]
Changyou Chen, Wray Buntine, Nan Ding, Lexing Xie, and Lan Du. 2015. Differential topic models. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 230--242.
[28]
Changyou Chen, Nan Ding, and Wray L. Buntine. 2012. Dependent hierarchical normalized random measures for dynamic topic modeling. In Proceedings of the 29th International Conference on Machine Learning (ICML’12).
[29]
Changyou Chen, Vinayak Rao, Wray Buntine, and Yee W. Teh. 2013. Dependent normalized random measures. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 969--977.
[30]
Yi Chen, X L Wang, Xin Xiang, Buzhou Tang, and Junzhao Bu. 2015. Network structure exploration via Bayesian nonparametric models. J. Stat. Mech.: Theory Exp. 2015, 10 (2015), P10004.
[31]
Jen-Tzung Chien. 2018. Bayesian nonparametric learning for hierarchical and sparse topics. IEEE/ACM Trans. Aud. Speech. Lang. Process. 26, 2 (2018), 422--435.
[32]
Jaedeug Choi and Kee-Eung Kim. 2012. Nonparametric Bayesian inverse reinforcement learning for multiple reward functions. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS’12). 305--313.
[33]
Jaedeug Choi and Kee-Eung Kim. 2013. Bayesian nonparametric feature construction for inverse reinforcement learning. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI’13). 1287--1293.
[34]
Yeonseung Chung and David B. Dunson. 2009. The local Dirichlet process. Ann. Inst. Stat. Math. 63, 1 (2009), 59--80.
[35]
Adelino R. Ferreira da Silva. 2007. A Dirichlet process mixture model for brain MRI tissue classification. Med. Image Anal. 11, 2 (2007), 169--182.
[36]
Andrew M. Dai and Amos J. Storkey. 2015. The supervised hierarchical Dirichlet process. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 243--255.
[37]
Patrick Dallaire, Camille Besse, Stephane Ross, and Brahim Chaib-draa. 2009. Bayesian reinforcement learning in continuous POMDPs with Gaussian processes. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS’09). 2604--2609.
[38]
Patrick Dallaire, Philippe Giguère, Daniel Émond, and Brahim Chaib-draa. 2014. Autonomous tactile perception: A combined improved sensing and Bayesian nonparametric approach. Robot. Autonom. Syst. 62, 4 (2014), 422--435.
[39]
P. Damlen, John Wakefield, and Stephen Walker. 1999. Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. J. Roy. Stat. Soc. Ser. B 61, 2 (1999), 331--344.
[40]
Marc P. Deisenroth, Carl E. Rasmussen, and Jan Peters. 2008. Model-based reinforcement learning with continuous states and actions. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN’08). 19--24.
[41]
John DeNero, Alexandre Bouchard-Côté, and Dan Klein. 2008. Sampling alignment structure under a Bayesian translation model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’08). 314--323.
[42]
Nan Ding, Rongjing Xiang, Ian Molloy, and Ninghui Li. 2010. Nonparametric Bayesian matrix factorization by Power-EP. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS’10). 169--176.
[43]
Kjell Doksum. 1974. Tailfree and neutral random probabilities and their posterior distributions. Ann. Probab. 2, 2 (1974), 183--201.
[44]
Finale Doshi, Kurt Miller, Jurgen V. Gael, and Yee W. Teh. 2009. Variational inference for the Indian buffet process. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS’09). 137--144.
[45]
Finale Doshi-velez. 2009. The infinite partially observable Markov decision process. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS’09). 477--485.
[46]
Finale Doshi-Velez, David Pfau, Frank Wood, and Nicholas Roy. 2015. Bayesian nonparametric methods for partially-observable reinforcement learning. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 394--407.
[47]
Finale Doshi-Velez, David Wingate, Nicholas Roy, and Joshua B. Tenenbaum. 2010. Nonparametric Bayesian policy priors for reinforcement learning. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS’10). 532--540.
[48]
Kumar Dubey, Sinead Williamson, and Eric P. Xing. 2014. Parallel Markov chain Monte Carlo for Pitman-Yor mixture models. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI’14). 142--151.
[49]
David B. Dunson. 2010. Nonparametric Bayes applications to biostatistics. In Bayesian Nonparametrics. Cambridge University Press. 223--273.
[50]
David B. Dunson and Ju-Hyun Park. 2008. Kernel stick-breaking processes. Biometrika 95, 2 (2008), 307--323.
[51]
Clément Elvira, Pierre Chainais, and Nicolas Dobigeon. 2017. Bayesian nonparametric principal component analysis. arXiv preprint arXiv:1709.05667 (2017).
[52]
Paul Embrechts, Thomas Liniger, and Lu Lin. 2011. Multivariate Hawkes processes: An application to financial data. J. Appl. Probab. 48A (2011), 367--378.
[53]
Ali Faisal, Jussi Gillberg, Gayle Leen, and Jaakko Peltonen. 2013. Transfer learning using a nonparametric sparse topic model. Neurocomputing 112 (2013), 124--137.
[54]
Xuhui Fan, Longbing Cao, and Richard D.Y. Xu. 2015. Dynamic infinite mixed-membership stochastic blockmodel. IEEE Trans. Neur. Netw. Learn. Syst. 26, 9 (2015), 2072--2085.
[55]
Stefano Favaro, Antonio Lijoi, and Igor Prünster. 2012. A new estimator of the discovery probability. Biometrics 68, 4 (2012), 1188--1196.
[56]
Paul Fearnhead. 2004. Particle filters for mixture models with an unknown number of components. Stat. Comput. 14, 1 (2004), 11--21.
[57]
Thomas S. Ferguson. 1973. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1, 2 (1973), 209--230.
[58]
Thomas S. Ferguson, Eswar G. Phadia, and Ram C. Tiwari. 1992. Bayesian nonparametric inference. Lect. Not Monogr. Ser. 17 (1992), 127--150.
[59]
Nicholas J. Foti, Joseph D. Futoma, Daniel N. Rockmore, and Sinead Williamson. 2013. A unifying representation for a class of dependent random measures. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS’13). 20--28.
[60]
Nicholas J. Foti and Sinead A. Williamson. 2015. A survey of non-exchangeable priors for bayesian nonparametric models. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 359--371.
[61]
Emily Fox, Erik B. Sudderth, Michael I. Jordan, and Alan S. Willsky. 2011. Bayesian nonparametric inference of switching dynamic linear models. IEEE Trans. Sign. Process. 59, 4 (2011), 1569--1585.
[62]
Emily B. Fox. 2009. Bayesian Nonparametric Learning of Complex Dynamical Phenomena. Ph.D. Dissertation. Massachusetts Institute of Technology.
[63]
Emily B. Fox, Michael C. Hughes, Erik B. Sudderth, and Michael I. Jordan. 2014. Joint modeling of multiple time series via the beta process with application to motion capture segmentation. Ann. Appl. Stat. 8, 3 (2014), 1281--1313.
[64]
Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, and Alan S. Willsky. 2011. A sticky HDP-HMM with application to speaker diarization. Ann. Appl. Stat. 5, 2A (2011), 1020--1056.
[65]
Jurgen V. Gael, Yee W. Teh, and Zoubin Ghahramani. 2008. The infinite factorial hidden Markov model. In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS’08). 1697--1704.
[66]
Jurgen V. Gael, Andreas Vlachos, and Zoubin Ghahramani. 2009. The infinite HMM for unsupervised PoS tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’09). 678--687.
[67]
Zekai Gao, Yangqiu Song, Shixia Liu, Haixun Wang, Hao Wei, Yang Chen, and Weiwei Cui. 2011. Tracking and connecting topics via incremental hierarchical Dirichlet processes. In Proceedings of the 11th IEEE International Conference on Data Mining (ICDM’11). 1056--1061.
[68]
Hong Ge, Yutian Chen, Moquan Wan, and Zoubin Ghahramani. 2015. Distributed inference for Dirichlet process mixture models. In Proceedings of the 32nd International Conference on Machine Learning (ICML’15). 2276--2284.
[69]
Alan E. Gelfand, Athanasios Kottas, and Steven N. MacEachern. 2005. Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Stat. Assoc. 100, 471 (2005), 1021--1035.
[70]
Samuel J. Gershman and David M. Blei. 2012. A tutorial on Bayesian nonparametric models. J. Math. Psychol. 56, 1 (2012), 1--12.
[71]
Samuel J. Gershman, Peter I. Frazier, and David M. Blei. 2015. Distance dependent infinite latent feature models. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 334--345.
[72]
Zoubin Ghahramani, Michael I. Jordan, and Ryan P. Adams. 2010. Tree-structured stick breaking for hierarchical data. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS’10). 19--27.
[73]
Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar, et al. 2015. Bayesian reinforcement learning: A survey. Found. Trends Mach. Learn. 8, 5-6 (2015), 359--483.
[74]
Jayanta K. Ghosh and R.V. Ramamoorthi. 2002. Bayesian Nonparametrics. Springer.
[75]
Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. 2009. A Bayesian framework for word segmentation: Exploring the effects of context. Cognition 112, 1 (2009), 21--54.
[76]
Jim E. Griffin, Michalis Kolossiatis, and Mark F.J. Steel. 2013. Comparing distributions by using dependent normalized random-measure mixtures. J. Roy. Stat. Soc. Ser. B 75, 3 (2013), 499--529.
[77]
Thomas L. Griffiths and Zoubin Ghahramani. 2005. Infinite latent feature models and the Indian buffet process. In Proceedings of the 19th Annual Conference on Neural Information Processing Systems (NIPS’05). 475--482.
[78]
Thomas L. Griffiths and Zoubin Ghahramani. 2011. The indian buffet process: An introduction and review. J. Mach. Learn. Res. 12, 4 (2011), 1185--1224.
[79]
Sunil K. Gupta, Dinh Phung, and Svetha Venkatesh. 2012. A Bayesian nonparametric joint factor model for learning shared and individual subspaces from multiple data sources. In Proceedings of the 12th International Conference on Data Mining (SDM’12). 200--211.
[80]
Sunil K. Gupta, Dinh Q. Phung, and Svetha Venkatesh. 2012. A slice sampler for restricted hierarchical beta process with applications to shared subspace learning. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI’12). 316--325.
[81]
Tom S. F. Haines and Tao Xiang. 2014. Background subtraction with Dirichlet process mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 36, 4 (2014), 670--683.
[82]
Lauren A. Hannah, David M. Blei, and Warren B. Powell. 2011. Dirichlet process mixtures of generalized linear models. J. Mach. Learn. Res. 12, 6 (2011), 1923--1953.
[83]
Li He, Hairong Qi, and Russell Zaretzki. 2013. Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13). 345--352.
[84]
Creighton Heaukulani, David A. Knowles, and Zoubin Ghahramani. 2014. Beta diffusion trees. In Proceedings of the 31th International Conference on Machine Learning (ICML’14). 1809--1817.
[85]
Jennifer L. Hill. 2011. Bayesian nonparametric modeling for causal inference. J. Comput. Graph. Stat. 20, 1 (2011), 217--240.
[86]
Geoffrey E. Hinton, Simon Osindero, and Yee W. Teh. 2006. A fast learning algorithm for deep belief nets. Neur. Comput. 18, 7 (2006), 1527--1554.
[87]
Nils L. Hjort. 1990. Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Stat. 18, 3 (1990), 1259--1294.
[88]
Nils L. Hjort, Chris Holmes, Peter Müller, and Stephen G. Walker. 2010. Bayesian Nonparametrics. Vol. 28. Cambridge University Press.
[89]
Matthew D. Hoffman, David M. Blei, and Perry R. Cook. 2008. Content-based musical similarity computation using the hierarchical Dirichlet process. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR’08). 349--354.
[90]
Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley. 2013. Stochastic variational inference. J. Mach. Learn. Res. 14, 1 (2013), 1303--1347.
[91]
Yuening Hu, Ke Zhai, Vladimir Eidelman, and Jordan L. Boyd-Graber. 2014. Polylingual tree-based topic models for translation domain adaptation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL’14). 1166--1176.
[92]
John P. Huelsenbeck, Sonia Jain, Simon W. D. Frost, and Sergei L. Kosakovsky Pond. 2006. A Dirichlet process model for detecting positive selection in protein-coding DNA sequences. Proc. Natl. Acad. Sci. U.S.A. 103, 16 (2006), 6263--6268.
[93]
Tomoharu Iwata, James R. Lloyd, and Zoubin Ghahramani. 2016. Unsupervised many-to-many object matching for relational data. IEEE Trans. Pattern Anal. Mach. Intell. 38, 3 (2016), 607--617.
[94]
Saad Jbabdi, Mark Woolrich, and Timothy E.J. Behrens. 2009. Multiple-subjects connectivity-based parcellation using hierarchical Dirichlet process mixture models. NeuroImage 44, 2 (2009), 373--384.
[95]
Yun Jiang and Ashutosh Saxena. 2013. Infinite latent conditional random fields for modeling environments through humans. In Proceedings of Robotics: Science and Systems IX. Berlin, Germany.
[96]
Michael I. Jordan. 2010. Bayesian nonparametric learning: Expressive priors for intelligent systems. Heuristics, Probability and Causality: A Tribute to Judea Pearl 11 (2010), 167--185.
[97]
Maria Kalli, Jim E. Griffin, and Stephen G. Walker. 2011. Slice sampling mixture models. Stat. Comput. 21, 1 (2011), 93--105.
[98]
Jeon-Hyung Kang, Jun Ma, and Yan Liu. 2012. Transfer topic modeling with ease and scalability. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM’12). 564--575.
[99]
Charles Kemp, Joshua B. Tenenbaum, Thomas L. Griffiths, Takeshi Yamada, and Naonori Ueda. 2006. Learning systems of concepts with an infinite relational model. In Proceedings of the 21st National Conference on Artificial Intelligence—Volume 1 (AAAI’06). 381--388.
[100]
John F. C. Kingman. 1967. Completely random measures. Pac. J. Math. 21, 1 (1967), 59--78.
[101]
John F. C. Kingman. 1982. The coalescent. Stochast. Process. Appl. 13, 3 (1982), 235--248.
[102]
John F. C. Kingman. 1992. Poisson Processes. Vol. 3. Oxford University Press.
[103]
David A. Knowles and Zoubin Ghahramani. 2015. Pitman Yor diffusion trees for Bayesian hierarchical clustering. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 271--289.
[104]
Nishanth Koganti, Tomoya Tamei, Kazushi Ikeda, and Tomohiro Shibata. 2017. Bayesian nonparametric learning of cloth models for real-time state estimation. IEEE Trans. Robot. 33, 4 (2017), 916--931.
[105]
Kenichi Kurihara, Max Welling, and Yee W. Teh. 2007. Collapsed variational Dirichlet process mixture models. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07), Vol. 7. 2796--2801.
[106]
Kenichi Kurihara, Max Welling, and Nikos A. Vlassis. 2006. Accelerated variational Dirichlet process mixtures. In Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS’06). 761--768.
[107]
Sergey Levine, Zoran Popovic, and Vladlen Koltun. 2011. Nonlinear inverse reinforcement learning with Gaussian processes. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS’11), J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger (Eds.). 19--27.
[108]
Dawen Liang, Matthew D. Hoffman, and Daniel P.W. Ellis. 2013. Beta process sparse nonnegative matrix factorization for music. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR’13). 375--380.
[109]
Antonio Lijoi, Ramsés H. Mena, and Igor Prünster. 2007. A Bayesian nonparametric method for prediction in EST analysis. BMC Bioinf. 8, 1 (2007), 1--10.
[110]
Kar W. Lim, Wray Buntine, Changyou Chen, and Lan Du. 2016. Nonparametric Bayesian topic modelling with the hierarchical Pitman-Yor processes. Int. J. Approx. Reason. 78, C (2016), 172--191.
[111]
Dahua Lin. 2013. Online learning of nonparametric mixture models via sequential variational approximation. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS’13). 395--403.
[112]
Dahua Lin, Eric Grimson, and John W. Fisher III. 2010. Construction of dependent Dirichlet processes based on Poisson processes. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS’10). 1396--1404.
[113]
Dan Lovell, Jonathan Malmaud, Ryan P. Adams, and Vikash K. Mansinghka. 2013. ClusterCluster: Parallel Markov chain Monte Carlo for Dirichlet process mixtures. arXiv preprint arXiv:1304.2302 (2013).
[114]
Steven N. MacEachern. 1999. Dependent nonparametric processes. In Proceedings of the Section on Bayesian Statistical Science. American Statistical Association, 50--55.
[115]
Steven N. MacEachern. 2000. Dependent Dirichlet Processes. Technical Report. Department of Statistics, The Ohio State University.
[116]
Steven N. MacEachern, Merlise Clyde, and Jun S. Liu. 1999. Sequential importance sampling for nonparametric Bayes models: The next generation. Can. J. Stat. 27, 2 (1999), 251--267.
[117]
M. Mahmud. 2010. Constructing states for reinforcement learning. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 727--734.
[118]
Bernard Michini and Jonathan P. How. 2012. Bayesian nonparametric inverse reinforcement learning. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML’12). Bristol, UK, 148--163.
[119]
Bernard Michini, Thomas J. Walsh, Ali-Akbar Agha-Mohammadi, and Jonathan P. How. 2015. Bayesian nonparametric reward learning from demonstration. IEEE Trans. Robot. 31, 2 (2015), 369--386.
[120]
Kurt Miller, Michael I. Jordan, and Thomas L. Griffiths. 2009. Nonparametric latent feature models for link prediction. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS’09). 1276--1284.
[121]
Thomas Minka. 2004. Power EP. Technical Report. Microsoft Research, Cambridge.
[122]
Morten Mørup, Mikkel N. Schmidt, and Lars K. Hansen. 2011. Infinite multiple membership relational modeling for complex networks. In Proceedings of the 21st IEEE International Workshop on Machine Learning for Signal Processing (MLSP’11). 1--6.
[123]
Peter Müller, Fernando A. Quintana, Alejandro Jara, and Tim Hanson. 2015. Bayesian Nonparametric Data Analysis. Springer.
[124]
Masahiro Nakano, Yasunori Ohishi, Hirokazu Kameoka, Ryo Mukai, and Kunio Kashino. 2012. Bayesian nonparametric music parser. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’12). 461--464.
[125]
Radford M. Neal. 2003. Density modeling and clustering using Dirichlet diffusion trees. Bayes. Stat. 7 (2003), 619--629.
[126]
Radford M. Neal. 2003. Slice sampling. Ann. Stat. 31, 3 (2003), 705--767.
[127]
Willie Neiswanger, Chong Wang, and Eric Xing. 2014. Embarrassingly parallel variational inference in nonconjugate models. In Workshop on Advanced Variational Inference, Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPSW’14). 1--18.
[128]
Peter Orbanz and Yee Whye Teh. 2010. Bayesian nonparametric models. In Encyclopedia of Machine Learning. Springer US, Boston, MA, 81--89.
[129]
John Paisley, Chong Wang, David M. Blei, and Michael I. Jordan. 2015. Nested hierarchical Dirichlet processes. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 256--270.
[130]
Konstantina Palla, David A. Knowles, and Zoubin Ghahramani. 2015. Relational learning and network modelling using infinite latent attribute models. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 462--474.
[131]
Sinno J. Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 10 (2010), 1345--1359.
[132]
Christian Plagemann, Kristian Kersting, Patrick Pfaff, and Wolfram Burgard. 2007. Gaussian beam processes: A nonparametric Bayesian measurement model for range finders. In Proceedings of the Robotics: Science and Systems (RSS'07), Atlanta, Georgia.
[133]
S. C. Williams R. Daniel Mauldin, William D. Sudderth. 1992. Polya trees and random distributions. Ann. Stat. 20, 3 (1992), 1203--1221.
[134]
Lawrence R. Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 2 (1989), 257--286.
[135]
Natraj Raman and S.J. Maybank. 2016. Non-parametric hidden conditional random fields for action classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’16). 3256--3263.
[136]
Pravesh Ranchod, Benjamin Rosman, and George Konidaris. 2015. Nonparametric Bayesian reward segmentation for skill discovery using inverse reinforcement learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’15). 471--477.
[137]
Carl E. Rasmussen. 1999. The infinite Gaussian mixture model. In Proceedings of the 13th Annual Conference on Neural Information Processing Systems (NIPS’99). 554--560.
[138]
Lu Ren, David Dunson, Scott Lindroth, and Lawrence Carin. 2010. Dynamic nonparametric Bayesian models for analysis of music. J. Am. Stat. Assoc. 105, 490 (2010), 458--472.
[139]
Lu Ren, David B. Dunson, and Lawrence Carin. 2008. The dynamic hierarchical Dirichlet process. In Proceedings of the 25th International Conference on Machine Learning (ICML’08). 824--831.
[140]
Lu Ren, Yingjian Wang, Lawrence Carin, and David B. Dunson. 2011. The kernel beta process. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS’11). 963--971.
[141]
Abel Rodriguez, David B. Dunson, and Alan E. Gelfand. 2008. The nested Dirichlet process. J. Am. Stat. Assoc. 103, 483 (2008), 1131--1154.
[142]
Daniel M. Roy and Leslie Pack Kaelbling. 2007. Efficient Bayesian task-level transfer learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07), Vol. 7. 2599--2604.
[143]
Daniel M. Roy and Yee W. Teh. 2008. The mondrian process. In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS’08). 1377--1384.
[144]
Jason Roy, Kirsten J. Lum, Bret Zeldow, Jordan Dworkin, and Vincent Lo Re III, Michael J. Daniels. 2017. Bayesian nonparametric generative models for causal inference with missing at random covariates. Biometrics.
[145]
Anirban Roychowdhury and Brian Kulis. 2015. Gamma processes, stick-breaking, and variational inference. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS’15). 800--808.
[146] Ruslan Salakhutdinov, Joshua B. Tenenbaum, and Antonio Torralba. 2011. One-shot learning with a hierarchical nonparametric Bayesian model. In Workshop on Unsupervised and Transfer Learning—Proceedings of the 28th International Conference on Machine Learning (ICMLW’11). 195--206.
[147] Issei Sato, Kenichi Kurihara, and Hiroshi Nakagawa. 2012. Practical collapsed variational Bayes inference for hierarchical Dirichlet process. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12). 105--113.
[148] Ken-iti Sato. 1999. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press.
[149] Mikkel N. Schmidt and Morten Morup. 2013. Nonparametric Bayesian modeling of complex networks: An introduction. IEEE Sign. Process. Mag. 30, 3 (2013), 110--128.
[150] Peter Schulam and Suchi Saria. 2017. Reliable decision support using counterfactual models. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS’17). 1697--1708.
[151] Matthias Seeger. 2004. Gaussian processes for machine learning. Int. J. Neur. Syst. 14, 2 (2004), 69--106.
[152] Jayaram Sethuraman. 1994. A constructive definition of Dirichlet priors. Stat. Sinica 4, 2 (1994), 639--650.
[153] Babak Shahbaba and Radford Neal. 2009. Nonlinear models using Dirichlet process mixtures. J. Mach. Learn. Res. 10, 8 (2009), 1829--1850.
[154] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. 2017. Mastering the game of Go without human knowledge. Nature 550, 7676 (2017), 354--359.
[155] Padhraic Smyth, Max Welling, and Arthur U. Asuncion. 2009. Asynchronous distributed learning of topic models. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS’09). 81--88.
[156] Nitish Srivastava and Ruslan R. Salakhutdinov. 2013. Discriminative transfer learning with tree-based priors. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS’13). 2094--2102.
[157] Jacob Steinhardt and Zoubin Ghahramani. 2012. Flexible martingale priors for deep hierarchies. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS’12). 1108--1116.
[158] Amit Surana and Kunal Srivastava. 2014. Bayesian nonparametric inverse reinforcement learning for switched Markov decision processes. In Proceedings of the 13th IEEE International Conference on Machine Learning and Applications (ICMLA’14). 47--54.
[159] Alex Tank, Nicholas Foti, and Emily Fox. 2015. Streaming variational inference for Bayesian nonparametric mixture models. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS’15). 968--976.
[160] Martin A. Tanner and Wing H. Wong. 2010. From EM to data augmentation: The emergence of MCMC Bayesian computation in the 1980s. Stat. Sci. 25, 4 (2010), 506--516.
[161] Yee W. Teh. 2006. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (ACL’06). 985--992.
[162] Yee W. Teh. 2010. Dirichlet process. In Encyclopedia of Machine Learning. Springer US, Boston, MA, 280--287.
[163] Yee W. Teh, Charles Blundell, and Lloyd Elliott. 2011. Modelling genetic variations using fragmentation-coagulation processes. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS’11). 819--827.
[164] Yee W. Teh, Dilan Görür, and Zoubin Ghahramani. 2007. Stick-breaking construction for the Indian buffet process. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AISTATS’07). 556--563.
[165] Yee W. Teh, Hal Daumé III, and Daniel M. Roy. 2007. Bayesian agglomerative clustering with coalescents. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems (NIPS’07). 1473--1480.
[166] Yee W. Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. 2005. Sharing clusters among related groups: Hierarchical Dirichlet processes. In Proceedings of the 19th Annual Conference on Neural Information Processing Systems (NIPS’05). 1385--1392.
[167] Yee W. Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. 2006. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101, 476 (2006), 1566--1581.
[168] Yee W. Teh, Kenichi Kurihara, and Max Welling. 2007. Collapsed variational inference for HDP. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems (NIPS’07). 1481--1488.
[169] David Temperley. 2007. Music and Probability. MIT Press.
[170] Romain Thibaux and Michael I. Jordan. 2007. Hierarchical beta processes and the Indian buffet process. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AISTATS’07). 564--571.
[171] Bruce Thompson. 2004. Exploratory and Confirmatory Factor Analysis: Understanding Concepts and Applications. American Psychological Association.
[172] David A. Van Dyk and Xiao-Li Meng. 2001. The art of data augmentation. J. Comput. Graph. Stat. 10, 1 (2001), 1--50.
[173] Sara Wade, Silvia Mongelluzzo, and Sonia Petrone. 2011. An enriched conjugate prior for Bayesian nonparametric inference. Bayes. Anal. 6, 3 (2011), 359--385.
[174] Chong Wang and David M. Blei. 2009. Variational inference for the nested Chinese restaurant process. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS’09). 1990--1998.
[175] Chong Wang and David M. Blei. 2012. Truncation-free online variational inference for Bayesian nonparametric models. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS’12). 413--421.
[176] Chong Wang, John W. Paisley, and David M. Blei. 2011. Online variational inference for the hierarchical Dirichlet process. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS’11). 752--760.
[177] Yingjian Wang and Lawrence Carin. 2012. Lévy measure decompositions for the beta and gamma processes. In Proceedings of the 29th International Conference on Machine Learning (ICML’12). 499--506.
[178] Christopher K. I. Williams and Carl Edward Rasmussen. 2006. Gaussian Processes for Machine Learning. Vol. 2. MIT Press.
[179] Sinead Williamson, Avinava Dubey, and Eric Xing. 2013. Parallel Markov chain Monte Carlo for nonparametric mixture models. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 98--106.
[180] Sinead A. Williamson. 2016. Nonparametric network models for link prediction. J. Mach. Learn. Res. 17, 1 (2016), 7102--7121.
[181] Alan S. Willsky, Erik B. Sudderth, Michael I. Jordan, and Emily B. Fox. 2009. Nonparametric Bayesian learning of switching linear dynamical systems. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS’09). 457--464.
[182] Frank Wood and Thomas L. Griffiths. 2006. Particle filtering for nonparametric Bayesian matrix factorization. In Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS’06). 1513--1520.
[183] Frank Wood and Yee W. Teh. 2009. A hierarchical nonparametric Bayesian approach to statistical language model domain adaptation. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS’09). 607--614.
[184] Hongmin Wu, Hongbin Lin, Yisheng Guan, Kensuke Harada, and Juan Rojas. 2017. Robot introspection with Bayesian nonparametric vector autoregressive hidden Markov models. In Proceedings of the 17th IEEE-RAS International Conference on Humanoid Robotics (Humanoids’17). 882--888.
[185] Tianbing Xu, Zhongfei Zhang, Philip S. Yu, and Bo Long. 2008. Evolutionary clustering by hierarchical Dirichlet process with hidden Markov state. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM’08). 658--667.
[186] Zhao Xu, Volker Tresp, Achim Rettinger, and Kristian Kersting. 2010. Social network mining with nonparametric relational models. In Proceedings of the 2nd International Workshop on Advances in Social Network Mining and Analysis (SNAKDD’08). 77--96.
[187] Zhao Xu, Volker Tresp, Kai Yu, and Hans-Peter Kriegel. 2006. Infinite hidden relational models. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI’06). 544--551.
[188] Zhao Xu, Volker Tresp, Kai Yu, Shipeng Yu, and Hans-Peter Kriegel. 2005. Dirichlet enhanced relational learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML’05). 1004--1011.
[189] Zenglin Xu, Feng Yan, and Yuan Qi. 2015. Bayesian nonparametric models for multiway data analysis. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 475--487.
[190] Junyu Xuan, Jie Lu, Guangquan Zhang, Richard Y.D. Xu, and Xiangfeng Luo. 2015. Infinite author topic model based on mixed gamma-negative binomial process. In Proceedings of the 15th IEEE International Conference on Data Mining (ICDM’15). 489--498.
[191] Junyu Xuan, Jie Lu, Guangquan Zhang, Richard Y.D. Xu, and Xiangfeng Luo. 2017. Bayesian nonparametric relational topic model through dependent Gamma processes. IEEE Trans. Knowl. Data Eng. 29, 7 (2017), 1357--1369.
[192] Junyu Xuan, Jie Lu, Guangquan Zhang, Richard Y.D. Xu, and Xiangfeng Luo. 2018. Doubly nonparametric sparse nonnegative matrix factorization based on dependent Indian buffet processes. IEEE Trans. Neur. Netw. Learn. Syst. 29, 5 (2018), 1835--1849.
[193] Cheng Zhang, Carl Henrik Ek, Xavi Gratal, Florian T. Pokorny, and Hedvig Kjellstrom. 2013. Supervised hierarchical Dirichlet processes with variational inference. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW’13). 254--261.
[194] Jianwen Zhang, Yangqiu Song, Changshui Zhang, and Shixia Liu. 2010. Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10). 1079--1088.
[195] Jiangchuan Zheng, Siyuan Liu, and Lionel M. Ni. 2014. Effective mobile context pattern discovery via adapted hierarchical Dirichlet processes. In Proceedings of the 15th IEEE International Conference on Mobile Data Management (MDM’14), Vol. 1. 146--155.
[196] Mingyuan Zhou and Lawrence Carin. 2012. Augment-and-conquer negative binomial processes. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS’12). 2546--2554.
[197] Mingyuan Zhou and Lawrence Carin. 2015. Negative binomial process count and mixture modeling. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2 (2015), 307--320.
[198] Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, and John W. Paisley. 2009. Non-parametric Bayesian dictionary learning for sparse image representations. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS’09). 2295--2303.
[199] Mingyuan Zhou, Yulai Cong, and Bo Chen. 2016. Augmentable gamma belief networks. J. Mach. Learn. Res. 17, 163 (2016), 1--44.
[200] Mingyuan Zhou, Lauren Hannah, David B. Dunson, and Lawrence Carin. 2012. Beta-negative binomial process and Poisson factor analysis. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS’12). 1462--1471.
[201] Mingyuan Zhou, Hongxia Yang, Guillermo Sapiro, David B. Dunson, and Lawrence Carin. 2011. Dependent hierarchical beta process for image interpolation and denoising. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI’11). 883--891.
[202] Jun Zhu, Ning Chen, and Eric P. Xing. 2011. Infinite latent SVM for classification and multi-task learning. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS’11). 1620--1628.


Published In

ACM Computing Surveys, Volume 52, Issue 1, January 2020. 758 pages.
ISSN: 0360-0300; EISSN: 1557-7341; DOI: 10.1145/3309872.
Editor: Sartaj Sahni

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 25 January 2019, in CSUR Volume 52, Issue 1
Received: 01 May 2018; Revised: 01 October 2018; Accepted: 01 October 2018


Author Tags

  1. Bayesian learning
  2. Data science
  3. Machine learning

Qualifiers

  • Survey
  • Research
  • Refereed

Funding Sources

  • Australian Research Council (ARC)
