Probabilistic Autoencoder Using Fisher Information
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Generating Process
2.2. Priors and Likelihood
2.3. Approximating the Posterior
2.4. A Metric as Covariance
2.5. Kullback–Leibler Divergence
2.6. Sampling from the Inverse Metric
3. Results
3.1. Reconstruction Error
3.1.1. Loss Function Behavior Versus Reconstruction Loss
3.1.2. Model Comparison
3.1.3. Training Performance
3.2. Analyzing the Latent Space
3.2.1. Uncertainties
3.2.2. Using the Metric
3.3. Generating Samples
3.3.1. Generating Samples from a Unit Gaussian Distribution
3.3.2. Generating Samples Using Density Estimation
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| AE | autoencoder |
| VAE | variational autoencoder |
| GAN | generative adversarial network |
| VI | variational inference |
| KL | Kullback–Leibler |
| MGVI | metric Gaussian variational inference |
| MAP | maximum a posteriori |
| ELBO | evidence lower bound |
| CVAE | convolutional variational autoencoder |
| MSE | mean squared error |
| FID | Fréchet inception distance |
Appendix A
| Hyperparameter | FisherNet | VAE | CVAE |
|---|---|---|---|
| Number of layers in encoder and decoder | 3 | 3 | 3 |
| Neurons per layer | 448 | 448 | 448 |
| Encoder layer type | Dense | Dense | Convolutional |
| Decoder layer type | Dense | Dense | Transposed convolutional |
| Filters for convolutional layers | - | - | |
| Filters for transposed convolutional layers | - | - | |
| Kernel size | - | - | 3 |
| Strides | - | - | |
| Activation function | ReLU | ReLU | ReLU |
| Optimizer | Adam | Adam | Adam |
| Learning rate | | | |
| Batch size | 64 | 64 | 64 |
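To make the table above concrete, the following is a minimal sketch of the dense encoder/decoder pair shared by the FisherNet and the VAE. It assumes TensorFlow/Keras and flattened 28 × 28 Fashion-MNIST inputs, neither of which is fixed by the table; the layer counts, widths, activation, optimizer, and batch size follow the table. The final encoder layer shown here produces a latent mean and log-variance (the VAE parametrization; the FisherNet instead derives its latent covariance from the Fisher metric).

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 10     # the experiments vary this between 2 and 20
DATA_DIM = 28 * 28  # assumed: flattened Fashion-MNIST images

# Encoder: three dense hidden layers of 448 ReLU units, as in the
# table above, plus an output layer for the latent parameters.
encoder = tf.keras.Sequential([
    layers.InputLayer(input_shape=(DATA_DIM,)),
    layers.Dense(448, activation="relu"),
    layers.Dense(448, activation="relu"),
    layers.Dense(448, activation="relu"),
    layers.Dense(2 * LATENT_DIM),  # [mean, log-variance]
])

# Decoder: mirror image of the encoder, mapping latents back to pixels.
decoder = tf.keras.Sequential([
    layers.InputLayer(input_shape=(LATENT_DIM,)),
    layers.Dense(448, activation="relu"),
    layers.Dense(448, activation="relu"),
    layers.Dense(448, activation="relu"),
    layers.Dense(DATA_DIM),
])

# Adam and batch size 64 as in the table; the learning-rate cell is
# blank above, so the Keras default is assumed here.
optimizer = tf.keras.optimizers.Adam()
BATCH_SIZE = 64
```

The CVAE replaces the dense layers with convolutions (encoder) and transposed convolutions (decoder) of kernel size 3; its filter counts and strides are blank in the table above and are therefore not reproduced here.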
Appendix B
Latent Dimension | 2 | 5 | 10 | 15 | 20 |
---|---|---|---|---|---|
VAE | 4.849 | 4.395 | 4.088 | 5.075 | 5.622 |
CVAE | 6.114 | 5.752 | 4.876 | 4.516 | 5.909 |
FisherNet | 5.504 | 5.99 | 6.08 | 5.08 | 5.319 |
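The scores above are Fréchet inception distances (FID): Gaussians are fitted to Inception-v3 activations of real and generated images, and the distance FID = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}) is computed between them. For reference, a minimal NumPy/SciPy sketch of this formula follows; it is not the evaluation code used for the paper, and extracting the activations (e.g., with a pretrained Inception-v3) is assumed to happen upstream.

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """Fréchet inception distance between two activation sets of shape (n, d)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; sqrtm may return
    # small imaginary parts from numerical noise, which are discarded.
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    covmean = covmean.real if np.iscomplexobj(covmean) else covmean
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```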
| Latent Dimension | 2 | 5 | 10 | 15 | 20 |
|---|---|---|---|---|---|
| FisherNet | 116 | 80 | 68 | 61 | 57 |
| FisherNet | 116 | 110 | 124 | 135 | 148 |
| FisherNet | 116 | 85 | | | |
| VAE | 125 | 97 | 98 | 97 | 97 |
| VAE | 125 | 99 | 100 | 102 | 100 |
| VAE | 124 | 97 | | | |
| CVAE | 151 | 95 | 86 | 87 | 90 |
| CVAE | 152 | 95 | 89 | 92 | 100 |
| CVAE | 151 | 95 | | | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).