Developing Bug-Free Machine Learning Systems With Formal Mathematics

Daniel Selsam, Percy Liang, David L. Dill
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3047-3056, 2017.

Abstract

Noisy data, non-convex objectives, model misspecification, and numerical instability can all cause undesired behaviors in machine learning systems. As a result, detecting actual implementation errors can be extremely difficult. We demonstrate a methodology in which developers use an interactive proof assistant to both implement their system and to state a formal theorem defining what it means for their system to be correct. The process of proving this theorem interactively in the proof assistant exposes all implementation errors since any error in the program would cause the proof to fail. As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e. machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients. We train a variational autoencoder using Certigrad and find the performance comparable to training the same model in TensorFlow.

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-selsam17a, title = {Developing Bug-Free Machine Learning Systems With Formal Mathematics}, author = {Daniel Selsam and Percy Liang and David L. Dill}, booktitle = {Proceedings of the 34th International Conference on Machine Learning}, pages = {3047--3056}, year = {2017}, editor = {Precup, Doina and Teh, Yee Whye}, volume = {70}, series = {Proceedings of Machine Learning Research}, month = {06--11 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v70/selsam17a/selsam17a.pdf}, url = {https://proceedings.mlr.press/v70/selsam17a.html}, abstract = {Noisy data, non-convex objectives, model misspecification, and numerical instability can all cause undesired behaviors in machine learning systems. As a result, detecting actual implementation errors can be extremely difficult. We demonstrate a methodology in which developers use an interactive proof assistant to both implement their system and to state a formal theorem defining what it means for their system to be correct. The process of proving this theorem interactively in the proof assistant exposes all implementation errors since any error in the program would cause the proof to fail. As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e. machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients. We train a variational autoencoder using Certigrad and find the performance comparable to training the same model in TensorFlow.} }
Endnote
%0 Conference Paper %T Developing Bug-Free Machine Learning Systems With Formal Mathematics %A Daniel Selsam %A Percy Liang %A David L. Dill %B Proceedings of the 34th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2017 %E Doina Precup %E Yee Whye Teh %F pmlr-v70-selsam17a %I PMLR %P 3047--3056 %U https://proceedings.mlr.press/v70/selsam17a.html %V 70 %X Noisy data, non-convex objectives, model misspecification, and numerical instability can all cause undesired behaviors in machine learning systems. As a result, detecting actual implementation errors can be extremely difficult. We demonstrate a methodology in which developers use an interactive proof assistant to both implement their system and to state a formal theorem defining what it means for their system to be correct. The process of proving this theorem interactively in the proof assistant exposes all implementation errors since any error in the program would cause the proof to fail. As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e. machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients. We train a variational autoencoder using Certigrad and find the performance comparable to training the same model in TensorFlow.
APA
Selsam, D., Liang, P. & Dill, D.L.. (2017). Developing Bug-Free Machine Learning Systems With Formal Mathematics. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3047-3056 Available from https://proceedings.mlr.press/v70/selsam17a.html.

Related Material