In this work we study variational methods for Bayesian optimal experimental design (BOED). Experimentation is a cornerstone of science and is central to any major engineering effort. Experiments often require substantial resources, from expensive equipment to limited researcher time; they can also be dangerous, or may need to be completed within a fixed period of time. For these reasons, we want to conduct experiments as efficiently as possible, acquiring as much information as we can with the resources available. Optimal experimental design (OED) is a subfield of statistics concerned with developing methods to accomplish this goal. The OED problem is formulated by defining a utility function over designs and maximizing this function over the set of all feasible designs. We focus on the \emph{Expected Information Gain} (EIG), a widely used utility function with sound theoretical support. In practice, however, the EIG is generally intractable to compute, and approximation strategies are required. We investigate variational approximations for this purpose and show that they substantially improve upon competing approximation techniques. A specific form of OED common in machine learning (ML) is \emph{active learning} (AL). In active learning, the goal is to acquire a labeled dataset with which to train a supervised model. For the reasons stated above, labeling data points can be costly, so we should again make efficient use of our labeling resources. We present a novel application of active learning to optimize spectroscopic follow-up for large-scale astronomical surveys. Finally, much of this work requires learning functions over sets, and these functions must satisfy certain known properties (e.g., permutation invariance).
We conclude the thesis by presenting a novel neural network architecture for predicting the astronomical class of individual objects in the same exposure, designed specifically to accommodate known inductive biases present in the data.