Computer Science Thesis Proposal

Tuesday, November 17, 2015 - 12:00pm


1507 Newell-Simon Hall



A wide range of machine learning problems, including astronomical inference about galaxy clusters, natural image scene classification, parametric statistical inference, and predictions of public opinion, can be well-modeled as learning a function on (samples from) distributions. This thesis explores problems in learning such functions via kernel methods. The first challenge is one of computational efficiency when learning from large numbers of distributions: the computation of typical methods scales between quadratically and cubically, and so they are not amenable to large datasets. We investigate the approach of approximate embeddings into Euclidean spaces such that inner products in the embedding space approximate kernel values between the source distributions. We present a new embedding for a class of information-theoretic distribution distances, and evaluate it and existing embeddings on several real-world applications. We also propose the integration of these techniques with deep learning models so as to allow the simultaneous extraction of rich representations for inputs with the use of expressive distributional classifiers. In a related problem setting, common to astrophysical observations, autonomous sensing, and electoral polling, we have the following challenge: when observing samples is expensive, but we can choose where we would like to do so, how do we pick where to observe? We propose the development of a method to do so in the distributional learning setting (with a natural application to astrophysics), as well as giving a method for a closely related problem where we search for instances of patterns by making point observations Our final challenge is that the choice of kernel is important for getting good practical performance, but how to choose a good kernel for a given problem is not obvious. We propose to adapt recent kernel learning techniques to the distributional setting, allowing the automatic selection of good kernels for the task at hand. Integration with deep networks, as previously mentioned, may also allow for learning the distributional distance itself. Throughout, we combine theoretical results with extensive empirical evaluations to increase our understanding of the methods. Thesis Committee:Jeff Schneider (Chair)Nina BalcanBarnabás PóczosArthur Gretton (University College London) Copy of Thesis Summary

For More Information, Contact:


Thesis Proposal