Title: The Promise and Challenge of Unsupervised Learning
Speaker: Sham Kakade
Abstract: Representation learning is one of the central challenges in machine learning, where the goal is to find transformations of our data that improve performance on downstream tasks of interest. The algorithmic challenge is to design methods that learn such representations automatically (whether through supervised learning, unsupervised learning, transfer learning, or multi-task learning). By and large, most of the recent breakthroughs in machine learning have been driven by supervised learning methods, and yet there is still a widespread belief that unsupervised learning methods will ultimately lead to more substantial advances.
This talk will both survey recent algorithmic advances in unsupervised learning and discuss a few empirical questions and conjectures (related to deep learning methods). With regard to the algorithmic advances, there is now a body of work suggesting that, in principle, many of our natural hidden variable models are in fact efficiently learnable with spectral algorithms (from both a computational and a statistical perspective). Unfortunately, in many practical settings of interest (from image datasets, to natural language corpora, to genomic and medical datasets), many phenomena occur only once (or a few times) in the data, owing to the heavy-tailed nature of many real-world distributions. This sparse nature of our data can cause many existing algorithms to fail, as they are not tailored to this regime; these issues are directly analogous to questions studied in sparse random graph theory and the community detection literature, where it is known how certain spectral algorithms may fail. Here, I'll discuss some recent progress toward addressing these issues (along with the implications for certain tasks in natural language processing).
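To make the community detection analogy above concrete, here is a minimal sketch (not the speaker's own method) of a standard spectral approach on a two-community stochastic block model, using NumPy. The parameter choices (n, p, q) are illustrative: in this dense regime the second eigenvector of the adjacency matrix recovers the planted communities, whereas in the sparse regime (constant average degree) this vanilla approach is known to degrade, which is the failure mode the abstract alludes to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two planted communities of 50 nodes each (a stochastic block model):
# within-community edge probability p, across-community probability q.
n, p, q = 50, 0.5, 0.05
labels = np.array([0] * n + [1] * n)
P = np.where(labels[:, None] == labels[None, :], p, q)
A = (rng.random((2 * n, 2 * n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T  # symmetric adjacency matrix, no self-loops

# Spectral recovery: the sign pattern of the eigenvector for the
# second-largest eigenvalue of A approximately splits the two communities.
eigvals, eigvecs = np.linalg.eigh(A)
v2 = eigvecs[:, -2]            # eigh returns eigenvalues in ascending order
guess = (v2 > 0).astype(int)

# Accuracy up to relabeling of the two communities.
acc = max(np.mean(guess == labels), np.mean(guess != labels))
print(acc)
```

Re-running this with p and q scaled down so that the average degree is constant (e.g. p = 5/n, q = 0.5/n) shows the accuracy collapsing toward chance, illustrating why tailored algorithms are needed in the sparse regime.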
Going forward, I'll also discuss some broader algorithmic and empirical challenges in moving away from "end-to-end" training (as is common in many deep learning methods) toward more modular approaches that learn intermediate representations.
Bio: Sham Kakade is a Washington Research Foundation Data Science Chair, with a joint appointment in both the Computer Science & Engineering and Statistics departments at the University of Washington. He completed his Ph.D. at the Gatsby Computational Neuroscience Unit at University College London, advised by Peter Dayan, and earned his B.S. in physics at Caltech. Before joining the University of Washington, Dr. Kakade was a principal research scientist at Microsoft Research, New England. Prior to this, Dr. Kakade was an associate professor in the Department of Statistics at the Wharton School, University of Pennsylvania (2010-2012) and an assistant professor at the Toyota Technological Institute at Chicago (2005-2009). Dr. Kakade completed a postdoc in the Computer and Information Science Department at the University of Pennsylvania under the supervision of Michael Kearns.
He works broadly in data science, focusing on designing (and implementing) statistically and computationally efficient algorithms for machine learning, statistics, and artificial intelligence. His intent is to see these tools advance the state of the art on core scientific and technological problems.
One line of his work has provided computationally efficient algorithms for statistical estimation, including the estimation of statistical models with hidden (or latent) structure (such as mixture models, topic models, hidden Markov models, and models of communities in social networks). More broadly, Sham has made contributions across statistics, optimization, probability theory, machine learning, algorithmic game theory and economics, and computational neuroscience. He has chaired numerous conferences and workshops, given numerous plenary talks, and received various awards.