I am interested in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction.
Previously, I completed my Ph.D. in computer science at UC Berkeley and my B.S. in electrical engineering and computer science at MIT.
Prospective students and post-docs, please read this before contacting me.
Thank you for your interest in joining my lab! I am taking on new MS and PhD students each year. However, I ask that you do not contact me directly with regard to MS or PhD admissions until after you are admitted, as I will not be able to reply to individual emails.
If you are interested in a post-doc position, please read this form.
If you are a current or admitted Stanford undergraduate or MS student interested in research positions, please read this form.
If you are not a Stanford student and are interested in research positions, please read this form.
At NIPS 2017, we showcased our research on meta-imitation learning and visual foresight in a live robot demo! For more information and a video, see this page.
In summer 2017, I co-organized BAIR camp, a 2-day summer camp on human-centered AI for high-school students from low-income backgrounds. We are organizing a second camp in August 2018.
My colleagues and I have released the robotic grasping and pushing data used in Levine et al. '16 (ISER) and Finn et al. '16 (NIPS): Google Brain Robotics Data.
Robots that Learn to Use Improvised Tools: how robots can figure out how to solve tasks using tools, including unconventional tools, by learning from a combination of unsupervised interaction and example demonstrations.
At ICML 2019 and CVPR 2019, I gave an invited tutorial on Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning. Slides, video, and references are linked here.
In December 2018, I gave a tutorial on model-based reinforcement learning at the CIFAR LMB program meeting (slides here).
In August 2017, I gave guest lectures on model-based reinforcement learning and inverse reinforcement learning at the Deep RL Bootcamp (slides here and here, videos here and here).
In Spring 2017, I co-taught a course on deep reinforcement learning at UC Berkeley. All lecture video and slides are available here.
Invited Talks
I gave a talk on data scalability in robot learning (video here) at the RSS Workshop on Self-Supervised Robot Learning.
At L4DC 2020, I gave an invited talk on extrapolation via adaptation (video here).
I gave a talk on challenges in multi-task learning and meta-learning (slides here, video here) at the IAS Workshop on New Directions in Optimization Statistics and Machine Learning.
At NeurIPS 2019, I gave an invited talk on Meta-Learning and Memorization (slides here, video here) at the Bayesian Deep Learning Workshop.
At RLDM 2019, I gave an invited talk on Reinforcement Learning for Robots (slides here).
At CVSS 2019, I gave an invited lecture on Deep Visuomotor Learning (slides here).
At RSS 2019, I gave invited talks at the workshop on Simulation to Real World Transfer (slides here), the workshop on Task-Informed Grasping (slides here), and the workshop on Women in Robotics (slides here).
At ICLR 2019, I gave invited talks at the Task-Agnostic RL Workshop (slides here, video here) and the workshop on Learning from Limited Labeled Data (slides here, video here).
At NeurIPS 2018, I gave invited talks at the Continual Learning Workshop (slides here), the workshop on Learning to Model the Physical World (slides here), and the workshop on Spatiotemporal Modeling (slides here).
In September 2018, I gave a 3-minute talk at EmTech (video here).
In July 2018, I gave a talk at Google DeepMind with Sergey Levine on meta-learning frontiers (slides here).
We propose a technique that uses time-reversal to learn goals and provide a high-level plan to reach them. In particular, our approach explores outward from a set of goal states and learns to predict these trajectories in reverse, which provides a high-level plan towards goals.
We identify and formally describe a peculiar, yet widespread problem with meta-learning algorithms that arises from small and seemingly benign changes to the training set-up, and propose a meta-regularization approach that addresses the problem for multiple classes of meta-learning methods.
One-shot imitation learning enables robots to learn from a single demonstration, but doesn't allow them to improve through trial-and-error. Few-shot reinforcement learning allows for fast trial-and-error learning, but needs many trials to learn a new task with sparse rewards. We propose a simple and scalable approach to enable robots to meta-learn behavior from both demos and rewards, where a demonstration is used to indicate the task and trial-and-error is used to refine the skill.
We study how we can learn long-horizon vision-based tasks in self-supervised settings. Our approach, hierarchical visual foresight, can optimize for a sequence of subgoals that will make the task easier.
The standard paradigm in robot learning is to set up experiments in a single lab environment and train a robot from scratch on data collected in that setting. In contrast, essentially all other machine learning fields accumulate and share large datasets across institutions, which enables training of models that generalize much more broadly. We aim to take a step in this direction by collecting a large dataset from 7 robot platforms across multiple institutions, which we call RoboNet. Critically, we find that pre-training on RoboNet enables us to generalize to entirely new robot platforms with less data than training from scratch.
While meta-RL is a promising approach for enabling robots to quickly learn new tasks based on previous experience, existing methods have been evaluated on narrow distributions of tasks, hindering generalization. We develop a benchmark of 50 qualitatively distinct robotic manipulation tasks, with the goal of enabling future research on meta-RL that studies generalization to entirely new tasks.
Scaling meta-learning to long inner optimization procedures is difficult. We introduce iMAML, which uses implicit differentiation to compute meta-gradients without differentiating through the inner optimization path. This allows MAML to be used with any inner-loop optimizer. We also provide a theoretical analysis of the memory and computational requirements of a variety of meta-learning algorithms.
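For readers who want the gist of the implicit-gradient trick, the key identity (stated here informally, in my own notation for the regularized inner problem) is:

\phi^*(\theta) \;=\; \arg\min_{\phi}\; \hat{\mathcal{L}}_{\mathrm{train}}(\phi) + \frac{\lambda}{2}\lVert \phi - \theta \rVert^2,
\qquad
\frac{d\phi^*(\theta)}{d\theta} \;=\; \Bigl(I + \tfrac{1}{\lambda}\nabla^2_{\phi}\hat{\mathcal{L}}_{\mathrm{train}}(\phi^*)\Bigr)^{-1}.

The meta-gradient of a test loss evaluated at \phi^*(\theta) then follows from the chain rule, with no need to store or differentiate through the inner optimization trajectory.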
We propose to use language as an abstraction for hierarchical reinforcement learning as it provides unique compositional structure, enabling fast learning and combinatorial generalization, while retaining tremendous flexibility, making it suitable for a variety of problems. We also introduce a new open-source environment inspired by the CLEVR dataset for studying language and interaction.
We propose to learn a fast reinforcement learning procedure through imitation of expert policies that solve previously-seen tasks. Our approach is significantly more efficient and stable than prior methods, while scaling gracefully to vision-based and sparse reward tasks.
We show how we can learn a prior over reward functions from heterogeneous demonstration data using deep latent variable models. The proposed approach can learn reward functions for new tasks from a single demonstration, and use these rewards to learn a policy for the demonstrated task.
We aim to learn multi-stage vision-based tasks on a real robot from a single video of a human performing the task. We propose a method that learns both how to learn primitive behaviors from video demonstrations and how to dynamically compose these behaviors to perform multi-stage tasks by "watching" a human demonstrator.
Motivated by the challenge of learning behavior efficiently and the manual effort that often goes into designing reward functions, we extend prior work on variational inverse control with events (VICE) to the off-policy RL setting and show how robots can use active queries to learn the reward function more efficiently and robustly.
We study how robots can learn to use tools. With a combination of autonomous robot interaction (to learn about cause and effect) and teleoperated demonstrations (to learn about how to use tools), we show that robots can figure out how to solve tasks using novel tools and even improvise when conventional tools aren't available.
How can robots learn without rewards? We learn a metric such that optimizing it yields actions that reach the goal, and this metric can be trained from random interaction data. By doing so, robots can learn a variety of image-based tasks without any human supervision.
We introduce PEARL, a method that leverages off-policy learning and a probabilistic belief over the task to make meta-reinforcement learning 20-100X more sample efficient.
Learning the objective underlying example behavior is a challenging, under-defined problem, particularly from only a few demonstrations. However, there is structure among the type of behaviors that we might want agents to learn. We learn this structure from demonstrations across many tasks, acquiring a prior over intentions, and use this learned prior to infer reward functions for new tasks from only a few demonstrations.
We identify key shortcomings of existing meta-reinforcement learning algorithms in the setting of adapting to new dynamics, and develop a new method that can effectively adapt to new dynamics in a model-free way without reward information, by meta-learning an advantage function.
We propose a method that learns how to adapt online to new situations and perturbations, through meta reinforcement learning. Unlike prior meta-RL methods,
our approach is model-based, making it sample-efficient during meta-training and thus practical for real world problems.
We propose CACTUs, an unsupervised learning algorithm that learns to learn tasks constructed from unlabeled data. CACTUs leads to significantly more effective downstream learning and enables few-shot learning without requiring labeled meta-learning datasets.
We develop an object-centric model of visual interactions, illustrate the model's internal representations of objects and physics, and use it to accomplish a variety of long-horizon block-stacking tasks on a robot.
We show that pre-training model parameters with meta-learning (using MAML) can enable effective online learning with neural networks, which we apply to model-based RL problems with non-stationary dynamics.
While meta-learning enables fast learning of new tasks, it requires a human to specify a distribution over tasks for meta-training. In effect, meta-learning offloads the design burden from algorithm design to task design. We propose to automate the design of tasks for meta-learning, describing a family of unsupervised meta-reinforcement learning algorithms that are truly automated.
We provide a unified overview of our work on visual foresight, and present new experiments showing how a single video prediction model can be used to solve many different vision-based tasks, including deformable object manipulation tasks involving towels, shorts, and shirts.
We combine latent variable models with adversarial training to build a video prediction model that produces predictions that look more realistic to human raters and better cover the range of possible futures.
Few-shot learning problems can be ambiguous. We propose a modification of the MAML algorithm that can handle ambiguity by sampling multiple different classifiers. Our approach uses a Bayesian formulation of meta-learning, building upon prior work on hierarchical Bayesian models and variational inference.
We develop a clear and formal definition of the meta-learning problem, its terminology, and desirable properties of meta-learning algorithms. Building upon these foundations, we present a class of model-agnostic meta-learning methods that embed gradient-based optimization into the learner. Finally, we show how these methods can be extended for applications in motor control by combining elements of meta-learning with techniques for deep model-based reinforcement learning, imitation learning, and inverse reinforcement learning.
Specifying a reward or objective in the real world is hard. We propose a method that enables a robot to learn an objective from a few images of success by leveraging a dataset of positive and negative examples of previous tasks. We show how the objectives learned with our method can be used for both planning in the real world and reinforcement learning in simulation.
Planning with video prediction models trained on self-supervised data allows robots to learn diverse manipulation skills. However, to recover from disturbances and inaccurate predictions, we need to track pixels continuously to evaluate the planning objective at each timestep. We propose a self-supervised image-to-image registration model that enables robust behavior.
We propose to embed differentiable planning within a goal-directed policy, integrating planning and representation learning. Our approach optimizes for representations that lead to effective goal-based planning for visual tasks. Our results show that the learned representations not only allow for effective goal-based planning through imitation, but also transfer to more complex robot morphologies and action spaces.
We develop a domain-adaptive meta-learning method that allows for one-shot learning under domain shift. We show that our method can enable a robot to learn to maneuver a new object after seeing just
one video of a human performing the task with that object.
We show that model-agnostic meta-learning (MAML), which embeds gradient descent into the meta-learning algorithm, can be as expressive as black-box meta-learners: both can approximate any learning algorithm.
Furthermore, we empirically show that MAML consistently finds learning strategies that generalize to new tasks better than recurrent meta-learners.
We reformulate the model-agnostic meta-learning algorithm (MAML) as a method for probabilistic inference in a hierarchical Bayesian model.
Unlike prior methods for meta-learning via hierarchical Bayes, MAML is naturally applicable to large function approximators, like neural networks.
Our interpretation sheds light on the meta-learning procedure and allows us to derive an improved version of the MAML algorithm.
We present a stochastic video prediction method, SV2P, that builds upon the conditional variational autoencoder to make stochastic predictions of future video.
We find that pretraining is crucial for enabling stochasticity. Our experiments demonstrate stochastic multi-frame predictions on three real world video datasets.
We propose a simulated benchmark for robotic grasping that emphasizes off-policy learning and generalization to unseen objects.
Our results indicate that several simple methods are surprisingly strong competitors to popular deep RL algorithms such as double Q-learning, and our analysis sheds light on the relative tradeoffs between the methods.
Using demonstration data from a variety of tasks, our method enables a real robot to learn a new related skill, trained end-to-end, using a single visual demonstration of the skill. Our approach also allows for the provided demonstration to be a raw video, without access to the joint trajectory or controls applied to the robot arm.
We present three simple improvements to our prior work on self-supervised visual foresight that lead to substantially better visual planning capabilities. Our
method can perform tasks that require longer-term planning and involve multiple objects.
We propose a model-agnostic algorithm for meta-learning, where a model's parameters
are trained such that a small number of gradient updates with a small amount of training data from a new task
will produce good generalization performance on that task. Our method learns a classifier that can recognize
images of new characters using only a few examples, and a policy that can rapidly adapt
its behavior in simulated locomotion tasks.
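To make the inner/outer loop structure concrete, here is a minimal sketch in JAX on a made-up linear-regression task; the model, data, and hyperparameters are illustrative, not those used in the paper.

import jax
import jax.numpy as jnp

# Toy model and loss; in the paper this would be a neural network and the
# tasks would be, e.g., sinusoid regression or few-shot image classification.
def loss(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

def adapt(params, x, y, inner_lr=0.01):
    # One inner gradient step on a task's small training set.
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - inner_lr * g, params, grads)

def maml_objective(params, x_tr, y_tr, x_val, y_val):
    # Evaluate the adapted parameters on held-out data from the same task;
    # the meta-gradient flows back through the adaptation step.
    return loss(adapt(params, x_tr, y_tr), x_val, y_val)

# Meta-update on one (hypothetical) task; in practice the meta-gradient is
# averaged over a batch of tasks at every meta-iteration.
params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros(1)}
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
x_tr, y_tr = jax.random.normal(k1, (5, 3)), jnp.ones((5, 1))
x_val, y_val = jax.random.normal(k2, (5, 3)), jnp.ones((5, 1))
meta_lr = 0.001
grads = jax.grad(maml_objective)(params, x_tr, y_tr, x_val, y_val)
params = jax.tree_util.tree_map(lambda p, g: p - meta_lr * g, params, grads)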
We formalize the problem of semi-supervised reinforcement learning (SSRL), motivated by real-world scenarios where reward information
is only available in a limited set of scenarios such as when a human supervisor is present, or in a controlled laboratory setting.
We develop a simple algorithm for SSRL based on inverse reinforcement learning and show that it can improve performance by using
'unlabeled' experience.
We combine an action-conditioned predictive model of images, "visual foresight," with model-predictive control for planning how
to push objects. The method is entirely self-supervised, requiring minimal human involvement.
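At a high level, the planning loop is sampling-based model-predictive control; the sketch below is only illustrative, and predict_video, its return values, and the pixel-distance cost are stand-ins rather than the actual model interface.

import numpy as np

def plan_next_action(predict_video, current_image, goal_pixel_positions,
                     horizon=10, num_samples=200, action_dim=4, seed=0):
    # Sample candidate action sequences, score them with the learned
    # action-conditioned prediction model, and execute only the first action
    # of the best sequence before replanning (model-predictive control).
    rng = np.random.default_rng(seed)
    candidates = rng.normal(size=(num_samples, horizon, action_dim))
    costs = []
    for actions in candidates:
        # Stand-in for the learned model: returns predicted frames and the
        # predicted positions of user-designated pixels over the horizon.
        frames, pixel_positions = predict_video(current_image, actions)
        costs.append(np.linalg.norm(pixel_positions[-1] - goal_pixel_positions))
    best = int(np.argmin(costs))
    return candidates[best, 0]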
We present a new guided policy search algorithm that allows the method to be used in domains where the initial conditions are stochastic, which makes the method
more applicable to general reinforcement learning problems and improves generalization performance in our robotic manipulation experiments.
We show that a sample-based algorithm for maximum entropy inverse reinforcement learning (MaxEnt IRL) corresponds to a generative adversarial network (GAN) with a particular choice of discriminator.
Since MaxEnt IRL is simply an energy-based model (EBM) for behavior, we further show that GANs optimize EBMs with the corresponding discriminator,
pointing to a simple and scalable EBM training procedure using GANs.
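Concretely, the correspondence relies on a discriminator of a particular form; written in trajectory-level notation (my transcription), it is

D_\theta(\tau) \;=\; \frac{\frac{1}{Z}\exp\bigl(-c_\theta(\tau)\bigr)}{\frac{1}{Z}\exp\bigl(-c_\theta(\tau)\bigr) + q(\tau)},

where c_\theta is the learned cost, Z an estimate of the partition function, and q(\tau) the density of the sampling (generator) policy. Training a discriminator of this form with the standard GAN objective recovers the MaxEnt IRL objective for c_\theta.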
We propose a technique for learning an active learning strategy by combining one-shot learning and reinforcement learning, and allowing the model
to decide, during classification, which examples are worth labeling. Our experiments demonstrate that our model can trade off accuracy and label requests based on the reward function provided.
Our video prediction method predicts a transformation to apply to the previous image, rather than pixel values directly, leading to significantly improved multi-frame video prediction. We also introduce
a dataset of 50,000 robotic pushing sequences, consisting of over 1 million frames.
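As a rough illustration of predicting transformations rather than pixels, the sketch below applies a set of hypothetical predicted kernels to the previous frame and composites the results with predicted masks; the names and shapes are illustrative, not the actual model outputs.

import numpy as np
from scipy.ndimage import convolve

def transform_frame(prev_frame, kernels, masks):
    # prev_frame: (H, W, C); kernels: list of small 2D kernels predicted by
    # the model; masks: list of (H, W) compositing weights summing to 1 per pixel.
    transformed = []
    for k in kernels:
        channels = [convolve(prev_frame[..., c], k) for c in range(prev_frame.shape[-1])]
        transformed.append(np.stack(channels, axis=-1))
    # Composite the transformed images into the predicted next frame.
    return sum(m[..., None] * t for m, t in zip(masks, transformed))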
Collecting real-world robotic experience for learning an initial visual representation can be expensive. Instead, we show that it is possible to learn
a suitably good initial representation using data collected largely in simulation.
We propose a method for Inverse Reinforcement Learning (IRL) that can handle unknown dynamics and scale to flexible, nonlinear cost functions. We evaluate our algorithm on a series of simulated tasks and real-world robotic manipulation problems, including pouring and inserting dishes into a rack.
We learn a lower dimensional visual state-space without supervision using deep spatial autoencoders, and use it to learn nonprehensile manipulation
tasks, such as pushing a lego block and scooping a bag into a bowl.
We propose a method for learning recurrent neural network policies using continuous memory states. The method learns to store information in and use the memory states
using trajectory optimization. Our method outperforms vanilla RNN and LSTM baselines.
We develop a method that integrates text spotting with simultaneous localization and mapping (SLAM), determining loop closures using text in the environment.
We consider the problem of selecting which demonstration to transfer to the current test scenario.
We frame the problem as an options Markov decision process (MDP) and develop an approach to learn a Q-function from expert demonstrations.
Our results show significant improvement over nearest-neighbor selection.