Helicopter

We have applied our apprenticeship learning and reinforcement learning algorithms to the problem of autonomous helicopter flight. This resulted in a robust, highly capable controller for our helicopter. In particular, our helicopter can now perform very difficult aerobatic maneuvers, such as in-place flips (pitching backward through a 360-degree rotation, like a backward somersault), in-place rolls, a steep "funnel" maneuver (flying sideways in a circle while steeply pitched forward or backward, so that the helicopter traces out the surface of a "funnel"), and even tic-tocs (analogous to a metronome or an inverted pendulum, where the helicopter, with nose up and tail down, quickly pitches approximately 30 degrees back and forth) and chaos (arguably the most challenging aerobatic maneuver). Such maneuvers are well beyond the abilities of all but the best human pilots; to our knowledge, these are also by far the most difficult maneuvers performed by any autonomous helicopter.

Helicopter Videos

airshow mp4   chaos mp4   tic-toc wmv   flips wmv   rolls wmv   nose-in funnel wmv   tail-in funnel wmv

For more information, also see the Stanford Autonomous Helicopter Project.

Related Publications

An Application of Reinforcement Learning to Aerobatic Helicopter Flight,
Pieter Abbeel, Adam Coates, Morgan Quigley and Andrew Y. Ng.
In NIPS 19, 2007. (ps, pdf)

Using Inaccurate Models in Reinforcement Learning,
Pieter Abbeel, Morgan Quigley and Andrew Y. Ng.
In Proceedings of ICML, 2006. (ps, pdf; long version: ps, pdf)

Modeling Vehicular Dynamics, with Application to Modeling Helicopters,
Pieter Abbeel, Varun Ganapathi and Andrew Y. Ng.
In NIPS 18, 2006. (ps, pdf)

Exploration and Apprenticeship Learning in Reinforcement Learning,
Pieter Abbeel and Andrew Y. Ng.
In Proceedings of ICML, 2005. (ps, pdf; long version: ps, pdf)

Learning First Order Markov Models for Control,
Pieter Abbeel and Andrew Y. Ng.
In NIPS 17, 2005. (ps, pdf)

Apprenticeship Learning via Inverse Reinforcement Learning,
Pieter Abbeel and Andrew Y. Ng.
In Proceedings of ICML, 2004. (ps, pdf; supplement: ps, pdf; supplementary webpage here)

RC car and Flight Simulator

The RC car and flight simulator videos below illustrate our algorithm for reinforcement learning with inaccurate models, presented at ICML 2006. We have since applied an extension of this idea to design controllers for our autonomous helicopter.

In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks it can be extremely difficult to build an accurate model, so model-based policy search often returns a policy that works in simulation but not in real life. At the other extreme, model-free RL tends to require an infeasibly large number of real-life trials. We present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, while relying on the approximate model only to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system even when the model is only approximate. Empirical results demonstrate that, given only a crude model and a small number of real-life trials, our algorithm can obtain near-optimal performance in the real system.
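To make the grounding step concrete, here is a minimal sketch in Python of one way the loop could look. The names run_real_system, model.predict, and local_policy_update are hypothetical placeholders for the real-system interface, the approximate dynamics model, and a local policy-improvement step; the time-indexed bias correction is a simplified reading of the idea, not the actual implementation from the paper.

def grounded_policy_search(policy, model, run_real_system, local_policy_update,
                           num_iters=10):
    # Sketch of the grounded policy-improvement loop described above.
    # States and actions are assumed to be numpy arrays.
    for _ in range(num_iters):
        # 1. Ground: execute the current policy once on the real system.
        #    real_states has one more entry than real_actions.
        real_states, real_actions = run_real_system(policy)

        # 2. Record the model's one-step prediction errors along the real trajectory.
        bias = [real_states[t + 1] - model.predict(real_states[t], real_actions[t])
                for t in range(len(real_actions))]

        # 3. Time-indexed, bias-corrected simulator: by construction it reproduces
        #    the observed real trajectory under the current policy.
        def corrected_model(state, action, t, bias=bias):
            return model.predict(state, action) + bias[min(t, len(bias) - 1)]

        # 4. Trust the corrected model only for a local policy change
        #    (e.g., one policy-gradient step), then gather new real data.
        policy = local_policy_update(policy, corrected_model)

    return policy

Because the corrected model reproduces the observed trajectory exactly, the policy evaluation is anchored to real data, and the approximate model is trusted only for predicting the effect of small policy changes.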

RC Car Videos

Learning to execute a closed-loop circular trajectory: mpg.
Learning to execute a closed-loop "figure-8" trajectory: mpg.
Learning to execute an open-loop turning trajectory: mpg.

Flight Simulator Videos

Learning to fly a "figure-8" trajectory: mpg. The biggest difference between our algorithm's controller and the purely model-based controller is the altitude accuracy. This is easiest to see in the trace of the trajectory shown in front of the airplane: it goes up and down significantly for the model-based controller, whereas the controller produced by our algorithm holds altitude (almost) perfectly. (See the paper for details.)

Related Publications

Using Inaccurate Models in Reinforcement Learning,
Pieter Abbeel, Morgan Quigley and Andrew Y. Ng.
In Proceedings of ICML, 2006. (ps, pdf; long version: ps, pdf)

Highway Driving Simulator

The highway driving videos below illustrate our "apprenticeship learning via inverse reinforcement learning" algorithm presented at ICML 2004.

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm uses "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy it outputs attains performance close to that of the expert, where performance is measured with respect to the expert's unknown reward function.
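The paper gives both a max-margin and a simpler projection version of this algorithm; below is a minimal Python sketch of the projection version. The helpers solve_mdp (returns a policy that is optimal for the reward w . phi) and feature_expectations (estimates a policy's discounted feature expectations, e.g. by Monte Carlo rollouts) are hypothetical placeholders, and returning only the last policy is a simplification: the paper returns a mixture of the policies found.

import numpy as np

def apprenticeship_learning(mu_expert, solve_mdp, feature_expectations,
                            epsilon=1e-3, max_iters=50):
    # Start from an arbitrary policy and its feature expectations.
    policy = solve_mdp(np.random.randn(len(mu_expert)))
    mu = feature_expectations(policy)
    mu_bar = mu  # best approximation to mu_expert achievable so far

    for _ in range(max_iters):
        w = mu_expert - mu_bar             # reward weights: direction toward the expert
        t = np.linalg.norm(w)              # how far the expert's behavior still is
        if t <= epsilon:                   # expert (nearly) matched; stop
            break
        policy = solve_mdp(w)              # RL step: optimal policy for current reward
        mu = feature_expectations(policy)  # evaluate the new policy's features
        # Projection step: move mu_bar toward mu along the connecting segment.
        d = mu - mu_bar
        mu_bar = mu_bar + (np.dot(d, mu_expert - mu_bar) / np.dot(d, d)) * d

    return policy, w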

Highway Driving Videos

The videos illustrate the effectiveness of the algorithm in learning five different driving styles:
1. Nice: expert demonstration, learned controller, expert and learned side-by-side.
2. Bad: expert demonstration, learned controller, expert and learned side-by-side.
3. Right lane nice: expert demonstration, learned controller, expert and learned side-by-side.
4. Right lane bad: expert demonstration, learned controller, expert and learned side-by-side.
5. Middle lane: expert demonstration, learned controller, expert and learned side-by-side.

Related Publications

Apprenticeship Learning via Inverse Reinforcement Learning,
Pieter Abbeel and Andrew Y. Ng.
In Proceedings of ICML, 2004. (ps, pdf; supplement: ps, pdf; supplementary webpage here)

Quadruped

Legged robots, unlike wheeled robots, have the potential to access nearly all of the Earth's land mass, enabling robotic applications in areas where they are currently infeasible. However, current control software for legged robots is quite limited and does not let them realize this potential.

We propose a method for hierarchical apprenticeship learning: our algorithm accepts advice for the quadruped locomotion task at different levels of the control hierarchy. It then uses this advice to find a controller that allows the quadruped to successfully traverse highly non-trivial, previously unseen terrains.
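To give a flavor of how such advice can be used, here is a minimal Python sketch of learning a terrain cost function from pairwise advice at a single level of the hierarchy, assuming each piece of advice is a pair of feature vectors (preferred, alternative) indicating which option should be cheaper. The hinge-loss subgradient update and all names here are illustrative only, not the optimization used in the paper.

import numpy as np

def learn_cost_from_advice(advice_pairs, num_features, step_size=0.1, num_epochs=100):
    # advice_pairs: list of (phi_preferred, phi_alternative) feature vectors;
    # the adviser indicated the first option should receive lower cost.
    w = np.zeros(num_features)  # cost weights; cost(x) = w . phi(x)
    for _ in range(num_epochs):
        for phi_good, phi_bad in advice_pairs:
            # Soft constraint: cost(preferred) + margin <= cost(alternative).
            if np.dot(w, phi_good) + 1.0 > np.dot(w, phi_bad):
                # Subgradient step that lowers the preferred option's cost
                # relative to the alternative's.
                w -= step_size * (phi_good - phi_bad)
        w = np.maximum(w, 0.0)  # keep terrain costs non-negative
    return w

In the full hierarchical setting, advice of this kind is given at both the higher (path) and lower (footstep) levels of the controller; see the paper for the actual formulation.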

Quadruped Videos

Planning Before/After Learning #1 (9/2007). (mp4, wmv)
Planning Before/After Learning #2 (9/2007). (mp4, wmv)

For more information, also see the Stanford Learning Locomotion Project.

Related Publications

Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion,
J. Zico Kolter, Pieter Abbeel and Andrew Y. Ng.
In NIPS 20, 2008. (forthcoming: ps, pdf)