TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning (2012)
Author: Todd Hester

TEXPLORE is a reinforcement learning (RL) algorithm, that is, a method for agents to learn to perform sequential decision-making tasks through interaction with their environment. This research focuses on applying RL to real-world problems, particularly learning on robots. Enabling robots to learn will make them more generally useful, because they will not require pre-programming for every task and environment.

In RL, an agent is in some state in the world (e.g., a particular chess board configuration or a location in a city) and has a set of actions it can take (e.g., chess moves or turns at an intersection). After taking an action, it reaches a new state and receives a scalar reward (e.g., +1 for winning the chess game, -1 for losing, and 0 otherwise; or the negative of the time taken on each road segment). The goal of the agent is to learn which action to take in each state to maximize its cumulative reward over time.
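The state, action, and reward loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-position line world, the reward values, and the names `step` and `run_episode` are hypothetical and do not come from TEXPLORE or its ROS package.

```python
from typing import Callable, Tuple

# Hypothetical toy world: positions 0..4 on a line; reaching position 4 ends the episode.
GOAL = 4
ACTIONS = (-1, +1)  # move left or move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # stay inside the line
    done = next_state == GOAL
    reward = 1.0 if done else -0.1  # assumed rewards: small cost per step, bonus at the goal
    return next_state, reward, done


def run_episode(policy: Callable[[int], int], start: int = 0, max_steps: int = 50) -> float:
    """Run one episode with the given policy and return the total reward collected."""
    state, total = start, 0.0
    for _ in range(max_steps):
        action = policy(state)                     # the agent picks an action for its state
        state, reward, done = step(state, action)  # the world returns a new state and reward
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    print("always right:", run_episode(lambda s: +1))
    print("always left: ", run_episode(lambda s: -1))
```

A policy that always moves right reaches the goal in four steps, while one that always moves left keeps paying the per-step cost until the step limit. Learning consists of discovering the better policy from experience rather than having it written by hand.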

Applying RL to real-world problems such as robot control raises several issues. First, learning must succeed within a limited number of actions. Methods that need thousands or millions of actions to learn are not feasible on a robot, which is likely to break, wear out, run out of battery power, or overheat before it can take that many actions. Second, learning must take place in real time. We want the RL agent to control the robot continually, rather than controlling it for short periods separated by long pauses while it computes its next action. Finally, the agent must handle the continuous state spaces of robots and the delays that many mechanical actuators have.

To address these issues, we have developed an algorithm called TEXPLORE. It is a model-based RL algorithm: it learns a model of the domain's state transition and reward dynamics and then uses that model to plan a policy, which enables it to learn in fewer actions than many model-free approaches. It also uses a real-time architecture that performs model learning and planning in parallel threads, so the agent can act in real time. I've released a ROS package with the TEXPLORE source code that can be easily applied to any robot running ROS.
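To make the model-based idea concrete, here is a minimal sketch of the generic "learn a model, then plan with it" pattern. It substitutes a simple tabular count model and value iteration, so it is not TEXPLORE's actual model learner or planner, and it runs in a single thread rather than in parallel threads. The toy chain world and every name in it (`env_step`, `learn_model`, `plan`) are assumptions made for illustration.

```python
from collections import defaultdict

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)


def env_step(state, action):
    """Hypothetical chain world; the agent sees only the samples it returns."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else -0.1)


def learn_model(visits=3):
    """Build a tabular model by trying each state-action pair a few times."""
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {next_state: count}
    reward_sum = defaultdict(float)                 # (s, a) -> summed reward
    for s in range(GOAL):                           # non-goal states only
        for a in ACTIONS:
            for _ in range(visits):
                s2, r = env_step(s, a)
                counts[(s, a)][s2] += 1
                reward_sum[(s, a)] += r
    return counts, reward_sum


def plan(counts, reward_sum, gamma=0.9, sweeps=100):
    """Value iteration on the learned model; returns a greedy policy."""
    V = [0.0] * N_STATES  # the goal state is terminal, so its value stays 0

    def q(s, a):
        n = sum(counts[(s, a)].values())
        if n == 0:
            return 0.0  # no data for this pair (simplifying assumption)
        expected_reward = reward_sum[(s, a)] / n
        expected_next = sum(c / n * V[s2] for s2, c in counts[(s, a)].items())
        return expected_reward + gamma * expected_next

    for _ in range(sweeps):
        for s in range(GOAL):
            V[s] = max(q(s, a) for a in ACTIONS)
    return {s: max(ACTIONS, key=lambda a: q(s, a)) for s in range(GOAL)}


if __name__ == "__main__":
    counts, reward_sum = learn_model()
    print(plan(counts, reward_sum))
```

Once the model is learned, planning uses only the stored counts and rewards and makes no further calls to `env_step`. This is the sense in which a model-based agent can trade extra computation for fewer real actions.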

A complete description of the TEXPLORE algorithm, with videos detailing each aspect of it, is available on my webpage.

Video: Learning to Score Penalty Kicks via Reinforcement Learning

This is the accompanying video for our ICRA 2010 paper, in which our learning algorithm controls the robot as it learns to score penalty kicks.

Todd Hester todd [at] cs utexas edu
Peter Stone pstone [at] cs utexas edu
Michael Quinlan mquinlan [at] cs utexas edu
RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control. Todd Hester, Michael Quinlan, and Peter Stone. In IEEE International Conference on Robotics and Automation (ICRA), May 2012.

Intrinsically Motivated Model Learning for a Developing Curious Agent. Todd Hester and Peter Stone. In Eleventh International Conference on Autonomous Agents and Multiagent Systems - Adaptive Learn... 2012

A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control. Todd Hester, Michael Quinlan, and Peter Stone. 2011.

Real Time Targeted Exploration in Large Domains. Todd Hester and Peter Stone. In Proceedings of the Ninth International Conference on Development and Learning (... 2010

Generalized Model Learning for Reinforcement Learning on a Humanoid Robot. Todd Hester, Michael Quinlan, and Peter Stone. In International Conference on Robotics and Automation, 2010.

Generalized Model Learning for Reinforcement Learning in Factored Domains. Todd Hester and Peter Stone. In The Eighth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), M... 2009

An Empirical Comparison of Abstraction in Models of Markov Decision Processes. Todd Hester and Peter Stone. In Proceedings of the ICML/UAI/COLT Workshop on Abstraction in Reinforcement Learning, June 2... 2009

TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots. Todd Hester and Peter Stone. Machine Learning, 2012.

Intrinsically Motivated Model Learning for a Developing Curious Agent. Todd Hester and Peter Stone. In The Eleventh International Conference on Development and Learning (ICDL), Nov 2012.

TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains. Todd Hester. PhD Thesis, The University of Texas at Austin, Austin, Texas, USA, December 2012. Code available at:...