Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (2010)
Reinforcement learning (RL) algorithms have long been promising methods for enabling an autonomous robot to improve its behavior on sequential decision-making tasks. The obvious enticement is that the robot should be able to improve its own behavior without the need for detailed step-by-step programming. However, for RL to reach its full potential, the algorithms must be sample efficient: they must learn competent behavior from very few real-world trials. From this perspective, model-based methods, which use experiential data more efficiently than model-free approaches, are appealing. But they often require exhaustive exploration to learn an accurate model of the domain. In this paper, we present an algorithm, Reinforcement Learning with Decision Trees (RL-DT), that uses decision trees to learn the model by generalizing the relative effect of actions across states. The agent explores the environment until it believes it has a reasonable policy. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. We compare RL-DT against standard model-free and model-based learning methods, and demonstrate its effectiveness on an Aldebaran Nao humanoid robot scoring goals in a penalty kick scenario.
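To illustrate the core idea of generalizing the relative effect of actions across states, here is a minimal sketch. It models each action's effect as a state *change* (s' - s) rather than an absolute next state, so a transition observed in one state predicts transitions in unvisited states. This toy version substitutes a simple lookup for the paper's decision trees; all function names and the corridor example are illustrative assumptions, not from the paper.

```python
# Sketch of model learning via relative effects (toy stand-in for
# the decision-tree model in RL-DT). Assumes integer 1-D states.

def learn_relative_effects(transitions):
    """Learn a map from action -> predicted state change (s' - s)."""
    effects = {}
    for s, a, s_next in transitions:
        # The relative effect of an action is assumed to be the same
        # in every state, so one observation generalizes broadly.
        effects[a] = s_next - s
    return effects

def predict_next_state(effects, s, a):
    """Predict s' for (s, a) using the learned relative effect."""
    return s + effects[a]

# Experience from only two states of a 1-D corridor:
transitions = [(0, "right", 1), (5, "left", 4)]
model = learn_relative_effects(transitions)

# The model now predicts outcomes in states it has never visited:
print(predict_next_state(model, 10, "right"))  # -> 11
print(predict_next_state(model, 3, "left"))    # -> 2
```

In the full algorithm, a decision tree plays the role of this lookup table, splitting on state features so that effects can differ across regions of the state space while still generalizing within them.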
In Proceedings of the International Conference on Robotics and Automation (ICRA), 2010.

Todd Hester todd [at] cs utexas edu
Michael Quinlan mquinlan [at] cs utexas edu
Peter Stone pstone [at] cs utexas edu