State Abstraction Synthesis for Discrete Models of Continuous Domains (2018)
Reinforcement Learning (RL) is a paradigm for enabling autonomous learning wherein rewards are used to influence an agent’s action choices in various states. As the number of states and actions available to an agent increases, it becomes increasingly difficult for the agent to quickly learn the optimal action for any given state. One approach to mitigating the detrimental effects of large state spaces is to represent collections of states together as encompassing “abstract states.” State abstraction itself leads to a host of new challenges for an agent. One such challenge is that of automatically identifying new abstractions that balance generality and specificity; the agent must identify both the similarities and the differences between states that are relevant to its goals, while ignoring unnecessary details that would otherwise hinder its progress. We call this problem of identifying useful abstract states the Abstraction Synthesis Problem (ASP). State abstractions can provide a significant benefit to model-based agents by simplifying their models. T-UCT, a hierarchical model-learning algorithm for discrete, factored domains, is one method that leverages state abstractions to quickly learn and control an agent’s environment. Such abstractions play a pivotal role in T-UCT’s success; however, T-UCT’s solution to ASP requires a fully discrete state space. In this work we develop and compare enhancements to T-UCT that relax its assumption of discreteness. We focus on solving ASP in domains with multidimensional, continuous state factors, using only the T-UCT agent’s limited experience histories and minimal knowledge of the domain’s structure. Finally, we present a new abstraction synthesis algorithm, RCAST, and compare it to existing approaches in the literature. We provide the algorithmic details of RCAST and its subroutines, and we show that RCAST outperforms earlier approaches to ASP, enabling T-UCT to accumulate significantly greater total reward with minimal expert configuration and processing time.
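The abstract refers to state abstractions over multidimensional, continuous state factors without giving RCAST’s details. As a rough, hypothetical illustration of the underlying idea only (the class name, factor indices, and thresholds below are assumptions made for exposition and are not drawn from the paper or from RCAST), the following Python sketch maps a continuous ground state to a discrete abstract state by thresholding only the factors treated as relevant and ignoring the rest.

import numpy as np

# Hypothetical sketch: a state abstraction maps a multidimensional,
# continuous ground state to a discrete abstract state by discretizing
# only the factors deemed relevant. Thresholds and factor choices are
# invented for illustration; they are not RCAST's output.
class ThresholdAbstraction:
    def __init__(self, factor_thresholds):
        # factor_thresholds: {factor_index: sorted list of split points}
        self.factor_thresholds = factor_thresholds

    def abstract_state(self, ground_state):
        # Each relevant factor contributes the index of the interval its
        # value falls into; irrelevant factors are dropped entirely.
        return tuple(
            (i, int(np.searchsorted(splits, ground_state[i])))
            for i, splits in sorted(self.factor_thresholds.items())
        )

# Example: a 3-dimensional continuous state where only factors 0 and 2
# are treated as relevant; factor 1 is abstracted away.
phi = ThresholdAbstraction({0: [0.5], 2: [-1.0, 1.0]})
s1 = np.array([0.7, 3.2, 0.0])
s2 = np.array([0.9, -8.0, 0.3])
assert phi.abstract_state(s1) == phi.abstract_state(s2)  # same abstract state

Ground states that differ only in the ignored factor collapse to the same abstract state; choosing which factors and split points matter, without expert configuration, is the kind of generality/specificity trade-off an ASP solver must strike automatically.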
Citation:
In Data Efficient Reinforcement Learning Workshop at AAAI Spring Symposium, Stanford, CA, USA, March 2018.
Bibtex:
@InProceedings{menashe2018state,
  author    = {Jacob Menashe and Peter Stone},
  title     = {State Abstraction Synthesis for Discrete Models of Continuous Domains},
  booktitle = {Data Efficient Reinforcement Learning Workshop at AAAI Spring Symposium},
  address   = {Stanford, CA, USA},
  month     = {March},
  year      = {2018}
}
Jacob Menashe jmenashe [at] cs utexas edu
Peter Stone pstone [at] cs utexas edu