Solving Non-Markov Control Tasks
Active from 1996 to 2006

Most sequential decision tasks in the real world, such as manufacturing and robot control, require short-term memory. Controllers for such tasks are difficult to design by traditional engineering or even conventional reinforcement learning methods because the environments are often non-linear, high-dimensional, stochastic, and non-stationary. Evolutionary methods can potentially solve these difficult problems, but like these other approaches they require that solutions be evaluated in simulation and then transferred to the real world. To apply evolution successfully to such tasks, two components are required: (1) a learning method powerful enough to solve problems of this difficulty in simulation, and (2) a methodology that facilitates transfer to the real world.

Because of sensory limitations, it is not always possible for a control system to identify the environment's state directly; instead, the system must use its perceptual history to disambiguate the state. Conventional learning methods such as Q-learning do not work well in such non-Markov environments, but neuroevolution has recently been shown to be a promising alternative. In this work I explore an approach to solving continuous, non-Markov control tasks that is composed of two parts: (1) a neuroevolution algorithm, Enforced SubPopulations (ESP), that extends SANE by allowing neurons to evolve recurrent connections and therefore use information about past experience (i.e., memory) to make decisions; and (2) Incremental Evolution, which allows evolutionary methods to solve hard tasks by evolving on a sequence of increasingly difficult versions of the task. The method has been tested on several Markov and non-Markov versions of the pole balancing problem, as well as on evolving general behavior in the prey capture task. The results show that ESP with Incremental Evolution is more efficient than other methods and can solve harder versions of these tasks.
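To make the ESP mechanism concrete, the following minimal C++ sketch shows one generation of its core loop: each hidden unit has its own subpopulation of neuron chromosomes, networks are assembled by drawing one neuron from each subpopulation, and every participant is credited with the resulting network's fitness. All names and the toy fitness function are illustrative assumptions, not code from the actual ESP package; the real system would evaluate the assembled recurrent network on the control task itself.

```cpp
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

struct Neuron {
    std::vector<double> weights;  // input, recurrent, and output weights
    double fitness = 0.0;         // fitness accumulated over trials
    int trials = 0;               // number of networks this neuron joined
};

using Subpop = std::vector<Neuron>;

int main() {
    const int numSubpops = 5;    // one subpopulation per hidden unit
    const int subpopSize = 20;   // neuron chromosomes per subpopulation
    const int genomeLen  = 12;   // weights per neuron (task-dependent)
    const int numTrials  = 200;  // networks assembled per generation

    std::mt19937 rng(42);
    std::uniform_real_distribution<double> weightDist(-1.0, 1.0);
    std::uniform_int_distribution<int> pick(0, subpopSize - 1);

    // Initialize every subpopulation with random weight vectors.
    std::vector<Subpop> pops(numSubpops, Subpop(subpopSize));
    for (auto& pop : pops)
        for (auto& n : pop) {
            n.weights.resize(genomeLen);
            for (auto& x : n.weights) x = weightDist(rng);
        }

    // Toy stand-in for network evaluation: the real system would run
    // the assembled recurrent network on the control task (e.g. pole
    // balancing) and return how long the system stayed under control.
    auto evaluateNetwork = [](const std::vector<Neuron*>& net) {
        double f = 0.0;
        for (const Neuron* n : net)
            for (double x : n->weights) f -= x * x;
        return f;
    };

    // One generation: assemble networks by drawing one neuron from each
    // subpopulation, evaluate, and credit every participating neuron.
    for (int t = 0; t < numTrials; ++t) {
        std::vector<Neuron*> net;
        for (auto& pop : pops) net.push_back(&pop[pick(rng)]);
        double f = evaluateNetwork(net);
        for (Neuron* n : net) { n->fitness += f; ++n->trials; }
    }

    // Rank each subpopulation by average fitness; within-subpopulation
    // crossover and mutation (omitted here) would follow.
    for (auto& pop : pops)
        std::sort(pop.begin(), pop.end(),
                  [](const Neuron& a, const Neuron& b) {
                      double fa = a.trials ? a.fitness / a.trials : -1e9;
                      double fb = b.trials ? b.fitness / b.trials : -1e9;
                      return fa > fb;
                  });

    std::cout << "best average fitness: "
              << pops[0][0].fitness / std::max(1, pops[0][0].trials) << "\n";
}
```

Because each neuron can only be evaluated as part of a complete network, crediting all participants and averaging over trials is what pressures the subpopulations to specialize into complementary sub-functions of the network.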

Because it is impractical to evaluate entire populations of controllers in the real world, evolutionary approaches are just as dependent on simulation as other reinforcement learning methods. Controllers must first be learned off-line in a simulation environment and then be transferred to the target environment where they are ultimately meant to operate. To ensure that transfer is possible, evolved controllers need to be robust enough to cope with discrepancies between these two settings. So far, transfer of evolved mobile robot controllers has been shown to be possible, but there is very little research on transfer in other classes of tasks, such as the control of unstable systems. The second goal of this work is to analyze what factors influence transfer and to show that transfer is possible even in high-precision tasks in unstable environments, such as the most difficult pole balancing task.

However, no matter how rigorously they are developed, simulators cannot faithfully model all aspects of a target environment. Whenever the target environment is abstracted in some way to simplify evaluation, spurious features are introduced into the simulation. If a controller relies on these features to accomplish the task, it will fail to transfer to the real world where the features are not available (Mataric, 1996). Since some abstraction is necessary to make simulators tractable, such a "reality gap" can prevent controllers from performing in the physical world as they do in simulation.
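One common way to keep controllers from exploiting such spurious regularities, widely used in evolutionary robotics, is to corrupt the simulator's sensor readings with noise during evaluation so that only robust strategies earn high fitness. The C++ sketch below illustrates the idea; the function and parameter names, and the noise level, are illustrative assumptions rather than details of the actual experimental setup.

```cpp
// Sketch: perturbing simulated sensor readings with Gaussian noise so
// that evolved controllers cannot overfit to simulator artifacts.
#include <random>
#include <vector>

std::vector<double> noisySensors(const std::vector<double>& trueState,
                                 double sigma, std::mt19937& rng) {
    std::normal_distribution<double> noise(0.0, sigma);
    std::vector<double> obs = trueState;
    for (double& x : obs) x += noise(rng);  // corrupt each reading
    return obs;
}

// During each simulated evaluation step the controller would only ever
// see the corrupted observations, e.g. (hypothetical names):
//   auto obs = noisySensors(cart.state(), 0.05, rng);
//   double force = network.activate(obs);
//   cart.step(force);
```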

Faustino Gomez, Postdoctoral Alumni, tino [at] idsia ch
Neuroevolution: Automating Creativity in AI Model Design. Sebastian Risi, David Ha, Yujin Tang, and Risto Miikkulainen. To appear. Cambridge, MA: MIT Press, 2025.

Efficient Non-Linear Control through Neuroevolution. Faustino Gomez, Juergen Schmidhuber, and Risto Miikkulainen. In Proceedings of the European Conference on Machine Learning, 654-662, Berlin: Springer, 2006.

ESP C++. The ESP package contains the source code for the Enforced SubPopulations system, written in C++. ESP is an extension to SANE. 2000