Many routine tasks in the real world can be seen as sequential
decision tasks. For instance, navigating a robot through a complex
environment, driving a car in congested traffic, and routing packets
in a computer network requires making a sequence of decisions that
together minimize time and resources used. It would be desirable to
automate these tasks, yet it is difficult because the optimal
decisions are generally not known. Approximating them by finite-state
machines or learning them based on reinforcement leads to reactive
behaviors that perform well in short term, but do not amount to
intelligent high-level behavior in the long term. The goal of this
project is to develop the technology that makes learning such
strategic high-level behavior possible.
The main technical challenge is to devise a method that extends
sequential decision learning from reactive to strategic behaviors.
Such a method needs to be able to (1) retain information from past
states, (2) learn multimodal behavior, (3) choose between the
different behaviors based on crucial detail, and (4) implement a
sequential high-level strategy based on those behaviors. The
neuroevolution methods developed in prior work solve the first problem
by evolving (through genetic algorithms) recurrent neural networks to
represent the behavior. To solve the remaining problems, these
methods will be extended with multi-objective optimization, local
nodes with cascaded structure, and with evolution of modules and their
combinations. Preliminary results indicate that this approach is
indeed feasible. In this project, it will be first characterized
fully in supervised learning tasks as well as in synthetic sequential
decision tasks. It will then be scaled up to a robotic soccer
simulation in OpenNERO, and evaluated in two ways: In an objective
comparison with other hand-coded and learned soccer teams, and through
a subjective analysis (by human evaluators) of the learned strategies.
The end result will be a systematic approach to learning strategic
high-level behavior in sequential decision tasks.
In the long term, the technology should make it possible to build
robust sequential decision systems for real-world tasks. It should
lead to safer and more efficient vehicle, traffic, and robot control,
improved process and manufacturing optimization, and more efficient
computer and communication systems. It should also make the next
generation of video games possible, with characters that exhibit
realistic, strategic behaviors: Such technology should lead to more
effective educational and training games in the future.
This research is supported by the National Science Foundation under grant IIS-0915038.