Standard reinforcement learning (TD or Q-learning) is based on forward exploration: later estimates are used to update earlier ones. In Dual Reinforcement Learning, backward exploration is also used: earlier estimates are used to update later ones. The quality of estimates can be further improved by keeping track of how recently they were updated. In this project, these ideas are applied to the Q-routing algorithm for adaptive packet routing in communication networks, improving the speed of learning and the quality of the final routing policy.
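The sketch below illustrates the two update directions in Python, assuming a nested table Q[node][neighbor][dest] of estimated delivery times; the learning rate, delay names, and table layout are illustrative assumptions, and the recency tracking mentioned above is not shown.

ETA = 0.7  # learning rate (illustrative value)

def forward_update(Q, x, y, dest, queue_delay, link_delay):
    """Forward exploration (standard Q-routing): after x forwards a packet
    bound for `dest` to neighbor y, y reports its best remaining estimate,
    which x uses to refine its own estimate for routing via y."""
    t = min(Q[y][z][dest] for z in Q[y])
    Q[x][y][dest] += ETA * (queue_delay + link_delay + t - Q[x][y][dest])

def backward_update(Q, x, y, src, queue_delay, link_delay):
    """Backward exploration (the dual update): the same packet carries x's
    best estimate of reaching the packet's source `src`, so the receiver y
    can refine its estimate for routing back toward src via the sender x."""
    t_back = min(Q[x][z][src] for z in Q[x])
    Q[y][x][src] += ETA * (queue_delay + link_delay + t_back - Q[y][x][src])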
This project is an exploration of non-symbolic learning as applied to a robot in an environment. By modifying a Braitenberg architecture with Hebbian learning techniques which take advantage of the stimuli and constraints of the world, a system is devised that is exceedingly simple, somewhat robust, and very adaptable.
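A minimal sketch of the idea, assuming a two-sensor, two-motor Braitenberg-style vehicle whose sensor-to-motor weights adapt with a simple Hebbian rule; the learning rate, decay term, and initial weights are illustrative assumptions rather than the project's actual parameters.

import numpy as np

class HebbianVehicle:
    def __init__(self, n_sensors=2, n_motors=2, lr=0.1, decay=0.01):
        self.w = np.random.uniform(0.0, 0.1, (n_motors, n_sensors))
        self.lr = lr
        self.decay = decay

    def step(self, sensors):
        sensors = np.asarray(sensors, dtype=float)
        motors = self.w @ sensors  # reactive Braitenberg-style mapping
        # Hebbian update: strengthen weights where sensor and motor activity
        # co-occur, with a decay term so the weights stay bounded.
        self.w += self.lr * np.outer(motors, sensors) - self.decay * self.w
        return motors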
A novel reinforcement learning method was developed in which two communicating systems learn to predistort their signals to compensate for distortion in the channel. The two predistorters co-adapt, each using the output of the other to determine its own reinforcement signal. This approach makes it possible to adapt to changes in the channel characteristics on-line, as opposed to the off-line learning used in current systems.
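A minimal sketch of one co-adaptation step, assuming a polynomial predistorter, a memoryless tanh nonlinearity standing in for the channel, and a perturb-and-keep reinforcement rule whose reward is the negative error measured at the receiving end. All of these modelling choices are illustrative assumptions, not the project's actual design.

import numpy as np

rng = np.random.default_rng(0)

def channel(x):
    # Illustrative channel distortion.
    return np.tanh(1.5 * x)

class Predistorter:
    def __init__(self, degree=3, sigma=0.05):
        self.theta = np.zeros(degree)
        self.theta[0] = 1.0  # start as the identity mapping
        self.sigma = sigma

    def predistort(self, x, theta=None):
        theta = self.theta if theta is None else theta
        return sum(c * x ** (i + 1) for i, c in enumerate(theta))

    def reinforce(self, x, reward_fn):
        # Perturb the parameters and keep the perturbation when the
        # reinforcement signal reported by the other end improves.
        trial = self.theta + rng.normal(0.0, self.sigma, self.theta.shape)
        if reward_fn(self.predistort(x, trial)) > reward_fn(self.predistort(x)):
            self.theta = trial

# Each side's reward is the negative error its partner observes after the channel.
A, B = Predistorter(), Predistorter()
signal = rng.uniform(-1.0, 1.0, 256)
reward_at_receiver = lambda sent: -np.mean((channel(sent) - signal) ** 2)
for _ in range(200):
    A.reinforce(signal, reward_at_receiver)  # reinforcement reported back by the other side
    B.reinforce(signal, reward_at_receiver)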