Standard reinforcement learning (TD or Q-learning) is based on forward exploration: later estimates are used to update earlier ones. In Dual Reinforcement Learning, backward exploration is also used: earlier estimates are used to update later ones. The quality of estimates can be further improved by keeping track of how recently they were updated. In this project, these ideas are applied to the Q-routing algorithm for adaptive packet routing in communication networks, improving the speed of learning and the quality of the final routing policy.
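The sketch below illustrates the two update directions in Python, assuming a nested table Q[node][neighbor][dest] of estimated delivery times; the learning rate, delay names, and table layout are illustrative assumptions, and the recency tracking mentioned above is not shown.

ETA = 0.7  # learning rate (illustrative value)

def forward_update(Q, x, y, dest, queue_delay, link_delay):
    """Forward exploration (standard Q-routing): after x forwards a packet
    bound for `dest` to neighbor y, y reports its best remaining estimate,
    which x uses to refine its own estimate for routing via y."""
    t = min(Q[y][z][dest] for z in Q[y])
    Q[x][y][dest] += ETA * (queue_delay + link_delay + t - Q[x][y][dest])

def backward_update(Q, x, y, src, queue_delay, link_delay):
    """Backward exploration (the dual update): the same packet carries x's
    best estimate of reaching the packet's source `src`, so the receiver y
    can refine its estimate for routing back toward src via the sender x."""
    t_back = min(Q[x][z][src] for z in Q[x])
    Q[y][x][src] += ETA * (queue_delay + link_delay + t_back - Q[y][x][src])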
This project is an exploration of non-symbolic learning as applied to a robot in an environment. By modifying a Braitenberg architecture with Hebbian learning techniques which take advantage of the stimuli and constraints of the world, a system is devised that is exceedingly simple, somewhat robust, and very adaptable.
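A minimal sketch of the idea, assuming a two-sensor, two-motor Braitenberg-style vehicle whose sensor-to-motor weights adapt with a simple Hebbian rule; the learning rate, decay term, and initial weights are illustrative assumptions rather than the project's actual parameters.

import numpy as np

class HebbianVehicle:
    def __init__(self, n_sensors=2, n_motors=2, lr=0.1, decay=0.01):
        self.w = np.random.uniform(0.0, 0.1, (n_motors, n_sensors))
        self.lr = lr
        self.decay = decay

    def step(self, sensors):
        sensors = np.asarray(sensors, dtype=float)
        motors = self.w @ sensors  # reactive Braitenberg-style mapping
        # Hebbian update: strengthen weights where sensor and motor activity
        # co-occur, with a decay term so the weights stay bounded.
        self.w += self.lr * np.outer(motors, sensors) - self.decay * self.w
        return motors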
A novel reinforcement learning method was developed in which two communicating systems learn to predistort their signals to compensate for distortion in the channel. The two predistorters co-adapt, each using the output of the other to determine its own reinforcement signal. This approach makes it possible to adapt to changes in the channel characteristics on-line, as opposed to the off-line learning used in current systems.
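A minimal sketch of one co-adaptation step, assuming a polynomial predistorter, a memoryless tanh nonlinearity standing in for the channel, and a perturb-and-keep reinforcement rule whose reward is the negative error measured at the receiving end. All of these modelling choices are illustrative assumptions, not the project's actual design.

import numpy as np

rng = np.random.default_rng(0)

def channel(x):
    # Illustrative channel distortion.
    return np.tanh(1.5 * x)

class Predistorter:
    def __init__(self, degree=3, sigma=0.05):
        self.theta = np.zeros(degree)
        self.theta[0] = 1.0  # start as the identity mapping
        self.sigma = sigma

    def predistort(self, x, theta=None):
        theta = self.theta if theta is None else theta
        return sum(c * x ** (i + 1) for i, c in enumerate(theta))

    def reinforce(self, x, reward_fn):
        # Perturb the parameters and keep the perturbation when the
        # reinforcement signal reported by the other end improves.
        trial = self.theta + rng.normal(0.0, self.sigma, self.theta.shape)
        if reward_fn(self.predistort(x, trial)) > reward_fn(self.predistort(x)):
            self.theta = trial

# Each side's reward is the negative error its partner observes after the channel.
A, B = Predistorter(), Predistorter()
signal = rng.uniform(-1.0, 1.0, 256)
reward_at_receiver = lambda sent: -np.mean((channel(sent) - signal) ** 2)
for _ in range(200):
    A.reinforce(signal, reward_at_receiver)  # reinforcement reported back by the other side
    B.reinforce(signal, reward_at_receiver)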