Dual Reinforcement Q-Routing: An On-Line Adaptive Routing Algorithm (1997)
This paper describes and evaluates the Dual Reinforcement Q-Routing algorithm (DRQ-Routing) for adaptive packet routing in communication networks. Each node in the network has a routing decision maker that adapts, on-line, to learn routing policies that can sustain high network loads and have low average packet delivery time. These decision makers learn based on the information they get back from their neighboring nodes as they send packets to them (forward exploration similar to Q-Routing) and the information appended to the packets they receive from their neighboring nodes (backward exploration unique to DRQ-Routing). Experiments over several network topologies have shown that at low loads, DRQ-Routing learns the optimal policy more than twice as fast as Q-Routing, and at high loads, it learns routing policies that are more than twice as good as Q-Routing in terms of average packet delivery time. Further, DRQ-Routing is able to sustain higher network loads than Q-Routing and non-adaptive shortest-path routing.
C. H. Dagli, M. Akay, O. Ersoy, B. R. Fernandez and A. Smith, editors, Smart Engineering Systems: Neural Networks, Fuzzy Logic, Data Mining, and Evolutionary Programming, 7, 1997.

Shailesh Kumar, Masters Alumni
Risto Miikkulainen, Faculty, risto [at] cs utexas edu