Confidence Based Dual Reinforcement Q-Routing: An Adaptive On-Line Routing Algorithm (1999)
This paper describes and evaluates the Confidence-based Dual Reinforcement Q-Routing algorithm (CDRQ-Routing) for adaptive packet routing in communication networks. CDRQ-Routing is based on an application of the Q-learning framework to network routing, as first proposed by Littman and Boyan (1993). The main contribution of CDRQ-Routing is an increased quantity and an improved quality of exploration. Compared to Q-Routing, the state-of-the-art adaptive Bellman-Ford Routing algorithm, and the non-adaptive shortest-path method, CDRQ-Routing learns superior policies significantly faster. Moreover, the overhead due to exploration is shown to be insignificant compared to the improvements achieved, which makes CDRQ-Routing a practical method for real communication networks.
In 16th International Joint Conference on Artificial Intelligence (IJCAI-99), 758-763, Stockholm, Sweden, 1999. San Francisco, CA: Morgan Kaufmann.

Shailesh Kumar, Masters Alumni
Risto Miikkulainen, Faculty, risto [at] cs utexas edu
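
The abstract builds on Q-Routing, the application of Q-learning to packet routing that it credits to Littman and Boyan. As a rough illustration of that baseline only, the sketch below shows a per-node table of estimated delivery times and the standard Q-Routing update, in which a node revises its estimate for a neighbor using the queueing delay, the transmission delay, and the neighbor's own best estimate. The class name, method names, and learning rate are hypothetical choices made for this example. The confidence values and dual reinforcement that distinguish CDRQ-Routing are not described in the abstract and are not shown here.

```python
from collections import defaultdict


class QRouter:
    """Hypothetical per-node Q-table: q[dest][neighbor] = estimated delivery time."""

    def __init__(self, neighbors, learning_rate=0.5):
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate  # assumed value, not taken from the paper
        # Estimates start at zero for every (destination, neighbor) pair.
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dest):
        """Smallest estimated delivery time to dest over all neighbors."""
        return min(self.q[dest].values())

    def choose_next_hop(self, dest):
        """Greedy policy: forward to the neighbor with the lowest estimate."""
        row = self.q[dest]
        return min(row, key=row.get)

    def update(self, dest, neighbor, queue_delay, transmission_delay, neighbor_estimate):
        """Move the estimate toward the observed delays plus the neighbor's best estimate."""
        target = queue_delay + transmission_delay + neighbor_estimate
        old = self.q[dest][neighbor]
        self.q[dest][neighbor] = old + self.learning_rate * (target - old)
        return self.q[dest][neighbor]


if __name__ == "__main__":
    router = QRouter(["a", "b"])
    router.update("d", "a", queue_delay=1.0, transmission_delay=1.0, neighbor_estimate=2.0)
    router.update("d", "b", queue_delay=0.5, transmission_delay=1.0, neighbor_estimate=0.0)
    print(router.choose_next_hop("d"))
```

In a simulation, each node would hold one such table, and `neighbor_estimate` would be the value the receiving neighbor reports back from its own `best_estimate` for the same destination.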