Confidence-Based Dual Reinforcement Q-Routing: An On-Line Adaptive Network Routing Algorithm (1998)
Efficient routing of information packets in dynamically changing communication networks requires that the routing policy adapt as the load levels, traffic patterns, and topology of the network change. Making globally optimal routing decisions would require a central observer/controller with complete information about the state of all nodes and links in the network, which is not realistic. Therefore, routing decisions must be made locally by individual nodes (routers) using only local routing information. The routing information at a node could be estimates of packet delivery time to other nodes via its neighbors, or estimates of the queue lengths of other nodes in the network. An adaptive routing algorithm should efficiently explore and update the routing information available at different nodes as it routes packets, and should continuously evolve efficient routing policies with minimum overhead on network resources.

In this thesis, an on-line adaptive network routing algorithm called Confidence-Based Dual Reinforcement Q-Routing (CDRQ-Routing), based on the Q-learning framework, is proposed and evaluated. In this framework, the routing information at individual nodes is maintained as Q-value estimates of how long it will take to send a packet to any particular destination via each of the node's neighbors. These Q values are updated through exploration as packets are transmitted. The main contribution of this work is faster adaptation and improved quality of routing policies over Q-Routing. The improvement is based on two ideas. First, the quality of exploration is improved by attaching to each Q value a confidence measure representing how reliable that Q value is; the learning rate is a function of these confidence values. Second, the quantity of exploration is increased by adding backward exploration to Q-learning.
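The confidence-weighted forward update described above can be sketched as follows. The container names, the `eta_fn` learning-rate rule, and the confidence-blending step are illustrative assumptions, not the thesis's exact equations.

```python
def forward_update(Q, C, x, y, d, q_y, s, eta_fn):
    """One forward-exploration update at sending node x.

    Q[a][b][d] is node a's estimate of the time to deliver a packet to
    destination d via neighbor b; C[a][b][d] is a confidence in (0, 1]
    that this estimate is reliable.  q_y is the queueing delay reported
    by neighbor y, and s is the transmission delay over the link x -> y.
    """
    # Neighbor y reports its best remaining estimate for d and the
    # confidence attached to that estimate.
    best = min(Q[y], key=lambda z: Q[y][z][d])
    t, c_new = Q[y][best][d], C[y][best][d]

    # The learning rate depends on how reliable the old and new
    # information are (a hypothetical confidence-based rule).
    c_old = C[x][y][d]
    eta = eta_fn(c_old, c_new)

    # Standard Q-Routing temporal-difference step, scaled by eta.
    Q[x][y][d] += eta * (q_y + s + t - Q[x][y][d])
    # Blend the stored confidence toward that of the new information.
    C[x][y][d] = c_old + eta * (c_new - c_old)
    return Q[x][y][d]
```

In plain Q-Routing the learning rate is a fixed constant; making it a function of confidence values lets a node overwrite stale, low-confidence estimates quickly while protecting fresh, high-confidence ones.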
As a packet hops from one node to another, it not only updates a Q value in the sending node (forward exploration, as in Q-Routing) but also updates a Q value in the receiving node, using information appended to the packet when it is sent out (backward exploration). Thus two Q-value updates occur per packet hop in CDRQ-Routing, as opposed to only one in Q-Routing. Certain properties of forward and backward exploration that form the basis of these update rules are stated and proved in this work.

Experiments over several network topologies, including a 36-node irregular grid and a 128-node 7-D hypercube, indicate that the improvement in the quality and the increase in the quantity of exploration contribute in complementary ways to the performance of the overall routing algorithm. CDRQ-Routing learned optimal shortest-path routing at low loads and efficient routing policies at medium loads almost twice as fast as Q-Routing. At high load levels, the routing policy learned by CDRQ-Routing was twice as good as that learned by Q-Routing in terms of average packet delivery time. CDRQ-Routing also adapted significantly faster than Q-Routing to changes in network traffic patterns and network topology, and the final routing policies it learned were able to sustain much higher load levels than those learned by Q-Routing. Analysis shows that the exploration overhead incurred by CDRQ-Routing is less than 0.5% of the packet traffic.

Several extensions of CDRQ-Routing are proposed as future work: routing in heterogeneous networks (with different link delays and router processing speeds), routing with adaptive congestion control (in the case of finite queue buffers), and the addition of predictive features to CDRQ-Routing. CDRQ-Routing is far superior to, and more realistic than, the state-of-the-art distance-vector routing and Q-Routing algorithms.
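The backward half of the dual update can be sketched in the same style: when a node x forwards a packet that originated at source src, it appends its own best estimate of the delivery time back to src (and the attached confidence), and the receiving node y uses this to refine its estimate of reaching src via x. The names and the learning-rate rule here are illustrative assumptions.

```python
def backward_update(Q, C, y, x, src, t_back, c_back, q_x, s, eta_fn):
    """One backward-exploration update at receiving node y.

    t_back, c_back: the best estimate of delivery time from x back to
    the packet's source src, and its confidence, appended to the packet
    by x before forwarding.  q_x is the queueing delay at x and s the
    transmission delay of the link between x and y.
    """
    c_old = C[y][x][src]
    eta = eta_fn(c_old, c_back)
    # y's estimate of reaching src via x moves toward the reported value.
    Q[y][x][src] += eta * (q_x + s + t_back - Q[y][x][src])
    # Blend the stored confidence toward that of the appended information.
    C[y][x][src] = c_old + eta * (c_back - c_old)
    return Q[y][x][src]
```

Each packet hop thus triggers one forward update at the sender and one backward update at the receiver, doubling the exploration obtained per hop at the cost of a few extra bytes appended to each packet.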
Master's Thesis, Department of Computer Sciences, The University of Texas at Austin, May 1998. 108 pages. Technical Report AI-98-267.

Shailesh Kumar (Masters Alumnus)