Evolving Deep LSTM (2015)
Author: Aditya Rawal
Reinforcement learning agents with memory are constructed by extending the neuroevolution algorithm NEAT to incorporate special memory units with gating logic (LSTM memory cells). Initial evaluation on POMDP tasks indicated that memory solutions obtained by evolving LSTMs outperform traditional RNNs. Scaling neuroevolution of LSTMs to deep memory problems is challenging because (1) the fitness landscape is deceptive and (2) a large number of associated parameters must be optimized. To overcome these challenges, a new secondary optimization objective is introduced that maximizes the information (info-max) stored in the LSTM network. Network training is split into two phases. In the first (unsupervised) phase, independent memory modules are evolved by optimizing for the info-max objective. In the second phase, the networks are trained by optimizing the task fitness. Results on two different memory tasks indicate that neuroevolution can discover powerful LSTM-based memory solutions that outperform traditional RNNs.
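To make the two ingredients above concrete, the sketch below shows a single LSTM memory cell with input, forget, and output gates, plus an entropy-based info-max score computed over the cell states it stores during a rollout. This is a hypothetical, simplified one-unit formulation for illustration only; the function names, the scalar weights, and the binned-entropy estimate are assumptions, not the evolved architectures or the exact objective from the paper.

```python
import math
from collections import Counter


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_step(x, h_prev, c_prev, w):
    """One update of a single (hypothetical) scalar LSTM memory cell.

    `w` maps each gate name to a (w_x, w_h, b) weight triple.
    """
    i = sigmoid(w['i'][0] * x + w['i'][1] * h_prev + w['i'][2])    # input gate
    f = sigmoid(w['f'][0] * x + w['f'][1] * h_prev + w['f'][2])    # forget gate
    o = sigmoid(w['o'][0] * x + w['o'][1] * h_prev + w['o'][2])    # output gate
    g = math.tanh(w['g'][0] * x + w['g'][1] * h_prev + w['g'][2])  # candidate
    c = f * c_prev + i * g        # gated memory update
    h = o * math.tanh(c)          # gated readout
    return h, c


def info_max_fitness(cell_states, bins=10):
    """Secondary-objective sketch: Shannon entropy of discretized cell states.

    Higher entropy suggests the cell's memory traverses more distinct
    states over the rollout, i.e. it stores more information.
    """
    squashed = [math.tanh(c) for c in cell_states]                  # map to (-1, 1)
    idx = [min(bins - 1, int((s + 1.0) / 2.0 * bins)) for s in squashed]
    n = len(idx)
    return -sum((k / n) * math.log2(k / n) for k in Counter(idx).values())
```

In this sketch, a memory module whose cell state stays constant scores zero entropy, while one that visits many distinct states scores higher, so the unsupervised first phase can reward useful memory dynamics before any task fitness is measured.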
Aditya Rawal, Ph.D. Student, aditya [at] cs utexas edu
Evolving Deep LSTM-based Memory networks using an Information Maximization Objective. Aditya Rawal and Risto Miikkulainen. To Appear in Genetic and Evolutionary Computation Conference (GECCO 2016), Colorado, USA, 2016.