Handwritten Digit Recognition Utilizing Evolved Pattern Generators
Vinod Valsalam, James A. Bednar, and Risto Miikkulainen
Figure 1: The architecture of the competitive learning network. The binary activations from the 8×8 input pattern consisting of 64 pixels are fed to the input units of the network, which also contains a bias unit. The 10 output units each correspond to a classification of the input as one of the 10 digits; the one with the highest activation is chosen as the answer of the network. During training, the weights of this unit are adjusted towards the input pattern, making that unit more likely to win similar patterns in the future.
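The winner-take-all update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the learning rate, the random initialization, and the omission of the bias unit are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS, N_UNITS = 64, 10  # 8x8 binary input image, one unit per digit class
LEARNING_RATE = 0.1         # assumed value; not specified in the text

# Random initial weights: one 64-dimensional vector per output unit.
# (The bias unit mentioned in the caption is omitted here for brevity.)
weights = rng.random((N_UNITS, N_INPUTS))

def train_step(weights, x, lr=LEARNING_RATE):
    """One winner-take-all competitive learning step.

    The unit with the highest activation (dot product with the input)
    wins, and only its weights are moved toward the input pattern,
    making it more likely to win similar patterns in the future.
    """
    activations = weights @ x
    winner = int(np.argmax(activations))
    weights[winner] += lr * (x - weights[winner])
    return winner

# Example: one flattened binary 8x8 "image".
x = (rng.random(N_INPUTS) > 0.5).astype(float)
before = weights.copy()
w = train_step(weights, x)
```

After the step, only the winning unit's weight vector has moved, and it has moved strictly closer to the presented pattern; repeating this over many examples pulls each unit toward a cluster of similar inputs.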
Figure 3: Random weights of each output unit of the initial network. The weights are arranged in an 8×8 grid corresponding to the pixels in the input image. Lighter squares represent stronger weights. A digit on top indicates that this unit wins a large number of examples of that digit. The assignment of digits to units is uneven, indicating that this network is a poor classifier.
Figure 4: Final weights for each output unit without prenatal biasing. Most of the weights have converged to a configuration that imitates the input digit patterns; however, some units represent a combination of digits (e.g. 7 and 9). This result demonstrates how competitive learning can get stuck in a local optimum when it does not start with an appropriate initial bias.
Figure 5: Patterns produced by the evolved pattern generator. The pattern generator consists of a set of oriented Gaussian patterns, each with a probability of generation shown on top of the pattern. These patterns tend to be simple and have no direct resemblance to digits. The weights resulting from prenatal training with such patterns are shown in Fig. 6(a). This biased network is then trained with digit examples to get the final recognition network (Fig. 6(b)). The classification performance of the final network is used as the fitness of the pattern generator in evolution.
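A pattern generator of this kind can be sketched as a small set of oriented 2-D Gaussians, each sampled with a fixed probability. The centers, widths, orientations, and probabilities below are illustrative placeholders, not the evolved values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
GRID = 8  # matches the 8x8 input image

def oriented_gaussian(cx, cy, sx, sy, theta, grid=GRID):
    """Render one oriented 2-D Gaussian on the grid.

    (cx, cy) is the center, (sx, sy) the standard deviations along the
    rotated axes, and theta the orientation angle in radians.
    """
    ys, xs = np.mgrid[0:grid, 0:grid]
    dx, dy = xs - cx, ys - cy
    # Rotate coordinates into the Gaussian's own frame.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return np.exp(-(u**2 / (2 * sx**2) + v**2 / (2 * sy**2)))

# Hypothetical generator: three elongated Gaussians at different
# orientations, each with a generation probability (illustrative numbers).
patterns = [
    oriented_gaussian(3.5, 3.5, 3.0, 0.8, 0.0),        # horizontal blob
    oriented_gaussian(3.5, 3.5, 3.0, 0.8, np.pi / 2),  # vertical blob
    oriented_gaussian(3.5, 3.5, 3.0, 0.8, np.pi / 4),  # diagonal blob
]
probs = np.array([0.5, 0.3, 0.2])

def generate():
    """Sample one prenatal training pattern according to the probabilities."""
    i = rng.choice(len(patterns), p=probs)
    return patterns[i]

sample = generate()
```

In the evolutionary setting, the Gaussian parameters and probabilities would be the genome being optimized, with the final network's classification accuracy serving as fitness.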
[(a) Weights after prenatal training]
[(b) Final weights]
Figure 6: Weights for each output unit, trained prenatally with the patterns shown in Fig. 5. Comparing the random initial weights in Fig. 3 with the weights after prenatal training (a), it is clear that only five of the ten units learn a significant bias. Yet these biases are sufficient for postnatal training to perform better than without prenatal biasing — all digits are represented well by the final weight patterns (b).
Digits 7 and 9 have several pixels in common, which results in the same unit learning both digits when the network is not biased. On the other hand, when the network is biased with the generated patterns, it is able to learn non-overlapping categories of 7 and 9. Prenatal training establishes a general bias on one of the units that matches several digits including 7 and 9, while another unit picks up a slight bias for 7. During postnatal learning, these biases allow the second unit to slowly become more and more specialized to digit 7, thus winning examples of that digit from the first unit. At the same time, the first unit keeps examples of digit 9 from interfering with the learning of digit 7 by the second unit. In the end, only digit 9 remains mapped to the first unit, while the other digits have been learned by other units in the network.