Figure 4: Canonical and non-canonical views of a 3D object. The dependence on viewpoint of the performance of human observers required to recognize such objects must be accounted for by any model of recognition.
To understand the need for building structure into a multiple-view model of recognition, one may recall two basic findings of the relevant psychophysical studies: the existence of canonical views both for everyday and for novel objects (Figure 4), and the common patterns in the dependence of recognition performance on the viewpoint (Figure 5). The notion of a canonical view was introduced by Palmer, Rosch and Chase, who found that certain views of familiar objects were consistently recognized more easily and faster than randomly chosen views of the same objects [29]. Just as in mental rotation experiments, where a gradual and monotonic change in viewpoint precipitates an equally gradual and monotonic change in performance [38,39], the error rate and the response time for a test view of a 3D shape were found to grow monotonically with misorientation relative to a canonical view [44]. This dependence, however, was shown to weaken with repeated testing [11]. The simple model described below replicates these basic characteristics of mental rotation in recognition by imposing a certain quasi-sequential structure on the collection of units, each of which represents a particular view of the object.
The model, which will be called here NMR (short for No Mental Rotation), is self-organizing, in that it learns to represent a 3D object from examples [13]. The basic operating cycle of NMR is as follows:
The simple version of the algorithm, described in [13], operates under a limitation on the total number of units that can be recruited, and has no provisions for ``unlearning'' a representation, or freeing inactive units. That simple version is capable, nevertheless, of reproducing the two basic empirical findings in the study of mental rotation: the dependence of performance (as measured by response time) on the angular distance to a canonical view, and the disappearance of that dependence with practice (Figure 5). Lateral connections play a crucial role in both these traits of the model's performance.
Figure 5: ``Mental rotation'' and its disappearance with practice. The time required to recognize an object presented at a certain view depends on the misorientation of that view relative to a canonical one. This phenomenon resembles a similar dependence observed in experiments involving comparison between two simultaneously presented images [38]. Following practice or repeated testing, the response times become essentially uniform for all tested views.
Figure 6: A network implementing multiple-view representation. Both the initial semblance of ``mental rotation'' and its disappearance with practice can be replicated by a model based on lateral links between view-specific representation units in a network trained to recognize the object (see the Emergence of ... section).
Initially, the lateral connections between the units representing individual views constitute the medium over which activation spreads through the network once one of the units is activated by the exposure of the model to a particular test view of the target object (Figure 6). The monotonic dependence of response time on viewpoint stems from the ``serial'' structure of the lateral connections, which is instilled by the natural order of presentation of the individual views, that is, the order in which they appear during a rotation of the object.
Subsequently, additional links are added to the initial ``linear'' pattern, creating shortcuts that lead to a faster and more uniform activation of the entire structure of specific-view units (Figure 7). Because of these shortcuts, the response time (modeled by the time it takes the entire network to reach a certain level of activation) becomes generally shorter, and progressively less dependent on the identity of the input view (that is, of the locus of the initial activation of the network).
Figure 7: The same network, with shortcuts introduced by repeated exposure. The shortcuts obliterate the initially sequential structure of the lateral links in the network, leading to uniform response times for the different views.
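The effect of the lateral links on response time can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the learning algorithm of [13]: the function names (chain_links, add_shortcuts, response_time), the rule that activation crosses one link per time step, the level parameter, and the hand-picked shortcut pairs are all hypothetical. Response time is taken, as in the text, to be the time needed for the network to reach a given level of activation.

```python
def chain_links(n_units):
    """Lateral links between consecutive view units (the 'serial' structure)."""
    links = {u: set() for u in range(n_units)}
    for u in range(n_units - 1):
        links[u].add(u + 1)
        links[u + 1].add(u)
    return links


def add_shortcuts(links, shortcuts):
    """Return a copy of the links with extra bidirectional shortcut links."""
    new_links = {u: set(nbrs) for u, nbrs in links.items()}
    for a, b in shortcuts:
        new_links[a].add(b)
        new_links[b].add(a)
    return new_links


def response_time(links, input_unit, level=1.0):
    """Number of steps until the fraction of active units reaches `level`.

    Assumption of this sketch: activation starts at `input_unit` (the unit
    matching the test view) and crosses one lateral link per time step.
    """
    needed = level * len(links)
    active = {input_unit}
    frontier = {input_unit}
    steps = 0
    while len(active) < needed:
        frontier = {v for u in frontier for v in links[u]} - active
        if not frontier:
            break  # disconnected network: the level cannot be reached
        active |= frontier
        steps += 1
    return steps


n_views = 9
chain = chain_links(n_views)
# Hypothetical shortcuts standing in for links added by repeated exposure.
practiced = add_shortcuts(chain, [(0, 4), (4, 8), (0, 8), (2, 6)])

before = [response_time(chain, u) for u in range(n_views)]
after = [response_time(practiced, u) for u in range(n_views)]
print("serial links only:", before)
print("with shortcuts:   ", after)
```

With serial links only, the response time in this toy grows monotonically with the distance of the input unit from the middle of the chain, which here plays the role of the best-recognized view purely as a consequence of the chain geometry. With the shortcuts added, the times are shorter for every input unit and vary much less across units.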
According to the NMR model, mental rotation is merely a byproduct of a mechanism geared to create associations between representations of certain well-defined entities (in the present case, between the representations of object views), provided that the appearance of these entities follows one of the laws of association known since Aristotle. Thus, in the case of the NMR model, the postulation of lateral connections contributes to the parsimony of the modeling process, by reducing a well-known but easily misunderstood phenomenon in visual psychophysics to an equally well-known phenomenon in general cognition, namely, associative learning, the substrate for which is widely agreed to exist in the form of learned associations.