Emergence of Structured Multiple-View Representations of 3D Objects

As we have seen in the preceding section, the pairing of RFs that participate in a multiple-view representation of a 3D object leads to improved invariance of the representation in the face of changes in the object's pose. This means that the representation of a given view may be used over a wider range of viewpoints centered around that view. The paired-RF representation, however, only delays the onset of the problem associated with the viewpoint-dependent appearance of objects; it does not solve it completely. Thus, a system based on view-specific representation that starts by storing a single view of a novel object must eventually add more views to its representation of that object. The present section, following [13], concentrates on the relationship between successive views of an object that undergoes rotation in 3D, and on the way to put this relationship to use in forming a multiple-view representation of the object. In this, it complements the work of [30], who considered the utility of a multiple-view representation, irrespective of the structure that may be imposed on the different views stored in the system.

Figure 4: Canonical and non-canonical views of a 3D object. The dependence on viewpoint of the performance of human observers required to recognize such objects must be accounted for by any model of recognition.

To understand the need for building structure into a multiple-view model of recognition, one may recall two basic findings of the relevant psychophysical studies: the existence of canonical views both for everyday and for novel objects (Figure 4), and the common patterns in the dependence of recognition performance on the viewpoint (Figure 5). The notion of a canonical view was introduced by Palmer, Rosch and Chase, who found that certain views of familiar objects were consistently recognized more easily and faster than randomly chosen views of the same objects [29]. Just as in mental rotation experiments a gradual and monotonic change in viewpoint precipitates an equally gradual and monotonic change in performance [38,39], the error rate and the response time for a test view of a 3D shape were found to grow monotonically with misorientation relative to a canonical view [44]. This dependence, however, was shown to weaken with repeated testing [11]. The simple model described below replicates these basic characteristics of mental rotation in recognition by imposing a certain quasi-sequential structure on the collection of units, each of which represents a particular view of the object.

The model, which will be called here NMR (short for No Mental Rotation), is self-organizing, in that it learns to represent a 3D object from examples [13]. The basic operating cycle of NMR is as follows:

Accept a view of a 3D object (the object is fixed throughout the learning procedure);
If the new view is sufficiently different from all of the views stored in the system:
1. Store the view;
2. Create a (lateral) link between the newly stored view and the previously activated one;
Else:
1. Activate the view that best matches the input;
2. Strengthen the (lateral) link between the current and the previous active views.
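
The operating cycle above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of [13]: the class name `NMRSketch`, the Euclidean distance used to compare views, the `novelty_threshold` parameter, and the unit increment applied to link strengths are all assumptions made here for concreteness. The cap `max_units` stands in for the limit on the number of units that can be recruited, and is assumed to be at least 1.

```python
import math
from collections import defaultdict


class NMRSketch:
    """Illustrative sketch of the NMR operating cycle (all names hypothetical)."""

    def __init__(self, novelty_threshold, max_units):
        self.views = []                   # stored views, one per recruited unit
        self.links = defaultdict(float)   # (i, j) with i < j -> link strength
        self.prev = None                  # index of the previously active unit
        self.novelty_threshold = novelty_threshold  # assumed dissimilarity cutoff
        self.max_units = max_units        # assumed cap on units (must be >= 1)

    def _best_match(self, view):
        """Return (index, distance) of the stored view closest to the input."""
        best, best_dist = None, math.inf
        for i, stored in enumerate(self.views):
            d = math.dist(stored, view)   # Euclidean distance: an assumption
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def present(self, view):
        """Run one cycle on an input view and return the active unit's index."""
        best, dist = self._best_match(view)
        is_novel = best is None or dist > self.novelty_threshold
        if is_novel and len(self.views) < self.max_units:
            self.views.append(tuple(view))    # store the new view
            active = len(self.views) - 1
        else:
            active = best                     # activate the best-matching view
        if self.prev is not None and self.prev != active:
            # Create the lateral link if absent, otherwise strengthen it.
            key = (min(self.prev, active), max(self.prev, active))
            self.links[key] += 1.0
        self.prev = active
        return active
```

Presenting views in the order in which they appear during a rotation yields a chain of links between successive units. Under these assumptions, a later jump between two views that are far apart in that order adds a direct link between their units.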

The simple version of the algorithm, described in [13], operates under a limitation on the total number of units that can be recruited, and has no provisions for ``unlearning'' a representation, or freeing inactive units. That simple version is capable, nevertheless, of reproducing the two basic empirical findings in the study of mental rotation: the dependence of performance (as measured by response time) on the angular distance to a canonical view, and the disappearance of that dependence with practice (Figure 5). Lateral connections play a crucial role in both these traits of the model's performance.

Figure 5: ``Mental rotation'' and its disappearance with practice. The time required to recognize an object presented at a certain view depends on the misorientation of that view relative to a canonical one. This phenomenon resembles a similar dependence observed in experiments involving comparison between two simultaneously presented images [38]. Following practice or repeated testing, the response times become essentially uniform for all tested views.

Figure 6: A network implementing multiple-view representation. Both the initial semblance of ``mental rotation'' and its disappearance with practice can be replicated by a model based on lateral links between view-specific representation units in a network trained to recognize the object (see the Emergence of ... section).

Initially, the network of lateral connections between units representing individual views constitutes the medium over which activation spreads between the different units. The spread begins when one of the units becomes activated following the exposure of the model to a particular test view of the target object (see Figure 6). The monotonic dependence of response time on viewpoint stems from the ``serial'' structure of the lateral connections, which is instilled by the natural order of presentation of the individual views, corresponding to the order of their appearance during a rotation of the object.

Subsequently, additional links are added to the initial ``linear'' pattern, creating shortcuts that lead to a faster and more uniform activation of the entire structure of specific-view units (Figure 7). Because of these shortcuts, the response time (modeled by the time it takes the entire network to reach a certain level of activation) becomes generally shorter, and progressively less dependent on the identity of the input view (that is, of the locus of the initial activation of the network).
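
The effect of shortcuts on response time can be illustrated with a toy calculation. The sketch below is a deliberate simplification and not the model of [13]: activation is treated as binary, every link transmits activation in one time step regardless of its strength, and the response time is taken to be the number of steps until every unit has been reached. The function name `response_time`, the seven-unit network, and the particular shortcut links are hypothetical choices made for the example.

```python
from collections import deque


def response_time(n_units, links, start):
    """Steps until activation spreading from `start` has reached every unit.

    Assumes binary activation and one time step per link (a simplification).
    Returns None if some unit cannot be reached at all.
    """
    neighbours = {i: set() for i in range(n_units)}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    steps = {start: 0}            # unit -> step at which it became active
    queue = deque([start])
    while queue:                  # breadth-first spread of activation
        unit = queue.popleft()
        for other in neighbours[unit]:
            if other not in steps:
                steps[other] = steps[unit] + 1
                queue.append(other)
    if len(steps) < n_units:
        return None
    return max(steps.values())


n = 7
chain = [(i, i + 1) for i in range(n - 1)]           # initial "serial" links
with_shortcuts = chain + [(0, 3), (3, 6), (0, 6)]    # hypothetical shortcuts

before = [response_time(n, chain, s) for s in range(n)]
after = [response_time(n, with_shortcuts, s) for s in range(n)]
print("chain only:    ", before)
print("with shortcuts:", after)
```

In this toy network the response time along the bare chain grows steadily with the distance of the initially activated unit from the middle of the chain. Once the shortcuts are added, it is both shorter and nearly the same for every starting unit.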

Figure 7: The same network, with shortcuts introduced by repeated exposure. The shortcuts obliterate the initially sequential structure of the lateral links in the network, leading to uniform response times for the different views.

According to the NMR model, mental rotation is merely a byproduct of a mechanism geared to create associations between representations of certain well-defined entities (in the present case, between the representations of object views), provided that the appearance of these entities follows one of the laws of association known since Aristotle. Thus, in the case of the NMR model, the postulation of lateral connections contributes to the parsimony of the modeling process, by reducing a well-known but easily misunderstood phenomenon in visual psychophysics to an equally well-known phenomenon in general cognition, namely, associative learning, the substrate for which is widely agreed to exist in the form of learned associations.
