Grounding Language in Descriptions of Scenes (2006)
The problem of how abstract symbols, such as those in systems of natural language, may be grounded in perceptual information presents a significant challenge to several areas of research. This paper presents the GLIDES model, a neural network architecture that shows how this symbol-grounding problem can be solved through learned relationships between simple visual scenes and linguistic descriptions. Unlike previous models of symbol grounding, GLIDES learns in a completely unsupervised way, using the principles of self-organization and Hebbian learning, which allows direct visualization of how concepts are formed and how grounding occurs. Two sets of experiments were conducted to evaluate the model. In the first set, linguistic test stimuli were presented, and the scenes the model generated were evaluated as the grounding of the language. In the second set, the model was presented with visual test samples, and its ability to generate language from the grounded representations was assessed. The results demonstrate that symbols can be grounded through associations between perceptual and linguistic representations, and that the grounding can be made transparent. This transparency yields unique insights into symbol grounding, including how many-to-many mappings between symbols and referents can be maintained and how concepts can be formed from co-occurrence relationships.
In Proceedings of the 28th Annual Meeting of the Cognitive Science Society, 2006.

Risto Miikkulainen Faculty risto [at] cs utexas edu
Paul Williams Undergraduate Alumni pwilly [at] cs utexas edu
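The abstract names two ingredients: self-organizing maps and Hebbian learning between them. The following minimal Python sketch illustrates how those ingredients can combine to link a word to a visual prototype. It is not the GLIDES architecture. It assumes 1-D Kohonen maps, one-hot winner activations, and toy feature encodings, and every name in it (`SOM`, `HebbianLink`, `ground`, the toy `scenes` and `words`) is hypothetical. A word activates the linguistic map, Hebbian weights carry that activity to the visual map, and the prototype of the most active visual unit stands in for the "generated scene", loosely mirroring the first set of experiments.

```python
import math
import random


class SOM:
    """A small 1-D Kohonen self-organizing map (illustrative sketch only)."""

    def __init__(self, n_units, dim, seed=0):
        rng = random.Random(seed)
        # Random initial prototype vectors in [0, 1).
        self.w = [[rng.random() for _ in range(dim)] for _ in range(n_units)]

    def bmu(self, x):
        """Index of the best-matching unit (smallest squared distance to x)."""
        return min(range(len(self.w)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(self.w[i], x)))

    def activation(self, x):
        """One-hot activity at the winning unit (a simplifying assumption)."""
        a = [0.0] * len(self.w)
        a[self.bmu(x)] = 1.0
        return a

    def train(self, data, epochs=200, lr0=0.5, sigma0=2.0):
        for t in range(epochs):
            frac = t / epochs
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.1  # neighborhood radius shrinks
            for x in data:
                b = self.bmu(x)
                for i, wi in enumerate(self.w):
                    # Gaussian neighborhood on the 1-D grid around the winner.
                    h = math.exp(-((i - b) ** 2) / (2 * sigma ** 2))
                    for d in range(len(wi)):
                        wi[d] += lr * h * (x[d] - wi[d])


class HebbianLink:
    """Associative weights from one map's units to another's, via Hebb's rule."""

    def __init__(self, n_from, n_to):
        self.w = [[0.0] * n_to for _ in range(n_from)]

    def learn(self, a_from, a_to, eta=0.1):
        # Hebb's rule: strengthen w_ij in proportion to co-activation a_i * a_j.
        for i, ai in enumerate(a_from):
            for j, aj in enumerate(a_to):
                self.w[i][j] += eta * ai * aj

    def propagate(self, a_from):
        # Total activation arriving at each target unit through the weights.
        n_to = len(self.w[0])
        return [sum(a_from[i] * self.w[i][j] for i in range(len(a_from)))
                for j in range(n_to)]


def ground(word, lang_map, vis_map, link):
    """Return the visual prototype most strongly associated with a word."""
    incoming = link.propagate(lang_map.activation(word))
    j = max(range(len(incoming)), key=incoming.__getitem__)
    return vis_map.w[j]


# Hypothetical toy data: scenes as (x-position, y-position, is_red) features,
# words as one-hot vectors over a three-word vocabulary.
scenes = [[0.1, 0.5, 1.0], [0.9, 0.5, 0.0], [0.5, 0.1, 1.0]]
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

vis_map, lang_map = SOM(6, 3, seed=1), SOM(6, 3, seed=2)
vis_map.train(scenes)
lang_map.train(words)

# Associate co-occurring (word, scene) pairs after both maps have organized.
link = HebbianLink(6, 6)
for s, w in zip(scenes, words):
    link.learn(lang_map.activation(w), vis_map.activation(s))

print(ground(words[0], lang_map, vis_map, link))
```

In this sketch the maps are trained first and associated afterwards, which keeps the example short. Learning both at once, or using graded rather than one-hot activations, are equally plausible choices and are not confirmed by the abstract.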