Posted on June 28, 2009 by Peter Turney
If symbols must be grounded in perception, how does this grounding happen? How do we learn to create mappings between language and perception? For example, how does the word “rabbit” get tied to the perception (visual, tactile, whatever) of a rabbit? AI algorithms for assigning textual labels to photographs are not yet able to approach human performance on this task. The problem is somewhat similar to statistical machine translation, which exploits parallel corpora to learn mappings between two different languages, although the difference between text and photographs is more extreme than the difference between any two written languages. Perhaps ideas from statistical machine translation are applicable to symbol grounding. The translation algorithm of Lepage and Denoual, based on proportional analogy, seems particularly appropriate, since it makes minimal assumptions about the structures of the languages.
There is a view that the meaning of words must be grounded in perception:
But how do we connect a word with a perception? Quine gives the following example: suppose we hear a person say “gavagai” in the presence of a rabbit. How do we know that “gavagai” means “rabbit”? Other possible interpretations of “gavagai” are “Lo, food”, “Let’s go hunting”, and “There will be a storm tonight”. The possibilities are endless.
It is interesting that Quine links the symbol grounding problem to the problem of translation. In Quine’s thought experiment, a linguist must translate “gavagai” into English. Perhaps recent progress with machine translation algorithms is applicable to the symbol grounding problem?
Lepage and Denoual use proportional analogies to derive translations from parallel corpora. A proportional analogy has the form A:B::C:D, meaning “A is to B as C is to D”. For example, quart:volume::mile:distance means “quart is to volume as mile is to distance”. Consider the following proportional analogy:
A = “This is a rabbit.”
B = “This is a fox.”
C = [image: a rabbit in the foreground, on grass]
D = [image: a fox in the foreground, against the same background]
The analogy A:B::C:D helps us to map “rabbit” to the image of the rabbit, whereas A and C alone leave the mapping indeterminate. Comparing A and B, we see the shared structure “This is a X.” and we note that X = “rabbit” for A and X = “fox” for B. Likewise, comparing C and D, we see the shared backgrounds and note the differing foregrounds. This helps us map the foreground of C to “rabbit” and the foreground of D to “fox”. Given only A and C, we have no reason to pick out “rabbit” from the sentence “This is a rabbit” in A and we have no reason to pick out the foreground rabbit from the background grass in C.
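The comparison described above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions, not the algorithm of Lepage and Denoual: the "images" are tiny character grids of equal size, the two sentences differ by simple word substitution, and the names text_difference, image_difference, and ground are hypothetical helpers invented for this example.

```python
import difflib


def text_difference(a, b):
    """Return the words that appear only in a and only in b (word-level diff)."""
    tokens_a = a.rstrip(".").split()
    tokens_b = b.rstrip(".").split()
    matcher = difflib.SequenceMatcher(None, tokens_a, tokens_b)
    only_a, only_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep whatever is not part of the shared template
            only_a.extend(tokens_a[i1:i2])
            only_b.extend(tokens_b[j1:j2])
    return " ".join(only_a), " ".join(only_b)


def image_difference(c, d):
    """Return the (row, column) cells where two equally sized grids differ."""
    return [
        (r, k)
        for r, (row_c, row_d) in enumerate(zip(c, d))
        for k, (pixel_c, pixel_d) in enumerate(zip(row_c, row_d))
        if pixel_c != pixel_d
    ]


def ground(a, b, c, d):
    """Map the differing words of (a, b) to the differing regions of (c, d)."""
    word_a, word_b = text_difference(a, b)
    cells = image_difference(c, d)
    return {
        word_a: [c[r][k] for r, k in cells],  # foreground pixels of c
        word_b: [d[r][k] for r, k in cells],  # foreground pixels of d
    }


# Toy stand-ins for the two photographs (an assumption of this sketch):
# "." is grass, "#" is the rabbit, "@" is the fox.
C = ["....", ".##.", "...."]
D = ["....", ".@@.", "...."]

mapping = ground("This is a rabbit.", "This is a fox.", C, D)
print(mapping)
```

Neither the sentence A nor the grid C is segmented in advance. The word "rabbit" and the "#" region are picked out only because they are what changes when A is compared with B and C is compared with D. Real photographs would of course need a far more robust notion of "shared background" than exact pixel equality.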
The core idea here is that we do not ground symbols in perceptions by noting direct correlations between symbols and perceptions; rather, we note meta-correlations, that is, correlations between relations among symbols (e.g., the relation between A and B) and relations among perceptions (e.g., the relation between C and D).
Related work: Deb Roy.