Reasoning and Logic
Posted on May 14, 2009 by Peter Turney
In predicate logic, the concept red ball is represented as a combination of the concepts of red and ball. We can define the predicate RedBall(x) as (Red(x) & Ball(x)). Logical atomism views the world in terms of compound predicates, such as RedBall(x), that are built up from atomic predicates, such as Red(x) and Ball(x). Good old-fashioned AI (GOFAI) research almost always assumes a kind of logical atomism. Cyc, for example, represents knowledge using a form of logical atomism. Even those researchers who reject GOFAI still tend to assume logical atomism. Statistical and connectionist models of concepts typically view red ball as a combination of red and ball. I believe that we should turn this view on its head. That is, red ball comes first (is more basic, more primitive); red and ball come later (are more complex, more refined).
In Latent Semantic Analysis (LSA), we represent the semantics of red and ball with vectors, in which the elements are derived from the frequencies of the terms red and ball in various contexts. For those who are familiar with logic, it is natural to think of representing red ball by some mathematical operation on the vectors for red and ball. For example, we might add the red vector to the ball vector. One problem with this idea is that addition is not sensitive to order; thus house boat and boat house would have the same vector, although they have different meanings. One solution to this problem is a mathematical operation on vectors that is sensitive to order, such as the tensor product or an order-sensitive variant of circular convolution.
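The order problem is easy to check on toy data. The sketch below is only an illustration: the three-dimensional vectors are invented rather than derived from corpus frequencies, and the helper names (add, outer, cconv, permute, ordered_cconv) are my own. It shows that addition gives the same result for house boat and boat house, while the tensor (outer) product does not. Plain circular convolution turns out to be commutative as well, so the sketch permutes the left argument first to make it order-sensitive; the particular permutation is an arbitrary assumption.

```python
# Toy vectors; the values are made up for illustration only.
house = [1.0, 2.0, 0.0]
boat = [0.0, 1.0, 3.0]

def add(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]

def outer(u, v):
    """Tensor (outer) product: a matrix with entries u[i] * v[j]."""
    return [[a * b for b in v] for a in u]

def cconv(u, v):
    """Circular convolution: out[k] = sum over i of u[i] * v[(k - i) mod n]."""
    n = len(u)
    return [sum(u[i] * v[(k - i) % n] for i in range(n)) for k in range(n)]

def permute(u, order=(1, 0, 2)):
    """Apply a fixed permutation (an arbitrary choice, assumed here)."""
    return [u[i] for i in order]

def ordered_cconv(u, v):
    """Order-sensitive variant: permute the left argument, then convolve."""
    return cconv(permute(u), v)

print(add(house, boat) == add(boat, house))                      # addition ignores order
print(outer(house, boat) == outer(boat, house))                  # outer product does not
print(cconv(house, boat) == cconv(boat, house))                  # plain convolution ignores order
print(ordered_cconv(house, boat) == ordered_cconv(boat, house))  # permuted variant does not
```

The permutation must not be a simple cyclic shift, because circular convolution commutes with shifts and the result would still ignore order.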
Several papers explore this vector combination approach to representing compound predicates:
- Plate (1991), Holographic reduced representations: Convolution algebra for compositional distributed representations
- Plate (1994), Distributed representations and nested compositional structure
- Smolensky (1994), Grammar-based connectionist approaches to language
- Plate (1995), Estimating analogical similarity by vector dot-products of holographic reduced representations
- Wilson, Street, and Halford (1995), Solving proportional analogy problems using tensor product networks with random representations
- Jones and Mewhort (2007), Representing word meaning and order information in a composite holographic lexicon
- Widdows (2008), Semantic vector products: Some initial investigations
Plate (1995) describes his approach as follows:
Suppose we have distributed representations for the concept “circle”, “triangle”, “small”, and “large”. We can represent a small circle by superposing the patterns for “small” and “circle”. However, when we try to represent a small circle and large triangle we have the problem that the superposition of the four patterns is ambiguous – the information that small is associated with triangle and large associated with circle is lost – it could be a large circle and a small triangle. The same problem arises when we try to represent conceptual relations. Suppose we have a predicate representation for “Spot bit Jane”: bite(spot, jane). Spot is the agent of this relation, and Jane is the object (or patient). A distributed representation of this relation must be careful to preserve the information about which person is associated with which role (agent or object) so that there is no confusion with “Jane bit Spot”. I refer to associations between roles and fillers as role/filler bindings.
The assumption here is that we build up the complex structure bite(spot, jane) from simple atomic elements bite, spot, and jane. But I suggest that the atomic element is bite(spot, jane), and that we construct bite, spot, and jane from this atomic element.
Suppose we have a third-order tensor of the form pattern × word × word. The predicate P(x,y) is a pattern (“x bit y”), x is a word (“spot”), y is another word (“jane”), and the triple <P(x,y),x,y> is a cell (<“x bit y”,“spot”,“jane”>) in this third-order tensor. The concept “bite” corresponds to the slice (a matrix cut out of the tensor) <“x bit y”,*,*>, the concept “spot” is the slice <*,“spot”,*>, and the concept “jane” is the slice <*,*,“jane”>.
In this representation, the atomic elements (tensor cells; scalars) are whole events (“I see a red ball”, “Spot bit Jane”), and abstract concepts (“red”, “bite”) are complicated structures (tensor slices; matrices), composed of these atomic elements. This turns the usual picture upside-down. There is nothing “simple” or “atomic” about the concept “circle”. It is a complex structure, composed of all of our atomic experiences of events that contained some aspect of circularity.
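To make the cell-and-slice picture concrete, here is a minimal sketch that stores a sparse pattern × word × word tensor as a dictionary keyed by (pattern, x, y). The event counts are invented, and the helper tensor_slice is a hypothetical name of my own, not code from any of the papers cited here. Each cell is a whole event, and each concept is recovered as the set of cells that share one coordinate.

```python
from collections import defaultdict

# Hypothetical event counts: (pattern, x, y) -> frequency. The numbers are invented.
tensor = defaultdict(float)
tensor[("x bit y", "spot", "jane")] = 3.0
tensor[("x bit y", "jane", "spot")] = 1.0
tensor[("x saw y", "spot", "jane")] = 2.0
tensor[("x saw y", "jane", "ball")] = 4.0

def tensor_slice(cells, pattern=None, x=None, y=None):
    """Return the cells matching the fixed coordinates; None plays the role of '*'."""
    return {
        key: value
        for key, value in cells.items()
        if (pattern is None or key[0] == pattern)
        and (x is None or key[1] == x)
        and (y is None or key[2] == y)
    }

bite = tensor_slice(tensor, pattern="x bit y")  # slice <"x bit y", *, *>
spot = tensor_slice(tensor, x="spot")           # slice <*, "spot", *>
jane = tensor_slice(tensor, y="jane")           # slice <*, *, "jane">

for name, cells in (("bite", bite), ("spot", spot), ("jane", jane)):
    print(name, sorted(cells.items()))
```

In this toy data, “Spot bit Jane” and “Jane bit Spot” occupy different cells, so the information about who did what to whom is never lost; the concept “bite” is simply everything in the “x bit y” slice.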
Using a vector combination approach, Jones and Mewhort (2007) report a score of 57.81% (see page 18) on multiple-choice synonym questions from the TOEFL (Test of English as a Foreign Language). Using the opposite approach (Turney, 2007), a third-order pattern × word × word tensor achieves a score of 83.75% (see page 22).