<i>
[Paper presented at Institution of Electrical Engineers Colloquium
on "Grounding Representations: Integration of Sensory Information in
Natural Language Processing, Artificial Intelligence and Neural
Networks," London May 15 1995.]
</i>
<p>
<center>
<b>
GROUNDING SYMBOLS IN SENSORIMOTOR CATEGORIES WITH NEURAL NETWORKS
</b>
<p>
Stevan Harnad
<br>
Cognitive Sciences Centre
<br>
Department of Psychology
<br>
University of Southampton
<br>
Highfield, Southampton
<br>
SO17 1BJ UNITED KINGDOM
<br>
harnad@ecs.soton.ac.uk 
<br>
phone: +44 01703 592582
<br>
fax:   +44 01703 594597
<br>
http://cogsci.ecs.soton.ac.uk/~harnad/
<br>
ftp://cogsci.ecs.soton.ac.uk/pub/harnad/
</center>
<p>
It is unlikely that the systematic, compositional properties of formal
symbol systems -- i.e., of computation -- play no role at all in
cognition. However, it is equally unlikely that cognition is just
computation, because of the symbol grounding problem (Harnad 1990a): The
symbols in a symbol system are systematically interpretable, by
external interpreters, as meaning something, and that is a remarkable
and powerful property of symbol systems. Cognition (i.e., thinking)
has this property too: Our thoughts are systematically interpretable by
external interpreters as meaning something. However, unlike symbols in
symbol systems, thoughts mean what they mean autonomously: Their
meaning does not consist of or depend on anyone making or being able to
make any external interpretations of them at all. When I think "the cat
is on the mat," the meaning of that thought is autonomous; it does not
depend on YOUR being able to interpret it as meaning that (even though
you could interpret it that way, and you would be right).
<p>
So the symbol grounding problem is: How can you ground the meanings of
symbols autonomously, i.e., without the mediation of external
interpretation? One solution to the symbol grounding problem is to
abandon symbol systems altogether, in favour of noncomputational
systems (analog systems, parallel/distributed systems), but the
systematic, compositional character of symbol systems seems worth
retaining, because thought does seem to have that character, in part,
if not wholly (Harnad 1990b). The alternative to abandoning symbols altogether
is to abandon pure symbol systems for hybrid symbolic/nonsymbolic
systems in which the symbol-meaning connection is not
interpretation-dependent but autonomous and direct. It is such an
approach that I would like to describe here.
<p>
A system whose symbols are directly connected to their referents must
be able to categorise: It must be able to sort and name the objects its
symbols refer to. Once the system can do that, the names -- elementary
symbols grounded directly in their referents -- can enter into
higher-order combinations. For the higher-order symbol combinations to
inherit the grounding of the symbols of which they are composed, the
grounding must exert a second kind of constraint, over and above (or
rather, under and below) the usual kind of constraint on symbol
systems.
<p>
In pure symbol systems, the only constraint on their recombination is
syntactic. Syntactic rules operate only on the shapes of the symbols,
and those shapes are arbitrary (in the sense that they have nothing to
do with what the symbols mean: they neither resemble their referents
nor are they causally connected to them). It is for this reason -- the
arbitrariness of the shapes of symbols and the purely shape-based
nature of the rules for manipulating them -- that pure symbol systems have
the property of implementation-independence, the independence of the
software and hardware levels.
<p>
A grounded symbol system, in contrast, would lose this
implementation-independence in exchange for its
interpretation-independence. The reason is that the causal connection
between the symbols and their referents would be a violation of the
purely syntactic constraints that are the only permissible ones in a pure
symbol system.
<p>
Let us look more closely at the possible "shape" that these grounding
constraints might take in a hybrid autonomous system: To have any
contact with distal objects at all, the system would have to have
transducer surfaces and effectors, analogous to our sensory surfaces
and motor end-organs. So objects could cast their "shadows" on the
system's transducer surfaces, and from these proximal projections of
distal objects, the system would have to be able to sort and name those
objects. So let us suppose that the features in the sensory projection
that allow the system to correctly categorise the objects of which they
are the projections are found by some sort of learning mechanism --
let it be a neural net that takes sensory projections as input and gives
category names as output: The name would be connected to the category of
objects it denotes by a physical link including the neural net and the
sensory projection of the objects. That would be the grounding, and the
composite shape of the arbitrary name plus the net plus the sensory
projection would no longer be arbitrary. If the combinations into which the
name could enter were governed not only by syntactic rules operating on
the arbitrary shape of the name alone, but also by the
nonarbitrary shape of its grounding, then we would have the kind of
doubly constrained hybrid system, with heritable grounding, suggested
above.
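<p>
To make this concrete, here is a minimal sketch in Python (the toy
two-category world, the names, the dimensions and the learning details are
all invented for illustration; they are not part of the model described
above) of how an arbitrary category name could come to be connected to its
referents only through a learned mapping from sensory projections to names:
<pre>
# Minimal sketch (not the paper's implementation): an arbitrary category
# name gets connected to its referents only via a learned mapping from
# sensory projections to names. All names and dimensions are invented.
import numpy as np

rng = np.random.default_rng(0)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical sensory projections: 8-dimensional "shadows" of two kinds
# of distal object, differing in which half of the projection is active.
def project(kind, n=50):
    x = rng.normal(0.0, 0.2, size=(n, 8))
    x[:, :4] += 1.0 if kind == 0 else 0.0
    x[:, 4:] += 1.0 if kind == 1 else 0.0
    return x

X = np.vstack([project(0), project(1)])
y = np.array([0] * 50 + [1] * 50)       # index of the correct (arbitrary) name
names = ["blicket", "dax"]              # the names' shapes are arbitrary
T = np.eye(2)[y]                        # one-hot name targets

# One-hidden-layer net trained by plain gradient descent (logistic units,
# squared error): the learning mechanism that finds the invariant features.
W1 = rng.normal(0, 0.1, (8, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 0.1, (3, 2)); b2 = np.zeros(2)
for _ in range(5000):
    H = sig(X @ W1 + b1)                # hidden-unit activations
    O = sig(H @ W2 + b2)                # activation of each name unit
    dO = (O - T) * O * (1 - O)          # backpropagate the error ...
    dH = (dO @ W2.T) * H * (1 - H)      # ... through both layers
    W2 -= 0.5 * H.T @ dO / len(X); b2 -= 0.5 * dO.mean(0)
    W1 -= 0.5 * X.T @ dH / len(X); b1 -= 0.5 * dH.mean(0)

# The grounding: a fresh projection of a distal object now evokes its name
# through the net itself, with no external interpreter mediating the link.
probe = project(1, n=1)
print(names[int(np.argmax(sig(sig(probe @ W1 + b1) @ W2 + b2)))])  # expected: "dax"
</pre>
The point of the sketch is only that the name is evoked by the proximal
projections of the objects it denotes, rather than by an external
interpreter's say-so; the nonarbitrary "shape" of the grounding is the
whole input-to-name causal path.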
<p>
The dynamics and perhaps also the formal properties of such hybrid
systems will have to be worked out by theory and experiment. Neural
nets can do simple categorisation (Harnad et al. 1991, 1995), but it
still remains to embed them in autonomous sensorimotor systems that
also have compositional capacity, to see how the two sets of
constraints interact. One clue about this interaction may come from
human categorical perception (CP). It is known that the sensory system
sometimes distorts the shape of incoming stimulation in the service of
categorisation: Colour categories look qualitatively distinct, even
though they are actually parts of a smooth hue continuum: greens look
green and blues look blue, apart from the uncertain area in between;
they do not look like continuous variations of the same hue. The same
is true of acoustic pitches, which are perceived as discrete,
qualitative semitone categories even though in reality frequency varies
continuously. Inborn feature detectors seem to be responsible for this
discretisation and categorisation (Harnad 1987), but what about the
majority of our categories, which are learned rather than inborn?
<p>
There is now evidence that CP effects  -- compression within categories
and separation between -- arise in the course of learning to categorise:
Goldstone (1994) presented stimuli varying continuously in two
dimensions -- size and brightness -- to subjects and used signal
detection analysis to measure the discriminability of tiny differences
between them. He found that differences became less discriminable
within categories and in some cases also more discriminable between
categories. We (Andrews, Livingston & Harnad, in preparation) have done
similar experiments on a variety of stimuli: synthetic cell-like
shapes varying continuously in several features, male and female
newborn chicks, computer generated textures with and without certain
features, and multiple photographs of a pair of identical twins. In each
case, subjects were trained to categorise, and their pairwise similarity
ratings for the stimuli before and after category learning were
compared. Within-category differences were compressed and
between-category differences were enhanced as a result of category
learning. When the pairwise similarities were analysed using
multidimensional scaling, their locations in similarity space
"migrated," from initial ones with random overlaps to partitioned ones,
in which category members were compressed together and separated from
other categories.
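<p>
The following sketch illustrates the form of that analysis, with invented
ratings standing in for the experimental data and a classical
(Torgerson-style) MDS standing in for whatever scaling procedure was
actually used: mean within- and between-category similarity before and
after learning, plus similarity-space coordinates of the kind in which the
"migration" shows up:
<pre>
# Schematic of the before/after similarity analysis (invented ratings stand
# in for the experimental data; classical Torgerson MDS stands in for the
# scaling procedure). Shows compression/separation and "migration".
import numpy as np

rng = np.random.default_rng(1)
labels = np.array([0, 0, 0, 1, 1, 1])        # six stimuli, two learned categories

def ratings(within_mu, between_mu, noise=0.05):
    """Symmetric matrix of hypothetical pairwise similarity ratings (0-1)."""
    n = len(labels)
    s = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            mu = within_mu if labels[i] == labels[j] else between_mu
            s[i, j] = s[j, i] = np.clip(rng.normal(mu, noise), 0, 1)
    return s

def mean_within_between(sim):
    """Mean rated similarity for within- vs. between-category pairs."""
    w, b = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (w if labels[i] == labels[j] else b).append(sim[i, j])
    return np.mean(w), np.mean(b)

def classical_mds(dissim, dim=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate dissim."""
    n = dissim.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dissim ** 2) @ J             # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

sim_before = ratings(0.55, 0.50)   # little categorical structure before training
sim_after  = ratings(0.75, 0.30)   # compression within, separation between, after

for tag, sim in (("before", sim_before), ("after", sim_after)):
    w, b = mean_within_between(sim)
    print(tag, "within=%.2f  between=%.2f" % (w, b))
    print(np.round(classical_mds(1.0 - sim), 2))  # stimulus positions in similarity space
</pre>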
<p>
What can be the functional role of these CP effects? It is quite
understandable if we tend to put things in the same category because they
look more alike, but why should they come to look more alike because we
put them in the same category? The answer may be that we must not take
for granted our capacity to categorise correctly. The stimuli we used
were all difficult. It required a lot of training to learn to sort and
name them correctly. It might be that the main way we are able to put
things in the right category is because they look alike; and if they
don't look as if they belong in the same category initially, yet that is
how we must learn to sort them, then what we learn to do is to see them
as looking more alike.
<p>
That is what happened in our simple neural net models: We used
backpropagation nets to learn first to autoassociate and then to
categorise a series of lines of different lengths. For autoassociation,
they had to produce as output a line identical in length to the input.
This is not categorisation but matching, a relative judgment. Once the
nets had been trained to autoassociate, we checked the pairwise
Euclidean distance between the lines in hidden unit activation space:
There were three hidden units, so each line was a point in 3-space.
<p>
Next we trained the nets to categorise, dividing the lines into three
categories: short, medium and long. Once they had learned to do
so, we again measured the pairwise distances in hidden-unit space, and
found that the within-category distances had become compressed and the
between-category distances had become enhanced. If we displayed the
lines in terms of the "receptive fields" of each of the hidden units,
then we found that these receptive fields were "deformed" by the
constraint of having to categorise as well as autoassociate (i.e.,
preserve the sensory projection and categorise it as well). Viewed as
trajectories in the unit cube of hidden-unit activations, the
hidden-unit representations of each of the lines "migrated" until they
reached locations where the cube could be partitioned into the three
categories (short, medium, long), much as in the human multidimensional
scaling analysis.
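<p>
A minimal sketch of this two-stage simulation follows (the particular
coding of the lines, the unit types and the training regime are assumptions
made for illustration; they are not the original nets of Harnad et al.):
<pre>
# Minimal sketch of the two-stage simulation described above. Assumptions
# not in the text: 8 line lengths, thermometer-coded over 8 input units,
# logistic units trained with squared-error backpropagation.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

lines = np.tril(np.ones((8, 8)))                 # line of length k = first k units on
cats = np.repeat(np.eye(3), [3, 2, 3], axis=0)   # short / medium / long name targets
labels = np.repeat([0, 1, 2], [3, 2, 3])

def train(W1, b1, W2, b2, T, epochs=20000, lr=0.2):
    """Plain squared-error backprop on the fixed set of 8 line stimuli."""
    for _ in range(epochs):
        H = sig(lines @ W1 + b1)                 # 3 hidden units: each line a point in 3-space
        O = sig(H @ W2 + b2)
        dO = (O - T) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        W2 = W2 - lr * H.T @ dO; b2 = b2 - lr * dO.sum(0)
        W1 = W1 - lr * lines.T @ dH; b1 = b1 - lr * dH.sum(0)
    return W1, b1, W2, b2

def within_between(H):
    """Mean pairwise Euclidean distance in hidden-unit space, within vs. between categories."""
    w, b = [], []
    for i, j in combinations(range(8), 2):
        (w if labels[i] == labels[j] else b).append(np.linalg.norm(H[i] - H[j]))
    return np.mean(w), np.mean(b)

# Stage 1: autoassociation only (output = reconstruction of the input line).
W1 = rng.normal(0, 0.1, (8, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 0.1, (3, 8)); b2 = np.zeros(8)
W1, b1, W2, b2 = train(W1, b1, W2, b2, T=lines)
print("auto only:     within=%.2f between=%.2f" % within_between(sig(lines @ W1 + b1)))

# Stage 2: add three category-name output units to the same net and keep training.
W2 = np.hstack([W2, rng.normal(0, 0.1, (3, 3))]); b2 = np.hstack([b2, np.zeros(3)])
W1, b1, W2, b2 = train(W1, b1, W2, b2, T=np.hstack([lines, cats]))
print("auto + categ.: within=%.2f between=%.2f" % within_between(sig(lines @ W1 + b1)))
</pre>
Whatever the exact numbers, the comparison of interest is how the within-
and between-category hidden-unit distances change from the first stage to
the second.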
<p>
This is a trivial result for a gradient-descent network, but it is an
interesting potential explanation for CP effects in human beings:
Perhaps internal representations are being subtly deformed in the
service of categorisation. Could this also be the locus of the
nonarbitrary shape-dependency in a hybrid nonsymbolic/symbolic system?
We are now testing the generality of the CP effect in other kinds of
nets and trying to implement it in hybrid systems.
<p>
REFERENCES
<p>
Goldstone, R. (1994) Influences of categorization on perceptual
discrimination. Journal of Experimental Psychology: General 123(2): 178-200.
<p>
Harnad, S. (ed.) (1987) Categorical Perception: The Groundwork of
Cognition. New York: Cambridge University Press.
<p>
Harnad, S. (1990a) The Symbol Grounding Problem.
Physica D 42: 335-346.  [Reprinted in Hungarian Translation as "A
Szimbolum-Lehorgonyzas Problemaja." Magyar Pszichologiai Szemle
XLVIII-XLIX (32-33) 5-6: 365-383.]
<p>
Harnad, S. (1990b) Symbols and Nets: Cooperation vs. Competition.
Review of: S. Pinker and J. Mehler (Eds.) (1988)
Connections and Symbols. Connection Science 2: 257-260.
<p>
Harnad, S. (1992) Connecting Object to Symbol in Modeling
Cognition. In: A. Clark and R. Lutz (Eds.) Connectionism in Context.
Springer Verlag.
<p>
Harnad, S. (1993) Grounding Symbols in the Analog World with Neural
Nets. Think 2(1): 12-78 (Special issue on "Connectionism versus
Symbolism," D.M.W. Powers & P.A. Flach, eds.). [Also reprinted in
French translation as: "L'Ancrage des Symboles dans le Monde Analogique
a l'aide de Reseaux Neuronaux: un Modele Hybride." In: Rialle V. et
Payette D. (Eds) La Modelisation. LEKTON, Vol IV, No 2.]
<p>
Harnad, S. (1993) Symbol Grounding is an Empirical Problem: Neural Nets
are Just a Candidate Component. Proceedings of the Fifteenth Annual
Meeting of the Cognitive Science Society. NJ: Erlbaum
<p>
Harnad, S. (1994) Computation Is Just Interpretable Symbol
Manipulation: Cognition Isn't. Minds and Machines 4: 379-390.
(Special Issue on "What Is Computation")
<p>
Harnad, S. (1995) Grounding Symbolic Capacity in Robotic Capacity.
In: Steels, L. and R. Brooks (eds.) The "artificial life" route to
"artificial intelligence."  Building Situated Embodied Agents. New
Haven: Lawrence Erlbaum
<p>
Harnad, S., Hanson, S.J. & Lubin, J. (1991) Categorical Perception and
the Evolution of Supervised Learning in Neural Nets. In:  Working
Papers of the AAAI Spring Symposium on Machine Learning of Natural
Language and Ontology (DW Powers & L Reeker, Eds.) pp. 65-74. Presented
at Symposium on Symbol Grounding: Problems and Practice, Stanford
University, March 1991; also reprinted as Document D91-09, Deutsches
Forschungszentrum fur Kuenstliche Intelligenz GmbH Kaiserslautern FRG.
<p>
Harnad, S., Hanson, S.J. & Lubin, J. (1995) Learned Categorical
Perception in Neural Nets: Implications for Symbol Grounding.
In: V. Honavar & L. Uhr (eds) Symbol Processors and Connectionist
Network Models in Artificial Intelligence and Cognitive Modelling:
Steps Toward Principled Integration. pp. 191-206. Academic Press.
<p>
