<i>
[Paper presented at Institution of Electrical Engineers Colloquium
on "Grounding Representations: Integration of Sensory Information in
Natural Language Processing, Artificial Intelligence and Neural
Networks," London May 15 1995.]
</i>
<p>
<center>
<b>
GROUNDING SYMBOLS IN SENSORIMOTOR CATEGORIES WITH NEURAL NETWORKS
</b>
<p>
Stevan Harnad
<br>
Cognitive Sciences Centre
<br>
Department of Psychology
<br>
University of Southampton
<br>
Highfield, Southampton
<br>
SO17 1BJ UNITED KINGDOM
<br>
harnad@ecs.soton.ac.uk 
<br>
phone: +44 01703 592582
<br>
fax:   +44 01703 594597
<br>
http://cogsci.ecs.soton.ac.uk/~harnad/
<br>
ftp://cogsci.ecs.soton.ac.uk/pub/harnad/
</center>
<p>
It is unlikely that the systematic, compositional properties of formal
symbol systems -- i.e., of computation -- play no role at all in
cognition. However, it is equally unlikely that cognition is just
computation, because of the symbol grounding problem (Harnad 1990a): The
symbols in a symbol system are systematically interpretable, by
external interpreters, as meaning something, and that is a remarkable
and powerful property of symbol systems. Cognition (i.e., thinking)
has this property too: Our thoughts are systematically interpretable by
external interpreters as meaning something. However, unlike symbols in
symbol systems, thoughts mean what they mean autonomously: Their
meaning does not consist of or depend on anyone making or being able to
make any external interpretations of them at all. When I think "the cat
is on the mat," the meaning of that thought is autonomous; it does not
depend on YOUR being able to interpret it as meaning that (even though
you could interpret it that way, and you would be right).
<p>
So the symbol grounding problem is: How can you ground the meanings of
symbols autonomously, i.e., without the mediation of external
interpretation? One solution to the symbol grounding problem is to
abandon symbol systems altogether, in favour of noncomputational
systems (analog systems, parallel/distributed systems), but the
systematic, compositional character of symbol systems seems worth
retaining, because thought does seem to have that character, in part,
if not wholly (Harnad 1990b). The alternative to abandoning symbols altogether
is to abandon pure symbol systems for hybrid symbolic/nonsymbolic
systems in which the symbol-meaning connection is not
interpretation-dependent but autonomous and direct. It is such an
approach that I would like to describe here.
<p>
A system whose symbols are directly connected to their referents must
be able to categorise: It must be able to sort and name the objects its
symbols refer to. Once the system can do that, the names -- elementary
symbols grounded directly in their referents -- can enter into
higher-order combinations. For the higher-order symbol combinations to
inherit the grounding of the symbols of which they are composed, the
grounding must exert a second kind of constraint, over and above (or
rather, under and below) the usual kind of constraint on symbol
systems.
<p>
In pure symbol systems, the only constraint on their recombination is
syntactic. Syntactic rules operate only on the shapes of the symbols,
and those shapes are arbitrary (in the sense that they have nothing to
do with what the symbols mean: they neither resemble their referents
nor are they causally connected to them). It is for this reason -- the
arbitrariness of the shapes of symbols and the purely shape-based
nature of the rules for manipulating them -- that pure symbol systems have
the property of implementation-independence, the independence of the
software and hardware levels.
<p>
A grounded symbol system, in contrast, would lose this
implementation-independence in exchange for its
interpretation-independence. The reason is that the causal connection
between the symbols and their referents would be a violation of the
purely syntactic constraints that are the only permissible ones in a pure
symbol system.
<p>
Let us look more closely at the possible "shape" that these grounding
constraints might take in a hybrid autonomous system: To have any
contact with distal objects at all, the system would have to have
transducer surfaces and effectors, analogous to our sensory surfaces
and motor end-organs. So objects could cast their "shadows" on the
system's transducer surfaces, and from these proximal projections of
distal objects, the system would have to be able to sort and name those
objects. So let us suppose that the features in the sensory projection
that allow the system to correctly categorise the objects of which they
are the projections are found by some sort of learning mechanism --
let it be a neural net that takes sensory projections as input and gives
category names as output: The name would be connected to the category of
objects it denotes by a physical link including the neural net and the
sensory projection of the objects. That would be the grounding, and the
composite shape of the arbitrary name plus the net plus the sensory
projection would no longer be arbitrary. If the combinations into which the
name could enter were governed not only by syntactic rules operating on
the arbitrary shape of the name alone, but also by the
nonarbitrary shape of its grounding, then we would have the kind of
doubly constrained hybrid system, with heritable grounding, suggested
above.
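<p>
To make this concrete, here is a minimal sketch in Python (the toy
two-category world, the names, the dimensions and the learning details are
all invented for illustration; they are not part of the model described
above) of how an arbitrary category name could come to be connected to its
referents only through a learned mapping from sensory projections to names:
<pre>
# Minimal sketch (not the paper's implementation): an arbitrary category
# name gets connected to its referents only via a learned mapping from
# sensory projections to names. All names and dimensions are invented.
import numpy as np

rng = np.random.default_rng(0)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical sensory projections: 8-dimensional "shadows" of two kinds
# of distal object, differing in which half of the projection is active.
def project(kind, n=50):
    x = rng.normal(0.0, 0.2, size=(n, 8))
    x[:, :4] += 1.0 if kind == 0 else 0.0
    x[:, 4:] += 1.0 if kind == 1 else 0.0
    return x

X = np.vstack([project(0), project(1)])
y = np.array([0] * 50 + [1] * 50)       # index of the correct (arbitrary) name
names = ["blicket", "dax"]              # the names' shapes are arbitrary
T = np.eye(2)[y]                        # one-hot name targets

# One-hidden-layer net trained by plain gradient descent (logistic units,
# squared error): the learning mechanism that finds the invariant features.
W1 = rng.normal(0, 0.1, (8, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 0.1, (3, 2)); b2 = np.zeros(2)
for _ in range(5000):
    H = sig(X @ W1 + b1)                # hidden-unit activations
    O = sig(H @ W2 + b2)                # activation of each name unit
    dO = (O - T) * O * (1 - O)          # backpropagate the error ...
    dH = (dO @ W2.T) * H * (1 - H)      # ... through both layers
    W2 -= 0.5 * H.T @ dO / len(X); b2 -= 0.5 * dO.mean(0)
    W1 -= 0.5 * X.T @ dH / len(X); b1 -= 0.5 * dH.mean(0)

# The grounding: a fresh projection of a distal object now evokes its name
# through the net itself, with no external interpreter mediating the link.
probe = project(1, n=1)
print(names[int(np.argmax(sig(sig(probe @ W1 + b1) @ W2 + b2)))])  # expected: "dax"
</pre>
The point of the sketch is only that the name is evoked by the proximal
projections of the objects it denotes, rather than by an external
interpreter's say-so; the nonarbitrary "shape" of the grounding is the
whole input-to-name causal path.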
<p>
The dynamics and perhaps also the formal properties of such hybrid
systems will have to be worked out by theory and experiment. Neural
nets can do simple categorisation (Harnad et al. 1991, 1995), but it
still remains to embed them in autonomous sensorimotor systems that
also have compositional capacity, to see how the two sets of
constraints interact. One clue about this interaction may come from
human categorical perception (CP). It is known that the sensory system
sometimes distorts the shape of incoming stimulation in the service of
categorisation: Colour categories look qualitatively distinct, even
though they are actually parts of a smooth hue continuum: greens look
green and blues look blue, apart from the uncertain area in between;
they do not look like continuous variations of the same hue. The same
is true of acoustic pitches, which are perceived as discrete,
qualitative semitone categories even though in reality frequency varies
continuously. Inborn feature detectors seem to be responsible for this
discretisation and categorisation (Harnad 1987), but what about the
majority of our categories, which are learned rather than inborn?
<p>
There is now evidence that CP effects  -- compression within categories
and separation between -- arise in the course of learning to categorise:
Goldstone (1994) presented stimuli varying continuously in two
dimensions -- size and brightness -- to subjects and used signal
detection analysis to measure the discriminability of tiny differences
between them. He found that differences became less discriminable
within categories and in some cases also more discriminable between
categories. We (Andrews, Livingston & Harnad, in preparation) have done
similar experiments on a variety of stimuli: synthetic cell-like
shapes varying continuously in several features, male and female
newborn chicks, computer generated textures with and without certain
features, and multiple photographs of a pair of identical twins. In each
case, subjects were trained to categorise, and their pairwise similarity
ratings for the stimuli before and after category learning were
compared. Within-category differences were compressed and
between-category differences were enhanced as a result of category
learning. When the pairwise similarities were analysed using
multidimensional scaling, their locations in similarity space
"migrated," from initial ones with random overlaps to partitioned ones,
in which category members were compressed together and separated from
other categories.
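<p>
The following sketch illustrates the form of that analysis, with invented
ratings standing in for the experimental data and a classical
(Torgerson-style) MDS standing in for whatever scaling procedure was
actually used: mean within- and between-category similarity before and
after learning, plus similarity-space coordinates of the kind in which the
"migration" shows up:
<pre>
# Schematic of the before/after similarity analysis (invented ratings stand
# in for the experimental data; classical Torgerson MDS stands in for the
# scaling procedure). Shows compression/separation and "migration".
import numpy as np

rng = np.random.default_rng(1)
labels = np.array([0, 0, 0, 1, 1, 1])        # six stimuli, two learned categories

def ratings(within_mu, between_mu, noise=0.05):
    """Symmetric matrix of hypothetical pairwise similarity ratings (0-1)."""
    n = len(labels)
    s = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            mu = within_mu if labels[i] == labels[j] else between_mu
            s[i, j] = s[j, i] = np.clip(rng.normal(mu, noise), 0, 1)
    return s

def mean_within_between(sim):
    """Mean rated similarity for within- vs. between-category pairs."""
    w, b = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (w if labels[i] == labels[j] else b).append(sim[i, j])
    return np.mean(w), np.mean(b)

def classical_mds(dissim, dim=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate dissim."""
    n = dissim.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dissim ** 2) @ J             # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

sim_before = ratings(0.55, 0.50)   # little categorical structure before training
sim_after  = ratings(0.75, 0.30)   # compression within, separation between, after

for tag, sim in (("before", sim_before), ("after", sim_after)):
    w, b = mean_within_between(sim)
    print(tag, "within=%.2f  between=%.2f" % (w, b))
    print(np.round(classical_mds(1.0 - sim), 2))  # stimulus positions in similarity space
</pre>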
<p>
What can be the functional role of these CP effects? It is quite
understandable if we tend to put things in the same category because they
look more alike, but why should they come to look more alike because we
put them in the same category? The answer may be that we must not take
for granted our capacity to categorise correctly. The stimuli we used
were all difficult. It required a lot of training to learn to sort and
name them correctly. It might be that the main way we are able to put
things in the right category is because they look alike; and if they
don't look as if they belong in the same category initially, yet that is
how we must learn to sort them, then what we learn to do is to see them
as looking more alike.
<p>
That is what happened in our simple neural net models: We used
backpropagation nets to learn first to autoassociate and then to
categorise a series of lines of different lengths. For autoassociation,
they had to produce as output a line identical in length to the input.
This is not categorisation but matching, a relative judgment. Once the
nets had been trained to autoassociate, we checked the pairwise
Euclidean distance between the lines in hidden unit activation space:
There were three hidden units, so each line was a point in 3-space.
<p>
Next we trained the nets to categorise, dividing the lines into three
categories: short, medium and long. Once they had learned to do
so, we again measured the pairwise distances in hidden-unit space, and
found that the within-category distances had become compressed and the
between-category distances had become enhanced. If we displayed the
lines in terms of the "receptive fields" of each of the hidden units,
then we found that these receptive fields were "deformed" by the
constraint of having to categorise as well as autoassociate (i.e.,
preserve the sensory projection and categorise it as well). Viewed as
trajectories in the unit cube of hidden-unit activations, the
hidden-unit representations of each of the lines "migrated" until they
reached locations where the cube could be partitioned into the three
categories (short, medium, long), much as in the human multidimensional
scaling analysis.
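<p>
A minimal sketch of this two-stage simulation follows (the particular
coding of the lines, the unit types and the training regime are assumptions
made for illustration; they are not the original nets of Harnad et al.):
<pre>
# Minimal sketch of the two-stage simulation described above. Assumptions
# not in the text: 8 line lengths, thermometer-coded over 8 input units,
# logistic units trained with squared-error backpropagation.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

lines = np.tril(np.ones((8, 8)))                 # line of length k = first k units on
cats = np.repeat(np.eye(3), [3, 2, 3], axis=0)   # short / medium / long name targets
labels = np.repeat([0, 1, 2], [3, 2, 3])

def train(W1, b1, W2, b2, T, epochs=20000, lr=0.2):
    """Plain squared-error backprop on the fixed set of 8 line stimuli."""
    for _ in range(epochs):
        H = sig(lines @ W1 + b1)                 # 3 hidden units: each line a point in 3-space
        O = sig(H @ W2 + b2)
        dO = (O - T) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        W2 = W2 - lr * H.T @ dO; b2 = b2 - lr * dO.sum(0)
        W1 = W1 - lr * lines.T @ dH; b1 = b1 - lr * dH.sum(0)
    return W1, b1, W2, b2

def within_between(H):
    """Mean pairwise Euclidean distance in hidden-unit space, within vs. between categories."""
    w, b = [], []
    for i, j in combinations(range(8), 2):
        (w if labels[i] == labels[j] else b).append(np.linalg.norm(H[i] - H[j]))
    return np.mean(w), np.mean(b)

# Stage 1: autoassociation only (output = reconstruction of the input line).
W1 = rng.normal(0, 0.1, (8, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 0.1, (3, 8)); b2 = np.zeros(8)
W1, b1, W2, b2 = train(W1, b1, W2, b2, T=lines)
print("auto only:     within=%.2f between=%.2f" % within_between(sig(lines @ W1 + b1)))

# Stage 2: add three category-name output units to the same net and keep training.
W2 = np.hstack([W2, rng.normal(0, 0.1, (3, 3))]); b2 = np.hstack([b2, np.zeros(3)])
W1, b1, W2, b2 = train(W1, b1, W2, b2, T=np.hstack([lines, cats]))
print("auto + categ.: within=%.2f between=%.2f" % within_between(sig(lines @ W1 + b1)))
</pre>
Whatever the exact numbers, the comparison of interest is how the within-
and between-category hidden-unit distances change from the first stage to
the second.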
<p>
This is a trivial result for a gradient-descent network, but it is an
interesting potential explanation for CP effects in human beings:
Perhaps internal representations are being subtly deformed in the
service of categorisation. Could this also be the locus of the
nonarbitrary shape-dependency in a hybrid nonsymbolic/symbolic system?
We are now testing the generality of the CP effect in other kinds of
nets and trying to implement it in hybrid systems.
<p>
REFERENCES
<p>
Goldstone, R. (1994) Influences of categorization on perceptual
discrimination. Journal of Experimental Psychology: General 123(2): 178-200.
<p>
Harnad, S. (ed.) (1987) Categorical Perception: The Groundwork of
Cognition. New York: Cambridge University Press.
<p>
Harnad, S. (1990a) The Symbol Grounding Problem.
Physica D 42: 335-346.  [Reprinted in Hungarian Translation as "A
Szimbolum-Lehorgonyzas Problemaja." Magyar Pszichologiai Szemle
XLVIII-XLIX (32-33) 5-6: 365-383.]
<p>
Harnad, S. (1990b) Symbols and Nets: Cooperation vs. Competition.
Review of: S. Pinker and J. Mehler (Eds.) (1988)
Connections and Symbols. Connection Science 2: 257-260.
<p>
Harnad, S. (1992) Connecting Object to Symbol in Modeling
Cognition. In: A. Clark and R. Lutz (Eds.) Connectionism in Context.
Springer Verlag.
<p>
Harnad, S. (1993) Grounding Symbols in the Analog World with Neural
Nets. Think 2(1): 12-78 (Special issue on "Connectionism versus
Symbolism," D.M.W. Powers & P.A. Flach, eds.). [Also reprinted in
French translation as: "L'Ancrage des Symboles dans le Monde Analogique
a l'aide de Reseaux Neuronaux: un Modele Hybride." In: Rialle V. et
Payette D. (Eds) La Modelisation. LEKTON, Vol IV, No 2.]
<p>
Harnad, S. (1993) Symbol Grounding is an Empirical Problem: Neural Nets
are Just a Candidate Component. Proceedings of the Fifteenth Annual
Meeting of the Cognitive Science Society. NJ: Erlbaum
<p>
Harnad, S. (1994) Computation Is Just Interpretable Symbol
Manipulation: Cognition Isn't. Minds and Machines 4: 379-390.
(Special Issue on "What Is Computation")
<p>
Harnad, S. (1995) Grounding Symbolic Capacity in Robotic Capacity.
In: Steels, L. and R. Brooks (eds.) The "artificial life" route to
"artificial intelligence."  Building Situated Embodied Agents. New
Haven: Lawrence Erlbaum
<p>
Harnad, S., Hanson, S.J. & Lubin, J. (1991) Categorical Perception and
the Evolution of Supervised Learning in Neural Nets. In:  Working
Papers of the AAAI Spring Symposium on Machine Learning of Natural
Language and Ontology (DW Powers & L Reeker, Eds.) pp. 65-74. Presented
at Symposium on Symbol Grounding: Problems and Practice, Stanford
University, March 1991; also reprinted as Document D91-09, Deutsches
Forschungszentrum fur Kuenstliche Intelligenz GmbH Kaiserslautern FRG.
<p>
Harnad, S., Hanson, S.J. & Lubin, J. (1995) Learned Categorical
Perception in Neural Nets: Implications for Symbol Grounding.
In: V. Honavar & L. Uhr (eds) Symbol Processors and Connectionist
Network Models in Artificial Intelligence and Cognitive Modelling:
Steps Toward Principled Integration. pp. 191-206. Academic Press.
<p>
