Harnad, S. (1987) Psychophysical and cognitive aspects of categorical perception: A critical overview. Chapter 1 of: Harnad, S. (ed.) (1987) Categorical Perception: The Groundwork of Cognition. New York: Cambridge University Press.

Psychophysical and cognitive aspects of categorical perception:
A critical overview.

Stevan Harnad
Department of Psychology
University of Southampton
Highfield, Southampton SO17 1BJ
UNITED KINGDOM

1.1 The categorization problem

1.1.1 Entry Points. One of the most basic questions of cognitive science is: How do organisms sort the objects of the world into categories? The problem is very general, for an object can be any recurring class of experience, from a concrete entity such as a cat or a table to an abstract idea such as goodness or truth. And sorting can be any differential response to the object category, from detecting and instrumentally manipulating it to identifying and verbally describing it. Categorization hence plays a critical role in perception, thinking and language and is probably a significant factor in motor performance too.

There are many entry points into the problem of categorization. Two particularly important ones are the so-called top-down and bottom-up approaches. Top-down approaches such as artificial intelligence begin with the symbolic names and descriptions for some categories already given; computer programs are written to manipulate the symbols. Cognitive modeling involves the further assumption that such symbol-interactions resemble the way our brains do categorization. An explicit expectation of the top-down approach is that it will eventually join with the bottom-up approach, which tries to model how the hardware of the brain works: sensory systems, motor systems and neural activity in general. The assumption is that the symbolic cognitive functions will be implemented in brain function and linked to the sense organs and the organs of movement in roughly the way a program is implemented in a computer, with its links to peripheral devices such as transducers and effectors.

Another entry point is human performance modeling. What people categorize, and how, is studied experimentally; then models are proposed to account for people's performance, especially their efficiency and their errors. These models try to capture why it is that we can categorize some things more easily or quickly or reliably than others. A related approach is the study of cognitive development: how the child acquires categories. These lines of inquiry have their counterparts in comparative psychology, where the behavior of animals is studied to determine how they categorize their worlds. The comparative approach is also concerned with the evolutionary origins and the adaptive value of categories.

One last approach is less easy to classify: the psychophysical one. Psychophysics is concerned with the relationship between physical stimulation and sensation, for example, between the physical intensity of a stimulus and the psychological intensity of the sensation it causes. Categorization is studied by examining the limits of discrimination (how small a physical difference we can tell apart) and of identification (what classes of stimuli we can reliably label). The approach is bottom-up, though not in the usual hardware-to-software sense, but rather in the sense the name implies: physical-to-psychological. Psychophysics is close to the level of neural mechanisms and is often pursued in close collaboration with parallel work on sensory psychophysiology, although it is methodologically independent of such work.

1.1.2 Learning and representing categories. A theme that runs through the various approaches to categorization is that of learning. Clearly, no organism is born a blank slate. Some categories are innate. The comparative and psychophysical approaches tend to be concerned primarily with the mechanisms of innate categorization, although learning factors are not necessarily excluded. The developmental and neural approaches are concerned with both innate and acquired categories, whereas the top-down and human-performance research concentrates mainly on learned categories. The general problem of the origins of categories -- both phylogenetic and ontogenetic -- looms large for all the approaches to categorization. It is the problem of induction: How does any mechanism learn to form, from a finite sample of particular cases, a reliable generalization about future cases?

A second theme that runs through all the approaches to categorization is that of representation: How are categories represented? What structures and processes make it possible to categorize appropriately? Top-down research tends to favor symbolic representations, whereas bottom-up work focuses on sensory representations. All approaches are concerned with the question of which representations are innate and which are learned. The problem of the origins of representations -- in evolution, development and learning -- hence coalesces with the problem of the origins of categories.

1.1.3 The psychophysics of categorical perception. All of these approaches to categorization will be examined to some degree in this chapter, but the focus will be on a psychophysical phenomenon that may help to unify the diverse lines of inquiry. The phenomenon is categorical perception (henceforth CP). CP was first observed with color perception and the perception of speech-sounds, but it has since been found in a variety of domains. The effect is best described as a qualitative difference in how similar things look depending on whether or not they are co-classified in the same category. The experimental paradigm for demonstrating CP is psychophysical: discrimination and identification performance (telling things apart and labeling them) is compared for a set of stimuli. Usually the stimuli vary along a physical continuum and regions of that continuum have been or can be assigned labels.\**

[footnote start] The question of the origins of the labels touches on the problem of innateness and learning and is ordinarily not addressed directly by the psychophysical approach. [footnote end]

An example of CP is the color spectrum as it is subdivided into color categories, or an acoustic continuum called the second-formant transition as it is subdivided into the stop-consonant categories /ba/, /da/ and /ga/. In both cases, equal-sized physical differences between stimuli are perceived as larger or smaller depending on whether the stimuli are in the same category or different ones. Indeed, the effect is not only quantitative but qualitative: A pair of greens of different shades look more like one another than like a shade of yellow (which may be no more different in wavelength from one of the greens than the other green is), and this difference is one of quality. The same is true of /ba/'s and /da/'s.

Qualitative differences in perception cannot be demonstrated objectively; we have only the subject's word (and our own introspective experience) as evidence for them. Quantitative differences, however, can be tested experimentally. The method is to compare discrimination and identification performance. Discrimination requires a subject to tell apart stimuli presented in pairs (by indicating whether they are the same or different). Identification requires the subject to categorize individual stimuli using labels (to say, for example, whether a stimulus is a /da/ or /ga/). A CP effect occurs when (1) a set of stimuli ranging along a physical continuum is given one label on one side of a category boundary and another label on the other side and (2) the subject can discriminate smaller physical differences between pairs of stimuli that straddle that boundary than between pairs that are entirely within one category or the other. In other words, in CP there is a quantitative discontinuity in discriminability at the category boundaries of a physical continuum, as measured by a peak in discriminative acuity at the transition region for the identification of members of adjacent categories.
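The quantitative signature just described can be illustrated with a minimal sketch. It rests on assumptions that are not part of the text above: identification is modeled by a hypothetical logistic function (the names p_label_a, boundary and slope are illustrative), and discrimination is assumed to rely on covert labeling alone, so that a pair is told apart only when its members receive different labels. Under those assumptions, equal-sized physical steps yield a discrimination peak at the category boundary.

```python
import math

def p_label_a(x, boundary=0.0, slope=4.0):
    """Hypothetical identification function: probability of labeling stimulus x as category A."""
    return 1.0 / (1.0 + math.exp(slope * (x - boundary)))

def p_different_labels(x1, x2):
    """Probability that two stimuli receive different labels (covert-labeling assumption)."""
    p1, p2 = p_label_a(x1), p_label_a(x2)
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)

STEP = 0.5  # every pair is separated by the same physical difference
centers = [-2.0 + 0.5 * i for i in range(9)]  # pair midpoints along the continuum
scores = [p_different_labels(c - STEP / 2, c + STEP / 2) for c in centers]

for c, s in zip(centers, scores):
    print(f"pair centered at {c:+.1f}: predicted discrimination {s:.3f}")

# The pair that straddles the boundary (midpoint 0.0) is the most discriminable.
peak_center = centers[scores.index(max(scores))]
print("peak at", peak_center)
```

A pure labeling account of this kind predicts near-zero within-category discrimination; real data typically show reduced rather than absent within-category discrimination, so the sketch should be read as an idealized limiting case.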

1.1.4 Categorical perception as the groundwork for category cognition.

The unifying hypothesis of this chapter is that this highly specific psychophysical phenomenon may be related in an important way to the general problem of categorization: that CP may not only provide the building blocks -- the elementary units -- for higher-order categories, but it may also be a representative model for the categorization process in general. The provisional `may' is used because, as mentioned above, there are many questions still remaining to be answered about CP: How are CP categories formed? What is the role of innate mechanisms? What is the role of learning? What is the nature of the underlying representations? There are even questions about how general CP is as a psychophysical phenomenon. However, the burden of this review of what may appear to be diverse and disparate lines of investigation, and the purpose of the theoretical synthesis that follows, is to show that the answers to the foregoing questions, as they are currently emerging from the many different areas where the pertinent research is being conducted, appear to be promising, and that there may now be a basis for a mutually informed, unified research strategy.

The objective of this review is therefore to raise the basic questions about categorization in general in the context of CP in particular, to survey the currently available answers, and to focus future investigation of categorization on CP as a unifying model. I accordingly begin with a survey of the pertinent cross-disciplinary lines of research. This review will generate more questions than answers. An attempt will be made to propose some provisional, testable answers in the theoretical synthesis that follows in Chapter 18.

1.2 Psychophysical Foundations of Categorical Perception

1.2.1 Analog/Digital Conversion. The CP phenomenon can be seen as an analog-to-digital transformation that recodes a continuous region of physical variation into a discrete, labeled equivalence class. The AM/PM indicator on a digital watch illustrates the main features of this transformation: Time varies continuously, but in one region (from midnight to noon) the watch labels it all AM and in the other (noon to midnight), PM. In this case, noon would be the category boundary (if we ignore the 24-hour cyclicity). The analogy also holds at a finer-grained level, for exactly the same digitization is going on at the watch's minutest scale of resolution, say, seconds. Here too the continuum 11:59 - 12:01 is being treated as two discrete chunks, with the boundary at 12:00. And the analogy goes still further, for although the nearest second is as fine a category as the watch can identify, its internal analog pacemaker is presumably making still finer discriminations, which could in principle be expressed by a signal indicating whether one event had occurred before or after another. Even this comparator, however, would have limits on its resolving capacity, so that sufficiently tiny time differences would simply not be discriminable by the watch.

There is a CP counterpart for most of these features. The category boundary can be viewed as a threshold for identifying the nearest second. If the device were being used as an automatic stopwatch, one second would be the threshold amount of time that had to elapse to determine whether an event was to be labeled as occurring within the foregoing second or the following one. If two events were being timed, however, the smallest difference that the device could discriminate (as before/after or same/different) would be smaller than the smallest difference that it could label with a specific time. Moreover, the device would make fewer errors with tiny differences that straddled the boundary between a pair of adjacent intervals (i.e., if one event came just before the boundary and one just after) than with tiny differences that occurred in the middle of an interval, because near the boundary the accuracy of the analog difference comparator could be augmented by the accuracy of the digital threshold comparator.
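The stopwatch analogy can be made concrete with a small sketch. The function names and the 0.1-second resolving limit are assumptions chosen for illustration; the only features taken from the text are that the device labels events by the second, that its analog comparator resolves differences finer than one second, and that the two sources of information combine near a boundary.

```python
import math

def second_label(t):
    """Digital identification: label a time (in seconds) by the interval it falls in."""
    return math.floor(t)

def analog_detects(t1, t2, resolution=0.1):
    """Analog comparator: registers a difference only if it exceeds the resolving limit."""
    return abs(t1 - t2) >= resolution

def discriminates(t1, t2, resolution=0.1):
    """The device tells two events apart if either comparator signals a difference."""
    return analog_detects(t1, t2, resolution) or second_label(t1) != second_label(t2)

# Two pairs with the same tiny separation, below the analog resolving limit:
within = discriminates(0.48, 0.52)  # both events fall inside the same one-second interval
across = discriminates(0.98, 1.02)  # the events straddle the boundary at 1.0 s

print(within, across)  # False True
```

The same physical difference goes undetected in the middle of an interval but is detected at the boundary, which is the pattern the text identifies with the CP boundary effect.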

1.2.2 The boundary effect. Pastore (1987) proposes that this boundary effect (i.e., heightened discriminability between categories and reduced discriminability within categories, rather than 0% within-category discriminability or 100% identification accuracy) is the hallmark of CP, which he interprets as a threshold phenomenon. In perception, thresholds tend to delineate all-or-none, qualitative differences as opposed to graded, quantitative ones (Pastore et al. 1984). The threshold for flicker-fusion (the flicker-rate above which a light is perceived as being on continuously) is an example of such a qualitative change, and Pastore shows that flicker-fusion displays all the requisite features of CP. Even an external reference point -- such as the one involved in the task of visually discriminating and identifying v's and Y's, differing along a continuum of variable stem-lengths -- will mimic all the features of the CP phenomenon.

Pastore shows that when subjects are asked to discriminate pairs of complex stimuli (e.g., tones, one fixed and one varying in intensity or frequency), CP boundary effects can arise from higher-order interactions (both acoustic and neural) among the dimensions on which the stimuli vary. He also argues that some complex acoustic CP effects thought to be unique to speech may arise from higher-order effects of the threshold for perceiving which of two sounds occurs first. Similarly, trading relations (see also section 2.3.2 on the motor theory of speech perception, below), in which the location of CP boundaries seems to be influenced by how a combination of sounds would have had to be pronounced, may be an effect of intensity/duration tradeoffs with complex stimuli.

Pastore uses this framework to interpret the many lines of investigation of CP, from their origins in phenomena assumed to be peculiar to speech perception and production, to the much more general view of CP that seems to be emerging currently. He suggests that regions of natural sensitivity underlie the CP boundary effect and that exposure, practice and selective attention may influence its location, as well as the acuity of discrimination and the accuracy of identification. Other questions Pastore's account calls to mind will echo throughout this chapter: What are the functions of categorical and continuous perception? Can category boundaries arise as a result of learning alone, and if so, how? What are the representations and processes underlying categorization, both psychophysical and cognitive? And what is the relation between CP and higher-order categorization and language?

1.2.3 Iconic traces and the context of confusable alternatives. The standard CP experiment tests how well subjects can identify individual stimuli that vary along a continuum and how well they can discriminate pairs of these stimuli as being the same or different. Macmillan (1987; Macmillan et al. 1987) has proposed some refinements on both the experimental method and the analysis of the results that may lead to a more perspicuous and general interpretation of the processes involved. He distinguishes between fixed discrimination, in which subjects are tested repeatedly with the same pair of stimuli in a block of trials, and roving discrimination, in which the pairs may come from anywhere in the range of the continuum being tested. He also recommends using signal detection analysis (which yields a standardized measure of detectability that is independent of various response biases) instead of the percent-correct scores that are ordinarily used in measuring discrimination and identification in CP experiments.
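To illustrate why a signal detection measure is preferred to percent correct, the following sketch computes the standard yes/no sensitivity index d' and the response criterion c from hit and false-alarm rates, under the usual equal-variance Gaussian assumption. The observers and their rates are hypothetical, and the same/different designs common in CP work call for different decision models than the simple yes/no formula used here.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal cumulative distribution

def d_prime(hit_rate, false_alarm_rate):
    """Yes/no sensitivity: distance between signal and noise means in z units."""
    return _z(hit_rate) - _z(false_alarm_rate)

def criterion(hit_rate, false_alarm_rate):
    """Response bias: 0 is neutral, negative values mean a bias toward responding 'yes'."""
    return -0.5 * (_z(hit_rate) + _z(false_alarm_rate))

def percent_correct(hit_rate, false_alarm_rate):
    """Proportion correct, assuming signal and noise trials are equally frequent."""
    return 0.5 * (hit_rate + (1.0 - false_alarm_rate))

# Two hypothetical observers: (hit rate, false-alarm rate)
observers = {"neutral": (0.84, 0.16), "liberal": (0.95, 0.40)}

for name, (h, f) in observers.items():
    print(f"{name}: d'={d_prime(h, f):.2f}  c={criterion(h, f):.2f}  "
          f"percent correct={percent_correct(h, f):.3f}")
```

The two observers differ much more in percent correct than in d', because much of the percent-correct difference reflects the liberal observer's response bias rather than a difference in sensitivity. Rates of exactly 0 or 1 have no finite z-score and would need a correction before this function could be applied.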

With this refined methodology, CP data can be interpreted in terms of Durlach & Braida's signal-detection-theoretic model for intensity perception (1969). In accounting for the variance in discrimination and identification performance, these investigators have singled out two interesting parameters (a third parameter, common to all stimuli, is not relevant to CP): The trace parameter is interpreted as a processing mode that compares a stimulus with the sensory trace of another stimulus; it is influenced by how long the delay between the two stimuli is (presumably because of the decay of the iconic trace in immediate memory) and it contributes to both fixed and roving discrimination. The context parameter is interpreted as a processing mode that compares the stimulus with the overall stimulus context with which the stimulus could be confused (including possible anchor features); it is influenced by how large the stimulus range is and it contributes to roving discrimination and to identification (presumably involving short- and long-term memory effects, respectively).\**

[footnote start] These two processing modes are further elaborated in the three-level representational model presented in Chapter 18. The trace mode corresponds to comparing iconic representations and the context mode corresponds to detecting the features that will reliably categorize an input relative to a context of confusable alternatives. [footnote end]
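The roles of the trace and context parameters can be summarized schematically. The notation below is an assumption introduced for illustration and is not the published form of Durlach & Braida's equations: $\Delta$ stands for the sensory separation of two stimuli, $\beta^{2}$ for the noise variance common to all stimuli, $\sigma_{T}^{2}(T)$ for trace-mode memory variance that grows with the delay $T$ between stimuli, and $\sigma_{C}^{2}(R)$ for context-mode memory variance that grows with the stimulus range $R$.

```latex
d'_{\text{trace}} = \frac{\Delta}{\sqrt{\beta^{2} + \sigma_{T}^{2}(T)}},
\qquad
d'_{\text{context}} = \frac{\Delta}{\sqrt{\beta^{2} + \sigma_{C}^{2}(R)}}
```

On this reading, lengthening the delay lowers sensitivity in the trace mode, and widening the range lowers sensitivity in the context mode, which is the qualitative pattern the two parameters are meant to capture.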

The conclusion from Macmillan's more general signal detection analysis is that the continuous/categorical distinction in CP may be an oversimplification, and that continua may be better described as differing in how much they draw on the trace and context modes and in where and what their anchors are. Anchors may be the extremes of continua (as in the loud and soft end of a range of loudnesses) or they may occur in the middle of the range, in which case they may be boundary regions of heightened sensitivity (for discrimination) or central prototypes (for identification).

Macmillan's analysis makes it clear that many questions about CP can be instructively reformulated as questions about the nature, origin and functional role of anchors. What are anchors (apart from the special case of edges of continua)? Are some innate and some learned? Of those that are learned, are some anchors short-term context effects (as in roving discrimination and adaptation effects) and some long-term, overlearned effects (as in object-naming)? What about multidimensional stimuli? In general, what are the underlying representations of stimuli and of stimulus-categories (and the processing modes operating on them) that give rise to different trace, context and anchor effects in different continua?

1.3 Categorical Perception of Speech

1.3.1 The Motor Theory of Speech Perception. Of the two phenomena that originally stimulated the special interest in CP, color boundaries (see the section on color categories and the section on color-evoked potentials, below) and phoneme boundaries, it was the latter that came to be far more intensively studied. The reason for this was that until the discovery of the physiological bases for color vision (Boynton 1979, DeValois & DeValois 1975) the only theory of color CP was the Whorf Hypothesis (Whorf 1964), according to which the location of color boundaries is determined by where languages happen to put them. This was almost too nonspecific a hypothesis to generate focused research (and once it did, it turned out that color boundaries were largely determined by species-specific color receptors rather than by language; Berlin & Kay 1969). Phoneme boundaries, on the other hand, were explained by the motor theory of speech perception, a more specific and testable theory (Liberman, Harris, Hoffman & Griffith 1957).

According to the motor theory, speech perception is special, and different from other perceptual domains, auditory and nonauditory, in that how it is heard is influenced by how it is produced. The discontinuities among /pa/, /ta/, /ka/, /ba/, /da/ and /ga/, for example, arise from the discontinuities required to pronounce them. Although there is a continuum in a single acoustic dimension varying from, say, /pa/ to /ka/ (the second formant transition continuum), there is a discontinuity between the movement of the two lips required to produce /pa/ and the tongue-to-palate movement for /ka/. Hence, just as Helmholtz had hypothesized that the basis for many perceptual constancies (see Eimas et al. 1987) is unconscious inference (e.g., we perceive that an object remains the same size even though the size of its image on our retina is shrinking, because we unconsciously infer that as the image gets smaller the object is moving farther away), Liberman and his colleagues hypothesized that we perceive what sound we are hearing by unconsciously inferring how it would have had to be pronounced.

The motor theory generated a large body of valuable research on how speech was special (Liberman 1976, 1982), and how our hearing system may have a specialized module for processing speech sounds. Among the possible sounds that humans can produce, a certain finite subset of them has been used by the existing languages. This is our phonetic repertoire. Any given language, however, uses only a still smaller subset of these, consisting only of those sounds -- called phonemes -- that signal differences in meaning, plus whatever variation they undergo because of the other sounds they are pronounced together with (Chomsky & Halle 1968): For example, the o in ton and the u in turn are not distinct phonemes in American English; they are positional covariants (or allophones) of the same phoneme, the pronunciation of which depends on whether or not it precedes an r; allophones never function to differentiate two different meanings. (This particular pair of positional covariants does not occur in, for example, Irish English.) The o in ton and the a in tan, on the other hand, are distinct phonemes (in both Irish and American pronunciation).

According to the motor theory, these minimal meaning-signalling units called phonemes are processed by special phonetic mechanisms of hearing. Specialized for speech perception, these mechanisms analyze the distinctive features of phonemes by what it would take to synthesize (i.e., pronounce) them. Using an internal model of the vocal apparatus of production (the mouth, tongue, larynx, etc.) and what it can and cannot do, such an analysis-by-synthesis mechanism (Stevens & Halle 1967) is even able to make allowances for the positional variants of phonemes as a function of what sounds they are co-pronounced with, how quickly, even by how high a voice. Some of the constraints on this mechanism are phonetic, or universal to all human languages, and some are phonemic, depending on what the meaning-signalling units in a particular language happen to be. CP boundaries would be special cases of the functioning of this mechanism corresponding to perceived discontinuities between those sounds that would be separated by discontinuities in their production.

The original motor theory assumed that the internal language-production model had to be learned. But later evidence that preverbal infants (section 2.3.4) and nonverbal animals (sections 2.5.2 - 2.5.5) have many of the same CP boundaries that mature language speakers do suggested that some of the functions of the specialized speech perception mechanism must have been biologically prepared by evolution rather than being learned from experience. Evidence for auditory CP with nonspeech stimuli (2.3.3) has cast some further doubt on whether CP boundaries are mediated by a speech-production model. However, there remain some effects suggesting that speech CP may still be special among other varieties of CP in virtue of its specific links to production.

1.3.2 Continuing evidence for special motor effects in phoneme CP. The evidence for the existence of speech CP effects that still seem to be best explained by a mechanism which infers how the sounds would have had to be pronounced is reviewed by Repp (1984; Repp & Liberman 1987). These effects include changes in a phoneme's CP boundary depending on (1) the sounds that came immediately before or after it (or even its more global context); (2) differences in the effects of adaptation to a frequently presented sound depending on what the listener's native language is; (3) trading relations, or the equivalence of different combinations of cues in producing the same CP effect (where the combinations are explicable by how they would have had to be co-pronounced); (4) overall speaking rate; (5) other overall speaker characteristics (whether the voice is high or low, or even more specific individual speaking traits); (6) effects of expectations arising from grammar and meaning (e.g., shifts in the boundary between s and sh depending on whether or not one is expecting to hear a plural ending); (7) differences between speakers of different languages (as in the absence of a CP boundary between r and l for Japanese speakers).

All these findings continue to suggest that speech CP is special. In the context of the theoretical synthesis I am attempting here, however, the critical questions are: What aspects of speech CP can be generalized? Are there shared properties of CP in all domains where there is an analog relation between perception and production? (Other cases would include sign language, facial expression, lip-reading, music, certain motor skills and certain animal signalling systems.) What about the role of learning? Is it mere parameter-setting and fine-tuning (modulating prepared boundaries that already exist innately) or can CP boundaries be built up entirely from learning? Can the analysis-by-synthesis model be generalized to a domain where there is no perception/production analog? Finally, are there any inferences to be made from the functional role of bounded phonemes in speech perception to the role of bounded categories in general, in perception and cognition? These questions will be taken up again later in this chapter and in Chapter 18; a sizeable portion of the balance of this review will be devoted to the available evidence from speech perception, however, because phoneme CP is by far the best studied of CP phenomena.

1.3.3 The label-learning hypothesis of CP. There are three contending kinds of theories to account for speech CP:

(1) As just described, the motor theory attributes the discontinuities in speech perception to mediation by discontinuities in speech production. (A Gibsonian variant of this theory [Shankweiler et al. 1977] turns it on its head, claiming that invariant cues in the sounds themselves signal the discontinuities and hence how the mouth produced them.)

(2) The hypothesis of innate sensitivity (Howell & Rosen 1984) attributes the discontinuities to inborn enhancement and reduction of the sensory system's sensitivity in selected portions of certain physical continua (i.e., violations of Weber's law that perceived differences should be proportional to relative physical differences). Sometimes such boundary effects are even generated by higher-order interactions of components of the physical signal itself (i.e., some complex physical continua may not be strictly continuous, as Pastore [1987] also suggests).

(3) The label-learning hypothesis has several variants (Lawrence 1950; Bruner et al. 1956; Lane 1965; E. J. Gibson 1969). The weakest is that labels are learned by mere exposure and association, and that label-differences then come to mediate identification and to influence discrimination in much the way the motor theory claims motor discontinuities do. A stronger version is that label-learning, through selective attentional effects and learned expectations, actually alters the encoded similarity structure of stimuli, making those with the same label look more alike. This acquired similarity view can be contrasted with a still stronger acquired distinctiveness view, whereby stimuli with different labels come to look more different. According to the strongest (Whorf-like) version, label-learning produces not just quantitative but qualitative effects.
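For reference, the Weber's law invoked under hypothesis (2) is usually written as follows, where $I$ is the baseline stimulus magnitude, $\Delta I$ is the just-noticeable difference at that baseline, and $k$ is a constant for the continuum in question. The symbols are conventional notation, not taken from the works cited above.

```latex
\frac{\Delta I}{I} = k
```

On the innate-sensitivity account, a CP boundary would show up as a local departure from this constancy: the ratio $\Delta I / I$ would dip near the boundary (heightened sensitivity) and rise within categories (reduced sensitivity).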

Hypotheses (1) - (3) have tended to vie for exclusive sovereignty as explanations of speech CP. Rosen & Howell (1987), however, in a masterly review of the evidence for all sides, show that none of the theories is able by itself to account for all the findings and that it is more realistic to see the three, not as competitors, but as each making an independent contribution to the CP phenomenon in the special case of speech.

There is strong evidence -- for example, with the /ba/, /da/, /ga/ (place-of-articulation) discontinuities -- that motor mediation plays a role in phoneme boundary effects. But the absence of CP along the sh/ch (frication) continuum, for example, makes the motor explanation of the p/b (voicing) boundary seem ad hoc. Animal and infant CP-boundary findings support the innate-sensitivity hypothesis, but heightened discrimination by language-speakers and boundary-location differences between speakers of different languages implicate learning. Learning also seems to be the explanation for why musicians show stronger CP for semitone boundaries than nonmusicians (Siegel & Siegel 1977); but innate sensitivities seem to be playing a role in this effect too, as well as in unlearned CP for certain buzz, noise, and relative-timing continua (Howell & Rosen 1984).

Rosen & Howell favor a two-process CP model of the kind proposed by Macmillan (1987; Macmillan et al. 1987) and based on Durlach & Braida (1969). One process again involves a rapidly decaying echoic or analog representation of the stimulus and plays a role in immediate comparisons; the factors of recency and innate sensitivity would have their effects here. The other process involves a longer-term, context-dependent categorical representation influenced by factors such as the range and relative frequency of the stimuli (i.e., their interconfusability); this would be the locus of the speech-specific effects of motor analysis/synthesis as well as of label-learning factors.\**

[footnote start] These same two kinds of representation are incorporated in the three-level representational model that will be presented in Chapter 18. [footnote end]

In projecting conclusions from the special case of speech CP to CP in general, it is clear that the motor factor will be the least useful (except in production analogue media such as pitch perception). Innate sensitivities clearly do play a role in other modalities such as vision (see the section on color categories). But the most general and potentially interesting factor is that of label learning, for not only is the process underlying label-learning critical to all domains of categorization, but the subsequent use of labels (and their underlying representations) extends into the most general problems of language and cognition.

1.3.4 Phoneme CP and innate `prototypes' in preverbal infants. A natural question to ask about CP in adults is: When does it arise? Is it learned or inborn? If learned, how early is it learned? And if inborn, how early is it manifested? Eimas and his coworkers have provided some of the answers for the special case of speech CP and they have critically analyzed the evidence in other modalities.

Eimas et al. (1971; Jusczyk 1984) trained preverbal infants to perform an operant response (sucking a non-nutritive device) in order to hear sounds. When a sound is new, infants respond vigorously. When they get used to it after repeated presentations, their response gets weak. The measure of discrimination is how much a new sound strengthens the response (on the assumption that the strength of the response reflects how discriminable the new sound is from the familiar one).

With this measure Eimas et al. found that four-month-old infants, never having spoken a word, not only have CP boundaries for speech sounds, but even show sensitivity to some of the modulations caused when certain sounds are pronounced together (as discussed in section 2.3.1). They concluded that infants are born with categorical representations for speech sounds, perhaps encoded in syllabic chunks. These take the form of prototypes that facilitate language-learning and can be updated and fine-tuned on the basis of experience with specific languages. This modification of prototypes by experience may either take the form of an irreversible imprinting effect or a reversible selective-attentional effect.

Eimas et al. (1987) review parallel evidence for innate categorical representations in other modalities. Two CP-like phenomena are perceptual constancy (the tendency to see objects as looking the same despite variations in distance, size and orientation, or to hear speech as sounding the same despite variations in speaker, rate and articulatory context) and equivalence classes (the tendency to generalize over certain stimulus variations, responding to them as being equivalent). Using operant discriminative responses, other investigators have found CP-like effects in the way infants perceive pattern-orientation, faces and other objects and properties (Kuhl 1986; and see section 2.5.4).

Eimas et al. propose a general schema for the formation and revision of categorical representations: innate prototypes, with their parameters fine-tuned by learning. What is unclear is just what a prototype really is -- other than whatever must be encoded in the brain in order to generate CP. The most prevalent notion of a prototype is that it is some sort of characteristic or ideal category member, one that other members resemble to varying degrees. This is intended to contrast with a rival kind of representation: a set of distinctive features necessary and sufficient to determine category membership (sections 2.7.1 and 3.5.1). At the level of the one-dimensional parameters involved in most sensory CP, however, the prototype/feature distinction would seem to collapse. The substantive questions appear to concern the nature of the information represented, how much of it is innate in the case of speech and form perception, how it is modified by experience, and how speech and form CP are related to higher-level categorization (e.g., classification and abstraction).

1.4 Models for Speech CP

1.4.1 Feature Detection. The theories of CP in the special case of speech sounds are at some level all feature-detection models (as the theories of higher-order categorization will also prove to be) even though they may, like prototype models and higher-order knowledge models, be formulated as the rivals of some specific type of feature-detection theory. This is because categorization is concerned with picking out classes of objects in the world, and if this can be accomplished by a categorizer at all (and not by trivial rote enumeration or by magic), then its success must be the result of having picked out some reliable basis for the categorization -- something that makes it possible to sort the members from the nonmembers. Although it may not be simple or direct, may require much computation to find and act on, and may involve a complex set of conditional or either/or properties, in the end this process must nevertheless amount to detecting the features of the objects that provide a reliable basis for the categorization. To deny this amounts either to denying that a reliable basis for the categorization exists in the objects or to denying that the categorizer finds or uses it, which is in both cases to deny that the categorization is possible at all.\**

[footnote start] The related Gibsonian concepts of invariants and affordances are discussed in Neisser's (1987) volume devoted to higher-order categories as well as in Chapter 18, section 18.5.1. [footnote end]

1.4.2 Inborn neural feature-detectors. One special class of feature-detection models in speech-perception research posits neural feature-detectors, on an analogy with feature detectors in vision arising from the work of Hubel & Wiesel (1965). Visual feature detectors seem to be organized in an orderly network of dot, line and edge-detectors arranged promisingly in a rising hierarchy of abstraction. This picture has been carried over into speech-perception research, where it was hoped that the substrates for either the phonetic features peculiar to speech or else some general auditory features, or both, would turn out to have specialized neural detectors (see Cooper 1979; Diehl 1981). The experimental method used was selective adaptation, on the assumption that if the detector for one of a pair of opponent processes analogous to those in, say, color perception (De Valois & De Valois 1975), were selectively fatigued by repeated stimulation, the CP boundaries it governed would accordingly shift in favor of its opponent detector.

Remez (1987) likens this view to Selfridge's (1959) pandemonium model of pattern analysis, in which low level image demons detect physical attributes of the stimulation, middle level computational demons detect features peculiar to certain objects, and high level cognitive demons detect objects. The model is called pandemonium because the signalling function of these demons is analogous to yelling, with the demons who yell the loudest calling the shots, so to speak.
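The "loudest demon wins" summation scheme can be made concrete with a toy computation. The sketch below is only an illustration, not Selfridge's or Remez's formulation: it collapses the middle (computational) level into a set of weights, and the feature names, weights and the function `pandemonium` are all hypothetical.

```python
# Toy pandemonium: hypothetical features and weights, for illustration only.

def pandemonium(image_features, cognitive_demons):
    """Return the loudest cognitive demon and every demon's shout.

    image_features: dict mapping a feature name to the strength (0..1)
        reported by a low-level image demon.
    cognitive_demons: dict mapping a category name to a dict of
        feature weights (the evidence that demon listens for).
    """
    shouts = {}
    for category, weights in cognitive_demons.items():
        # Each demon "yells" in proportion to the summed, weighted evidence.
        shouts[category] = sum(
            weight * image_features.get(feature, 0.0)
            for feature, weight in weights.items()
        )
    # The decision simply goes to whoever yells loudest.
    winner = max(shouts, key=shouts.get)
    return winner, shouts


# Hypothetical acoustic cues for a voiced versus a voiceless stop.
DEMONS = {
    "ba": {"short_voice_onset": 1.0, "low_burst": 0.5},
    "pa": {"long_voice_onset": 1.0, "aspiration": 0.5},
}

stimulus = {"short_voice_onset": 0.9, "low_burst": 0.4, "aspiration": 0.1}
winner, shouts = pandemonium(stimulus, DEMONS)
```

Because each shout is nothing more than a weighted sum, a scheme of this kind has no way to represent the conditional and context-dependent cue interactions that later complicated the empirical picture.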

At first the empirical results were promising, but then they became more complicated and inconsistent with the phonetic and auditory theories that had indicated what features to look for. Eventually, Remez indicates in his review, something as complex as a homunculus would have had to be performing the functions of the detectors, which now even called for sensitivity to the meaning of the stimulus. Hence feature detectors were abandoned in speech-perception research.

The selective adaptation method may indeed not be a useful way of testing feature-detection theories, because the complex, higher-order correlations and interactions involved in speech perception may not correspond one-to-one with the activity of any fixed set of neurons (Remez & Rubin 1984). Moreover, there may not exist a special neural module devoted specifically to speech perception. Indeed, even the sense-modalities themselves (hearing, vision, etc.) may not be functionally independent of one another or of higher-order cognition, either in speech perception or in other complex categorical activities. And there is little doubt that a pandemonium-like yelling model (which is to say, a summation model) is far too simple for sophisticated categorization. -- But whatever the right model is, it will still have to do feature detection (and the feature detection will have to have a neural substrate).

1.4.3 Top-down symbolic decision mechanisms. Based on a critical reappraisal of feature detectors in speech perception (Diehl 1981), Diehl & Kluender (1987) conclude that tacit knowledge plays a large role in speech recognition. The cues that guide phonetic categorization are not of the local, isolable kind that would have to exist if there were to be specific feature-detectors for them. Speech perception is sensitive to holistic and global context effects. To account for this, Diehl & Kluender suggest that there is relatively little analysis of the raw signal in the sensory representation (the neural spectrogram). Instead, the hard work is done at a decision-making stage on the basis of tacit knowledge about the physical and physiological constraints on speech production.

Diehl & Kluender stress that theirs is not a motor theory dependent on the mediation of speech recognition by a production template but an argument for a knowledge-driven, top-down computational theory. They consider two actual speech recognition models. LAFS (Klatt 1980), a bottom-up feature-detecting model, is rejected as having too many primitive features and not capturing their higher-order regularities. The earlier model, HEARSAY (Reddy 1980), is preferred because it is more computational.

Assigning a dominant role to knowledge and decision processes amounts to preferring computations on symbolic descriptions of sensory data rather than direct sensory analysis of the sensory data themselves. The questions it leaves unanswered are (1) What are the symbolic descriptions on which knowledge operates? (2) How does the speech recognizer get from the sensory data to the symbolic descriptions? (3) What is the knowledge that is applied to these descriptions? And (4) where does that knowledge come from? The detection of the regularities that allow speech to be categorized appropriately amounts formally to feature detection whether or not it is done formally. If there are in fact no innate or learned sensory feature detectors then the model must know (tacitly?) what they would have had to do, had they existed.

1.4.4 Continuous `fuzzy-logical' decision models. As several CP investigators have pointed out, in the initial uncertainty about what was and was not special about CP, too much was made of the congruence between discrimination and identification; some investigators had even assumed that what made perception categorical was the impossibility of making any within-category discriminations. A little reflection shows that this cannot be true, for we certainly can tell apart members of the same CP category (although we may not be able to identify them).

Massaro (1987, 1988) suggests that the failure to find within-category indiscriminability argues for abandoning the categorical/continuous distinction altogether. All sensory processes are continuous, and CP boundary effects arise only because of discrete decision processes. Such all-or-none decision processes may be engaged by tasks that make memory demands exceeding the capacity of the rapidly decaying sensory trace, forcing the subject to recode the stimulus by giving it a verbal label. Other sources of discontinuity would be an external reference stimulus (as described by Pastore 1987) or the extension of the range of a stimulus continuum across a singularity such as a zero-crossing (Marr 1982).

Massaro calls this decision-governed discontinuity categorical partitioning instead of categorical perception and offers two kinds of evidence for the fact that the underlying sensory representation is continuous: goodness judgments (e.g., how good a /ba/ is this?) and identification reaction times. Both of these do indeed vary continuously, rather than in the all-or-none fashion characteristic of the identification boundary. Massaro proposes a fuzzy-logical model for the underlying continuous aspects of perception, based on graded degrees of category membership (Zadeh 1965).
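The contrast between a continuous sensory representation and a discrete decision can be illustrated with a toy computation. The sketch below is not Massaro's fuzzy-logical model itself. The linear membership function, the 0-to-1 stimulus scale and the 0.5 criterion are assumptions chosen only to show graded goodness coexisting with an all-or-none label.

```python
# Illustrative only: the membership function and criterion are assumptions.

def membership_ba(x):
    """Graded degree (0..1) to which stimulus x counts as /ba/.

    x is a position on a hypothetical /ba/-/da/ continuum scaled to 0..1.
    """
    return min(1.0, max(0.0, 1.0 - x))


def identify(x, criterion=0.5):
    """Discrete decision stage: an all-or-none label from a graded value."""
    return "ba" if membership_ba(x) >= criterion else "da"


continuum = [i / 10 for i in range(11)]            # 0.0, 0.1, ..., 1.0
goodness = [membership_ba(x) for x in continuum]   # falls smoothly
labels = [identify(x) for x in continuum]          # flips exactly once
```

In this toy the goodness values decline steadily along the continuum while the labels change at a single point, so a sharp identification boundary by itself does not require a discontinuous sensory representation.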

Massaro's findings and conclusions are consistent with the fact that discrimination performance is not all-or-none. They do not seem, however, to account for the anisotropy of the discrimination continuum, with its compression of equal-sized physical differences within categories and amplification between. Moreover, as in the case of the work in the prototype tradition (section 2.7.1ff), the all-or-none character of identification and labeling in CP does not seem to be explained by attributing it to a partitioning by a decision process; for determining the nature of that decision process, how it originates and how it operates on the sensory data, is in fact the problem of CP.

1.5 CP in Other Modalities and Species

1.5.1 Color CP, the Whorf Hypothesis and innate category boundaries. The phenomenon of CP is by no means restricted to speech or to human beings. Although its investigation in other modalities and other species is only beginning, this work is already helping to place CP into clearer functional and ecological perspective. Bornstein (1984, 1987) addresses the issue of whether CP effects are universal or language-relative. According to the relativist (Whorf 1964) hypothesis, language determines how we view reality. The location of CP boundary effects is determined by how we elect to carve the world into nameable parts. The alternative, universalist view is that category boundaries tend to occur where nature has put them, either because there are discontinuities in the world or because our nervous system innately imposes discontinuities (Berlin & Kay 1969).

Bornstein examines how the evidence about color categories bears on the two hypotheses. On the basis of color discrimination performance by animals and infants as well as by adults in different cultures (with different languages and different named subdivisions of the color spectrum), together with the data on physiological mechanisms of color vision, Bornstein concludes that there is overwhelming support for the universalist hypothesis: Color categories depend on innate detectors, and language and experience can only influence the fine-tuning or short-term modifications of color boundaries (De Valois & De Valois 1975; Boynton 1979). To ascertain whether color CP is somehow unique or special, Bornstein goes on to make systematic comparisons between color CP and phoneme CP, showing that the pattern of data is very similar in both cases (and recommending where further parallels might be looked at).

Bornstein's findings are important in that they increase our understanding of the special innate mechanisms of color CP and help place phoneme CP into context as less of a special case than it was thought to be. However, color perception and speech perception may still turn out to be special in that they are both biologically prepared phenomena. The real test of the scope and generality of CP is the case of learning categories for which there is no specially prepared discontinuity, either in the nervous system or in the environment. Can a CP boundary arise in our perception of initially confusable objects purely on the basis of experience with sorting them into different labeled categories?

1.5.2 Biologically prepared CP in animal communication. Ethologists are interested in what Lorenz (1981) called releaser or key-stimuli, which trigger specific behaviors, often biologically prepared ones, sometimes learned ones. One prominent class of key-stimuli is the communicative signals of animals. Specific calls signal differences between species, between individuals and between meanings (e.g., courtship versus threat).

Ehret (1987) reviews the evidence that many of these calls are perceived categorically. The strict way to demonstrate this is of course with discrimination and identification tests for the existence of a CP boundary that mediates identification and modulates discrimination within and between categories. Although much of the ethological evidence is incomplete -- consisting only of identification data or between- but not within-category discrimination data -- CP has been unequivocally confirmed in some cases (Ehret & Merzenich 1985), and the probability of similar findings is high in many of the analogous incomplete cases.

Ehret emphasizes that key-stimuli are typically multidimensional and occur in a particular ecological context. Whereas human CP research tends to focus on the variation of a single unidimensional stimulus parameter at a time, ethological CP research is concerned with variation of the whole signal and with its ecological signalling function. Ehret hypothesizes that the adaptive function of CP is to reliably differentiate discrete species-specific, individual-specific and call-type-specific parameters within noisy and variable multidimensional signals that also vary in continuous motivational parameters. He proposes a specific selective-attentional model for CP, involving the detection of critical durations and bandwidths by means of innate (or learned) templates.

The ethological aspect of CP is important and promising, but some questions arise concerning the commensurability of human and animal CP. For example, it is not clear why categorical perception should be needed in animal communication at all, as long as calls can be kept sufficiently discrete by categorical production. If natural call types are suitably discrete, then the interpolation of continuous parameters that is involved in testing discrimination would be, in a sense, unecological (in much the same way that generalization gradients are unecological). This may also be why CP has so far been confirmed for species and call-type discrimination, but not for the discrimination or identification of individual animals: Differences among the calls of individuals may be continuous rather than discrete. Another question concerns the role of biological preparation in CP: Do CP boundaries occur only where there is a natural discontinuity, internal or external, or can they occur anywhere in a continuum? This of course leads again to the recurrent and central question about the role of learning in CP. Finally, how representative and critical is the factor of communication and the existence of production/perception homologies? Is there CP in other psychophysical continua as well? In any case, both the continuities and the discontinuities between human and nonhuman CP should help clarify the origins and functions of the phenomenon.

1.5.3 Learned CP hierarchies in primate communication. The ecological themes raised by Ehret are extended in the work of Snowdon and his coworkers (Snowdon et al. 1985). Snowdon suggests that the functional role of categorization is to contribute to cognitive economy by parsing environmental variation into units that can be processed and manipulated more efficiently than continuous variation, both in sensorimotor performance and in communication. He also provides further evidence (Snowdon 1987) that, as already noted, one of the provisional criteria for CP -- total indiscriminability between members of the same category -- was probably unrealistic (and perhaps an artifact of using unnatural stimuli and unnatural testing conditions).

Snowdon reports that pygmy marmosets can discriminate and categorize calls of familiar individuals even though these all belong to the same superordinate category (say, warning calls). Whether or not the subordinate differences are responded to depends on context (e.g., whether the call issues from inside or outside the perceiver's cage). Since such subordinate differences must be learned rather than innate, Snowdon concludes that experience may play a larger role in CP than some theorists have assumed, and he even conjectures that all CP could be the result of learning.

The observation of subordinate category discriminability is perhaps less surprising in view of the fact that many categorizations are hierarchical, with multiple levels of abstraction (each, presumably, with its own distinctive context of relevant alternative categories); however, the observation provides a valid corrective to the assumption that CP must involve within-category indiscriminability. Yet, since Snowdon does not test subordinate discrimination directly, merely inferring it from the stronger finding of subordinate identification, it remains to determine the shape of the within-category discrimination functions (which will presumably vary with the stimulus domain). Logic suggests that there will have to be some absolute limits on subcategorizability grain, but evidence suggests that discriminability grain will always be somewhat finer (Miller 1956).

1.5.4 Phoneme boundaries in monkeys and chinchillas. The original CP findings with speech stimuli were interpreted as evidence that speech was somehow special -- that there existed a level of processing which was specialized for speech sounds rather than other kinds of sounds (Liberman 1982). As discussed earlier, the motor theory of speech perception was actually a conjecture that incoming speech sounds were recognized by checking them against how they would have to be produced in speaking, and that CP boundaries arose from the discreteness of certain vocal gestures (such as the bilabial /ba/ and the alveolar /da/). Eimas et al.'s (1971) demonstration of CP in infants ruled out the likelihood that a learned vocal template was involved (since the infants had not yet begun to speak). The second possibility for the motor theory was that it was an evolutionarily prepared speech analyzer, inherited from the adaptive advantages of speech for our vocal ancestors (Liberman 1976). Kuhl (1987) has provided evidence that this cannot be the way in which speech is special either.

Using operant procedures, Kuhl has found that chinchillas and monkeys not only display CP, but their discrimination and identification boundaries occur on the same continua (voicing and place-of-articulation) and at roughly the same points (/ba/, /da/, /ga/; /pa/, /ta/, /ka/) where human CP boundaries do. Since these animals never speak, this must be a natural auditory discontinuity. This still leaves open the possibility for speech to be special in two senses: These natural sensitivities could have been capitalized on and specialized for speech by natural selection, and the human auditory system may still treat speech and speech-like sounds specially as a consequence. Kuhl still considers these questions open.

Kuhl (1986) has also found in infants the kinds of trading relations (see section 2.3.2) and context effects modulating the location of the CP boundary that Eimas et al. (1987) report. These automatic speech-specific adjustments for how something would need to be pronounced are hard to imagine occurring in animals. Still more unlikely in animals is a counterpart of the infant's apparent ability to correlate a speech sound with a face that looks as if it is pronouncing that sound. These phenomena have yet to be checked in animals, however.

It is hard to know yet what to make of these instances of innate auditory CP in animals, on the one hand, and these elements of speech-specific motor-theoretic effects in infants on the other. Perhaps the most prominent question they raise is: How specific to speech is speech CP, and how representative is it of CP in general?

1.5.5 Adaptation level theory and short-term modulation of CP boundaries. Wilson et al. (1983) interpret CP in terms of adaptation level theory (Helson 1964). According to adaptation level theory, when stimuli varying in one (or more) sensory attribute are presented to a subject repeatedly, sensory adaptation always takes place, yielding a neutral level for that attribute that is roughly in the midpoint of the range of variation. It is relative to this average value that the other stimuli are perceived as varying. Wilson (1987) shows that a calculation of this adaptation level from paired comparison judgments as to which stimulus is relatively greater in the attribute in question will also predict discrimination and identification functions, with the adaptation level corresponding to the category boundary. Wilson conjectures that the nervous system computes adaptation levels and that this underlies our ability to categorize. She goes on to describe some data in animals and normal and brain-damaged humans to suggest that an analysis in terms of adaptation level can be informative about brain locus and underlying function.
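The idea that the category boundary sits at the adaptation level can be sketched numerically. This is a minimal illustration under a strong simplifying assumption, namely that the adaptation level is the plain arithmetic mean of the stimulus values presented. It is not the calculation from paired comparison judgments that Wilson describes, and the stimulus values and function names are hypothetical.

```python
# Simplifying assumption: the adaptation level is the plain mean of the
# stimulus values presented so far.

def adaptation_level(stimuli):
    """Neutral point of the attribute, here the arithmetic mean."""
    return sum(stimuli) / len(stimuli)


def identify(value, level):
    """Label a stimulus relative to the adaptation level (the boundary)."""
    return "high" if value > level else "low"


# Hypothetical stimulus values on an arbitrary sensory scale.
narrow_range = [10, 20, 30, 40, 50]
shifted_range = [30, 40, 50, 60, 70]

al_narrow = adaptation_level(narrow_range)
al_shifted = adaptation_level(shifted_range)

# The same physical stimulus changes category when the range shifts.
label_before = identify(40, al_narrow)
label_after = identify(40, al_shifted)
```

Here the stimulus valued 40 falls above the boundary in the first range and below it in the second, which is the kind of short-term boundary movement the theory describes. Note that the computation yields a boundary for any set of stimuli whatever.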

There seem to be three problems with adaptation level theory as a general account of categorical perception:

(1) Since every set of stimuli always has an adaptation level, the theory suggests that there should be many more instances of CP than there actually are. In fact, it seems to imply that all sensory continua should be perceived categorically, with qualitative perceptual differences across the adaptation level. This does not appear to occur.

(2) There seems to be no natural way of accounting for long-term effects with adaptation level theory. One can claim that long-term averaging is always going on, but this does not seem to be a satisfactory predictive account of some of the very dramatic and robust long term CP effects. (Why long-term effects in some continua and not others? Why do short-term adaptation effects not alter long-term boundaries appreciably or permanently?) Yet long-term effects -- including innate, natural boundaries in some cases -- seem to be what is distinctive about CP.

(3) Adaptation level theory does not really seem to explain function (or predict data) since, by definition, it is calculable a priori in all cases. It is not even clear whether its results are really equivalent to standard discrimination and identification functions, even though it simulates their shape, because discrimination and identification derive from independent relative and absolute measures of performance, respectively (paired comparison vs. individual labeling), whereas adaptation level is derived only from relative judgments.

Hence it may be that adaptation level theory's relevance to CP is more limited, namely, it may describe the short-term plasticity (adaptability) of the category boundary, but not its origins or long term plasticity.

1.6 Psychophysiological Indices of CP

1.6.1 Event-Related Potentials (EPs). Some of the questions that arise about the processes and representations underlying CP are difficult to answer using only discrimination and identification data. Furthermore, there are subjects (such as infants and animals) with whom the ordinary psychophysical procedures cannot be used. Event-Related Potentials may provide an extension to behavioral methodologies (Regan 1972; Callaway et al. 1978).

When an event occurs and one looks at the scalp electrical activity accompanying it, it looks like noise. However, if the event is repeated many times and the electrical activity is averaged, the noise cancels out and a characteristic waveform called the Event-Related Potential (EP) becomes measurable. Early components of this waveform seem to reflect sensory processes that are relatively constant across individuals and tasks. Later components are influenced by cognitive variables such as attention and expectation.
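The averaging procedure can be illustrated with a small simulation. The sketch below rests on stated assumptions rather than real recordings: the "evoked" waveform is a hypothetical single sine cycle, the background activity is independent Gaussian noise, and every trial is perfectly time-locked to the event.

```python
import math
import random


def simulate_trial(n_samples, rng, noise_sd=5.0):
    """One trial: a small fixed waveform buried in much larger noise."""
    return [
        math.sin(2 * math.pi * t / n_samples) + rng.gauss(0.0, noise_sd)
        for t in range(n_samples)
    ]


def average_trials(trials):
    """Point-by-point average across trials, time-locked to the event."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]


def rms_error(waveform, n_samples):
    """Root-mean-square deviation from the hidden waveform."""
    return math.sqrt(
        sum(
            (v - math.sin(2 * math.pi * t / n_samples)) ** 2
            for t, v in enumerate(waveform)
        )
        / n_samples
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
N_SAMPLES = 100
trials = [simulate_trial(N_SAMPLES, rng) for _ in range(400)]

single_error = rms_error(trials[0], N_SAMPLES)
averaged_error = rms_error(average_trials(trials), N_SAMPLES)
```

With independent noise, the residual error of the average shrinks roughly in proportion to the square root of the number of trials, so 400 trials should reduce it by a factor of about 20 relative to a single trial.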

1.6.2 Auditory EPs and Phoneme CP. Molfese (1987) has examined EPs accompanying speech and nonspeech auditory stimuli that give rise to CP boundaries. He has found components that are sensitive to between-category differences in two-month-old infants -- younger than could be tested by any other means. He has also found left-right differences in the distribution of these components across the scalp (Molfese & Molfese 1987) and has been able to study the developmental course of changes in some of them. Although it is too early to be confident in any specific interpretation of these effects, the psychophysiological methodology seems promising and is certainly a welcome supplement to psychophysical methods.

1.6.3 Visual EPs and Color CP. Regan (1972, 1975) has worked extensively on visual EPs. He has recently provided a brief sketch of how visual EPs might be used to study color categories (Regan 1987). There are distinct components of the waveform that are sensitive to hue, although it is sometimes difficult to disentangle them from components sensitive to luminance. EP features have been found to vary in their shape and distribution over the head as a function of the color of the stimulus. EP patterns are also differentially correlated with red, green and blue stimuli, consistent with the existence of the three parallel color channels (Boynton 1979).

Apart from their possible utility in the study of color categories, EPs may also contribute to the study of visual form categories. And, as mentioned, later components may be informative about cognitive processes such as expectation, selective attention, match/mismatch detection (Callaway et al. 1978) and perhaps also the subliminal time course of the unconscious processes underlying perceptual judgments (Libet 1985).

1.7 Higher-Order Categories

1.7.1 Features Vs. Prototypes. Research on people's everyday categorization of concrete and abstract objects (see Rosch & Lloyd 1978; Smith & Medin 1981; Neisser 1987) has focused on (a) how quickly and easily an instance is judged to be a member of a category, (b) how typical a member it is judged to be and (c) how subjects report that they are accomplishing the categorization (i.e., what features or rules they feel they use). For example, the reaction time for identifying a robin as a bird is shorter than for identifying a penguin as a bird (whether the stimulus is the name or the image of the bird), a robin is rated as being a better or more typical example of a bird than a penguin is, and subjects report that a robin has more of the features characteristic of a bird than a penguin does. From these data it has been inferred that the representation of a bird does not consist (as some had believed) of a set of defining features that all birds share, features that provide necessary and sufficient conditions for identifying birds. Rather, the representation is a prototype that specific members such as robins and penguins resemble to a greater or lesser degree (hence the variation in reaction time and typicality judgments). It is still an open question what a prototype really is, but it seems to consist of the features of either a typical or an ideal category member, rather than invariant features common to every member.

1.7.2 Sensory and Cognitive Categories. The research on CP and on higher-order categories has been proceeding more-or-less independently (although prototypes and related concepts have lately found their way into CP theory -- see sections 2.3.4 and 2.4.4 and Massaro 1987). Medin & Barsalou (1987) describe parallels between the two lines of categorization research -- one concerned with what they call sensory perception (SP) categories and the other with generic knowledge (GK) categories -- in the hope of integrating and unifying them. They point out similarities, differences, and empirical questions whose investigation would contribute to both lines of research.

Medin & Barsalou distinguish between two different kinds of categories: (1) all-or-none categories and (2) graded ones. There are two subtypes of all-or-none categories: (1a) In well-defined categories all members share a common set of features and a corresponding rule defines these as necessary and sufficient conditions for membership (e.g., for bachelor). (1b) In defined (but not well-defined) categories the features need not be shared by all members, and the rule can be an either/or one (e.g., for a strike in baseball). Graded categories (2) are not defined by an all-or-none rule at all, and membership is a matter of degree.
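The logical form of the three kinds of category can be written out as membership functions. This is only a sketch: the feature encodings are hypothetical, the strike rule is a deliberately simplified version of the baseball rule, and "tallness" is an example of a graded category introduced here for illustration, not one taken from Medin & Barsalou.

```python
# Hypothetical feature encodings; only the logical form of each rule matters.

def is_bachelor(person):
    """Well-defined (1a): each feature necessary, all jointly sufficient."""
    return person["adult"] and person["male"] and not person["married"]


def is_strike(pitch):
    """Defined but not well-defined (1b): an either/or rule."""
    return (
        pitch["swung_and_missed"]
        or pitch["in_zone_not_swung"]
        or pitch["foul_with_under_two_strikes"]
    )


def tallness(height_cm, low=160.0, high=190.0):
    """Graded (2): membership is a matter of degree between 0 and 1."""
    return min(1.0, max(0.0, (height_cm - low) / (high - low)))


bachelor = is_bachelor({"adult": True, "male": True, "married": False})
strike = is_strike(
    {
        "swung_and_missed": False,
        "in_zone_not_swung": True,
        "foul_with_under_two_strikes": False,
    }
)
degree = tallness(175.0)
```

The first two functions return only True or False, although the second can be satisfied by members that share no single feature. The third returns a degree, so it has no all-or-none boundary unless a criterion is imposed on it afterwards.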

Clearly, the nature of the representations of both SP and GK categories (e.g., whether they consist of defining features or prototypes) and how categorization is accomplished (e.g., whether by detecting defining features or degree of similarity to a prototype) will depend on whether the categories in question are all-or-none or graded. GK research examines the ease or speed of identification, whereas CP research focuses on discrimination and identification performance itself. The all-or-none question hinges on whether or not the identification boundary itself is graded, which remains to be examined in GK research. In SP research it is in fact the all-or-none nature of the boundary (apart from some psychophysical variance near threshold) that sets apart CP and non-CP continua. A convergence of methods and questions is certainly desirable and promising.

Medin & Barsalou's integration of SP and GK categorization research by analyzing their parallels is very suggestive. In Chapter 18, I will propose an alternative integrative approach, in which the two lines of investigation are related hierarchically, with higher-order concrete and abstract categories (GK) built out of elementary psychophysical ones (SP). The parallels would still exist, for categorization is involved at all levels. But some differences and asymmetries may be accounted for by the fact that the GK categories are being grounded in the SP categories in a bottom-up fashion in my approach.

1.7.3 The Developmental Shift from Holistic to Analytic Representations of Categories.

The representations underlying children's categories, especially as reflected in their word meanings, seem to undergo critical developmental changes (Keil 1986a, 1986b). Keil & Kelly (1987) suggest that categories are initially represented by characteristic or instance-bound features (Vygotsky 1962). These are the features of typical members of the category, but not necessarily the features that reliably define membership in the category in general. The shift to such defining features is gradual rather than all-or-none and it is domain-specific, occurring at different times with different categorization problems, even in adulthood. Such shifts seem to occur in all cultures and do not depend on specific changes in parental instructional practices.

Parallel to the characteristic-to-defining shift there is a shift from integral to separable dimensions in perception (Garner 1974). Initially, children see objects holistically, unable to pay selective attention to some features and ignore others. Gradually, certain perceptual dimensions such as size and color become separable and can be used to categorize things. Other dimensions, however, such as hue and saturation, remain integral (although this may be a matter of relative ease of separation, rather than absolute inseparability).

Keil & Kelly suggest that in the holistic stage the child is categorizing on the basis of prototypes: characteristic features with integral dimensions. The only way he can sort things is by their overall similarities. In the later, analytic stage, representations consist of separable, defining features. The child is then sorting things the way we do, and can usually even verbalize the basis for his categorization in the form of a rule or definition.

The questions this hypothesis raises concern the nature of the initial and final representations and the nature of the experience and the processing that produce the shift from one kind of representation to the other. In Chapter 18, I will propose that the first, holistic representation is chiefly analog, and that the second, analytic one consists of distinctive features which have been picked out by the category-induction process. (As Keil & Kelly indicate, learning the correct names of objects is the crucial task here.) According to my hypothesis, once learned, the labels that identify categories function as symbols in a third representational system, the one underlying language. The function of the three representational systems then corresponds, respectively, to discrimination, identification and description, with CP being responsible for the shift from the first to the second representational system through the formation of discrete category boundaries that separate the pertinent features.

1.7.4 Spatial Categories and Propositions. Propositions are the abstract meanings that are expressed by sentences. One simplified way to think of them is as predicating something to be true of their subject: Straight (line) proposes that the line is straight. On (cat, mat) proposes that the cat is on the mat. Some cognitive theorists (e.g., Pylyshyn 1980) have hypothesized that all cognitive processes are propositional -- not conscious English sentences, but strings of abstract symbols in the brain formulated in the language of thought (Fodor 1975). These theorists have argued that until and unless the objects we see (and any image-like copies our brains make of them) are turned into these abstract descriptions, we are not yet doing cognition, but relying on a homunculus to look at our images and do our cognition for us.

One of the prominent features of any propositional code is that it must operate on discrete symbols. Hence, in (say) vision, a physical object is viewed, it makes an analog projection on the retina, and then, at some stage in processing, that analog information must somehow be filtered, digitized and turned into symbols for the propositional system to be able to operate on. CP would be an obvious candidate for mediating this sort of analog/digital (A/D) conversion -- if and when it really takes place. Some investigators, however, have tried to show that, at least in some areas of perception, the A/D conversion never happens. These counterexamples have tended to involve spatial perception, a domain in which continuity is particularly important.

Shepard showed that the length of time it takes subjects to make same/different judgments about pairs of shapes, one in standard position and the other rotated, is proportional to the degree of rotation (see Shepard & Cooper 1982). He concluded that subjects mentally rotate images to perform the comparison. To answer him, propositionalists would have to claim that the rotation angle was somehow categorical, with the propositional process taking longer in proportion to the number of rotation increments that had to be encoded symbolically. This has strained some theorists' sense of parsimony, but others have accepted that even spatial perception may be better explained propositionally.
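The contrast between the two accounts of the rotation data can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the function names analog_rt and stepwise_rt, the intercept, the slope, the increment size and the per-step cost are all hypothetical values chosen for the example, not estimates from Shepard & Cooper's data.

```python
import math

def analog_rt(angle_deg, intercept=1.0, slope=0.02):
    """Analog account: time grows continuously with the rotation angle."""
    return intercept + slope * angle_deg

def stepwise_rt(angle_deg, increment_deg=20.0, step_cost=0.4, intercept=1.0):
    """Propositional account: one fixed cost per discrete rotation increment."""
    n_steps = math.ceil(angle_deg / increment_deg)
    return intercept + step_cost * n_steps

# Compare the two hypothetical predictions at a few rotation angles.
for angle in (0, 20, 40, 60, 70):
    print(angle, round(analog_rt(angle), 2), round(stepwise_rt(angle), 2))
```

With these illustrative parameters the two predictions coincide at whole multiples of the increment and diverge only in between, which suggests why a propositional account can mimic a linear trend if the increments are assumed to be small enough.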

Olson & Bialystok (1983) advocate such a propositional theory of spatial representation. In Bialystok & Olson (1987) they hypothesize that spatial relations are encoded as structural descriptions, each consisting of a spatial predicate with two arguments (e.g., On [cup, table]), and that this accounts for a large body of spatial performance data. They make a distinction between spatial properties that are implicit in a representation (e.g., round, for a child who has the category ball but not yet the category roundness) and properties that are explicitly represented as categories (as demonstrated by the subject's capacity to name and use the property in describing and solving spatial problems such as how an object looks from another orientation). All implicit properties are presumably potentially explicit. Bialystok & Olson also emphasize the relational character of spatial properties (i.e., these are usually described by two-place predicates) and the fact that they are usually binary (e.g., above/below), which they call categorical. They indicate that whereas object boundaries are often fuzzy, spatial property boundaries are strict. To confirm this hypothesized CP effect, however, discrimination/identification tests still remain to be performed.
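A structural description of this kind can be sketched as a simple data structure: a predicate name plus two arguments. The sketch below is a toy under stated assumptions; the class name SpatialProposition, the field names figure and ground, the BINARY_OPPOSITES table and the converse method are hypothetical conveniences for illustration, not part of Bialystok & Olson's proposal.

```python
from dataclasses import dataclass

# Hypothetical table pairing each binary spatial predicate with its opposite.
BINARY_OPPOSITES = {
    "above": "below", "below": "above",
    "left_of": "right_of", "right_of": "left_of",
}

@dataclass(frozen=True)
class SpatialProposition:
    predicate: str  # the spatial relation, e.g. "on" or "above"
    figure: str     # first argument: the object being located
    ground: str     # second argument: the reference object

    def render(self):
        """Write the proposition in the Predicate [arg1, arg2] notation."""
        return f"{self.predicate.capitalize()} [{self.figure}, {self.ground}]"

    def converse(self):
        """Swap the arguments and replace the predicate by its binary opposite."""
        return SpatialProposition(
            BINARY_OPPOSITES[self.predicate], self.ground, self.figure
        )

print(SpatialProposition("on", "cup", "table").render())
print(SpatialProposition("above", "lamp", "table").converse().render())
```

The binary table reflects only the observation that many spatial predicates come in opposed pairs; whether subjects actually store or compute such converses is an open empirical matter.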

Bialystok & Olson emphasize that they are not proposing a theory of the source of propositional spatial representations -- only that the assumption that subjects have and use such representations accounts for their spatial performance. Bialystok & Olson do express doubts, however, that these spatial descriptions are the output of feature detectors, either innate or learned (chiefly because they do not consider relations to be features and because there are too many relations), suggesting instead that the predicates themselves may be innate.

The propositional approach to spatial representation raises many unanswered questions, for if correct descriptions are indeed mediating spatial performance then the substantive problem in modeling spatial categorization seems to be that of accounting for how we get from viewing objects to correct (or correctable) descriptions of their spatial properties. Behind every reliable problem-solving performance there is always a description of the successful solution, but proposing that the subject is actually using that description, rather than that he is merely describable as performing in accordance with it, presupposes the solution to the antecedent problem of how he arrived at the right description (or any description at all). CP itself is one possible hypothesis about the origins of spatial categories and their names (on, behind, beside, etc.).

1.8 Cognitive Foundations

1.8.1 A Psychophysical Hypothesis for the Grounding of Concrete and Abstract Categories.

What features of CP can be generalized to categorization as a whole? The hypothesis underlying Chapter 18 is that the identification in the psychophysical labeling task, the identification in object recognition and the identification in linguistic naming are one and the same. This suggests that the task of learning to label regions of a one-dimensional stimulus continuum could be representative of the sorting and naming of all the multidimensional objects in the world. The representations and processes underlying elementary psychophysical label-learning would accordingly be like the representations and processes underlying categorization in general. In addition, it is noteworthy that higher-order categories are constituted of elementary psychophysical ones. The bounded CP category may also be the chunk out of which the rest is built.

If psychophysical labeling performance is indeed representative of categorization in general then what sort of model would be needed to generate it? CP is defined by the discrimination and identification function. Discrimination requires analog stimulus traces for making relative comparisons and for performing other analog operations. These representations need not -- indeed cannot -- be categorical. Identification requires a feature-detector that reliably picks out the features that distinguish the members of an all-or-none category from confusable nonmembers. Features that are not innate must be learned from experience, so categorical representations for learned categories consist of learned feature-detectors. Finally, the labels of the bounded CP categories provide the elementary terms for a third representational system, the symbolic descriptions of natural language (and perhaps also of the language of thought).
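The division of labor just described can be sketched for a one-dimensional continuum. This is a minimal toy, assuming stimuli are numbers between 0 and 1 and that a single boundary value stands in for a learned feature-detector; the function names iconic_distance, make_detector, identify and describe, the labels "A" and "B", and the boundary at 0.5 are all hypothetical.

```python
def iconic_distance(x, y):
    """Iconic level: analog comparison of two stimulus traces (discrimination)."""
    return abs(x - y)

def make_detector(boundary):
    """Categorical level: an all-or-none detector for a hypothetical boundary."""
    def detect(x):
        return x >= boundary
    return detect

def identify(x, detector, labels=("A", "B")):
    """Assign a category label to a stimulus using the detector's output."""
    return labels[1] if detector(x) else labels[0]

def describe(x, y, detector):
    """Symbolic level: combine the labels into a simple description."""
    lx, ly = identify(x, detector), identify(y, detector)
    relation = "same category" if lx == ly else "different categories"
    return f"{lx} and {ly}: {relation}"

detector = make_detector(0.5)
print(iconic_distance(0.45, 0.55))   # small analog difference
print(identify(0.45, detector), identify(0.55, detector))
print(describe(0.45, 0.55, detector))
```

In the example, two stimuli that are close together on the analog continuum nevertheless receive different labels because they straddle the boundary, which is the situation in which a CP effect would be expected.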

Chapter 18 will describe such a three-level representational system -- iconic, categorical and symbolic -- with a special emphasis on the role of learning processes and context. Every category is based on a specific context of alternatives. Without this representative sample of the relevant, confusable complement of a category, the highly underdetermined search for features may never converge (i.e., it may never yield reliable, successful categorization performance; see Harnad, in preparation, c). Moreover, convergence is always provisional: Categories always remain context-relative and approximate. Yet, when they are well-learned, most categories are nevertheless all-or-none, not graded or fuzzy, even though a new, anomalous case could in principle cause the provisional feature-detector to fail (i.e., to miscategorize) at any time.

The model to be proposed does not attempt to answer all the questions raised in this review chapter. For example, it is not committed to any specific algorithm for learning; it merely assumes that learning is possible, and takes place somehow (cf. Osherson et al. 1986). However, a sketch is given of a very natural way that a connectionist network (McClelland et al. 1986) could perform this feature-learning function within the three-level representational system hypothesized. The model's usefulness will also be more limited if either the role of learning in CP turns out to be minimal or (what amounts to the same thing) feature-detection turns out not to be a hard (i.e., underdetermined) problem. The model's contribution to the problem of language and meaning will also be limited if perceptual representations of objects don't turn out to have much to do with what we can do with words, and how.

1.8.2 Testability of the Psychophysical Model for Category Representation. The model to be presented in Chapter 18 suggests a very specific program for further research on categorization and CP; the model is tested through neural net simulations in Harnad, Hanson & Lubin (1991). To test the validity of the induction/representation hypotheses, the time course of changes in discrimination and identification performance (and perhaps their psychophysiological correlates) during actual label-learning must be measured and modeled as a function of the context of alternatives sampled and the degree of informational underdetermination of the features that are sufficient to subserve reliable categorization. The experimental paradigm is a time-series elaboration of the standard discrimination and identification paradigm of CP research, beginning, systematically and parametrically, with concrete one-dimensional sensory continua and working up to multidimensional and eventually even abstract categories to be identified and discriminated.

Microcomputers make it very easy to generate new and unfamiliar visual and auditory patterns, in one and many dimensions. Interactive programs can provide subjects with instances and feedback as to the correctness of their provisional labeling while at the same time gathering data on how their identification performance develops in real time, from trial to trial, as the sample grows and the context of confusable alternatives widens. Parallel discrimination tests can determine the cumulating effects of the label-learning process on perceived similarity structure -- acquired distinctiveness and similarity -- and especially CP boundary formation. The roles of direct experience and verbal description can also be compared.
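The trial-by-trial logic of such an interactive program can be sketched with a simulated subject in place of a real one. Everything here is an assumption made for illustration: the function name run_session, the one-dimensional stimuli drawn uniformly between 0 and 1, the labels "A" and "B", and above all the simple error-driven update rule, which is a stand-in and not a claim about how human label-learning works.

```python
import random

def run_session(n_trials, true_boundary=0.6, start_estimate=0.2,
                rate=0.5, seed=0):
    """Simulate label-learning with feedback and log every trial."""
    rng = random.Random(seed)
    estimate = start_estimate  # the simulated subject's provisional boundary
    log = []
    for trial in range(n_trials):
        x = rng.random()                            # present a stimulus
        response = "B" if x >= estimate else "A"    # provisional label
        answer = "B" if x >= true_boundary else "A" # experimenter's label
        correct = response == answer
        if not correct:
            # Feedback: shift the provisional boundary toward the missed item.
            estimate += rate * (x - estimate)
        log.append({"trial": trial, "stimulus": x, "response": response,
                    "correct": correct, "estimate": estimate})
    return log

log = run_session(200)
early = sum(t["correct"] for t in log[:50]) / 50
late = sum(t["correct"] for t in log[-50:]) / 50
print("proportion correct, first 50 trials:", early)
print("proportion correct, last 50 trials:", late)
```

The per-trial log is the point of the sketch: it holds the kind of time-series record (stimulus, response, correctness, current state) from which a learning curve could be computed; a real experiment would of course replace the simulated update rule with responses collected from subjects.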

We have George Miller's (1956) celebrated limits on informational capacity. These can now be supplemented by an assessment of the limits on just how underdetermined (confusable) the features of a category can be before they render the category unlearnable. Data on how categorization performance improves with time under feedback conditions may provide clues about those early childhood learning processes and tabula rasa mechanisms that the study of old, overlearned categories and their recombination does not seem able to provide. And how label-learners use cumulative information and contend with anomalous instances could also be informative about the real time nature of category learning and revision.

Pari passu with this bottom-up approach to category-learning it would be desirable to model category formation computationally (see Harnad et al. 1991). However, instead of modeling the higher-order, overlearned categories that we know people have (but for which we have no idea of a realistic entry point), psychophysical labeling performance could be modeled bottom-up too, implementing the three-way division of labor in the representational architecture that has been proposed here.

1.9 Conclusions

This chapter has reviewed a diversity of research united only by the organizing principle offered by the CP phenomenon. Whether CP can really provide a basis for unified and productive inquiry into how organisms sort the objects of the world into categories or it is merely an amalgam of false starts and side alleys peculiar to speech perception research will depend on just how special CP really is: whether it can indeed furnish the groundwork for categorization in general. Chapter 18 will describe a specific bottom-up model for learning and representing higher-order categories by grounding them in elementary CP categories. The discussion will be motivated by first presenting a series of philosophical questions to which CP and the representational model may suggest some answers. Harnad et al. (1991) go on to test the model computationally.

References

Bialystok, E. & Olson, D. R. (1987) Spatial Categories: The Perception and Conceptualization of Spatial Relations. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Bornstein, M. H. (1987) Perceptual Categories in Vision and Audition. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Diehl, R. L. & Kluender, K. R. (1987) On the Categorization of Speech Sounds. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Ehret, G. (1987) Categorical Perception of Speech Signals: Facts and Hypotheses from Animal Studies. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Eimas, P. D., Miller, J. L. & Jusczyk, P. W. (1987) On Infant Speech Perception and the Acquisition of Language. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Keil, F. C. & Kelly, M. H. (1987) Developmental Changes in Category Structure. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Kuhl, P. K. (1987) The Special-Mechanisms Debate in Speech Perception: Nonhuman Species and Nonspeech Signals. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Macmillan, N. A. (1987) Beyond the Categorical/Continuous Distinction: A Psychophysical Approach to Processing Modes. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Massaro, D. W. (1987) Categorical Partitioning: A Fuzzy-Logical Model of Categorization Behavior. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Medin, D. L. & Barsalou, L. W. (1987) Categorization Processes in Category Structure. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Molfese, D. L. (1987) Electrophysiological Indices of Categorical Perception. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Pastore, R. E. (1987) Categorical Perception: Some Psychophysical Models. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Regan, D. M. (1987) Evoked Potentials and Colour-Defined Categories. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Remez, R. E. (1987) Neural Models of Speech Perception: A Case History. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Repp, B. H. & Liberman, A. M. (1987) Phonetic Boundaries are Flexible. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Rosen, S. & Howell, P. (1987) Auditory, Articulatory and Learning Explanations of Categorical Perception in Speech. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Snowdon, C. T. (1987) A Naturalistic View of Categorical Perception. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Wilson, M. (1987) Brain Mechanisms in Categorical Perception. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press