The University of Southampton
University of Southampton Institutional Repository

The Latent Structure of Dictionaries

Vincent-Lamarre, Philippe, Blondin Massé, Alexandre, Lopes, Marcos, Lord, Mélanie, Marcotte, Odile and Harnad, Stevan (2016) The Latent Structure of Dictionaries. TopiCS in Cognitive Science, 8 (3), 625-659. (doi:10.1111/tops.12211).

Record type: Article

Abstract

How many words (and which ones) are sufficient to define all other words? When dictionaries are analyzed as directed graphs with links from defining words to defined words, they reveal a latent structure. Recursively removing all words that are reachable by definition but that do not define any further words reduces the dictionary to a Kernel of about 10%. This is still not the smallest number of words that can define all the rest. About 75% of the Kernel turns out to be its Core, a Strongly Connected Subset of words with a definitional path to and from any pair of its words and no word’s definition depending on a word outside the set. But the Core cannot define all the rest of the dictionary. The 25% of the Kernel surrounding the Core consists of small strongly connected subsets of words: the Satellites. The size of the smallest set of words that can define all the rest (the graph’s Minimum Feedback Vertex Set or MinSet) is about 1% of the dictionary, 15% of the Kernel, and half-Core, half-Satellite. But every dictionary has a huge number of MinSets. The Core words are learned earlier, more frequent, and less concrete than the Satellites, which in turn are learned earlier and more frequent but more concrete than the rest of the Dictionary. In principle, only one MinSet’s words would need to be grounded through the sensorimotor capacity to recognize and categorize their referents. In a dual-code sensorimotor-symbolic model of the mental lexicon, the symbolic code could do all the rest via re-combinatory definition.
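The decomposition described in the abstract can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: the seven-word dictionary `TOY`, all function names, and the choice to treat the largest self-contained strongly connected subset as the Core are assumptions made for illustration. The MinSet search is plain brute force, which is workable only for tiny examples.

```python
from itertools import combinations

# Hypothetical toy dictionary: each word maps to the set of words used to define it.
TOY = {
    "a": {"b"}, "b": {"c"}, "c": {"a"},   # a 3-cycle that depends on nothing else
    "d": {"a", "e"}, "e": {"d"},          # a 2-cycle that also depends on "a"
    "f": {"a", "d"}, "g": {"f"},          # defined words that lie on no cycle
}

def kernel(definitions):
    """Recursively drop words that define no remaining word."""
    remaining = set(definitions)
    while True:
        used = set()  # words appearing in the definition of a remaining word
        for word in remaining:
            used |= definitions[word] & remaining
        unused = remaining - used
        if not unused:
            return remaining
        remaining -= unused

def reachable(start, words, definitions):
    """Words reachable from `start` along defining -> defined links."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for other in words:
            if current in definitions[other] and other not in seen:
                seen.add(other)
                stack.append(other)
    return seen

def strongly_connected_subsets(words, definitions):
    """Group words that have definitional paths to and from each other."""
    reach = {w: reachable(w, words, definitions) for w in words}
    subsets, assigned = [], set()
    for word in sorted(words):
        if word not in assigned:
            subset = {word} | {v for v in reach[word] if word in reach[v]}
            subsets.append(subset)
            assigned |= subset
    return subsets

def split_core(words, definitions):
    """Assumption: Core = largest subset whose definitions never leave it."""
    subsets = strongly_connected_subsets(words, definitions)
    closed = [s for s in subsets
              if all(definitions[w] & words <= s for w in s)]
    core = max(closed, key=len)
    return core, [s for s in subsets if s != core]

def min_sets(definitions):
    """Brute-force all smallest word sets whose removal breaks every cycle."""
    words = sorted(kernel(definitions))
    for size in range(len(words) + 1):
        found = []
        for candidate in combinations(words, size):
            rest = {w: definitions[w] - set(candidate)
                    for w in words if w not in candidate}
            if not kernel(rest):  # nothing survives pruning: no cycle is left
                found.append(set(candidate))
        if found:
            return found

kern = kernel(TOY)
core, satellites = split_core(kern, TOY)
minsets = min_sets(TOY)
print(sorted(kern))                     # ['a', 'b', 'c', 'd', 'e']
print(sorted(core))                     # ['a', 'b', 'c']
print([sorted(s) for s in satellites])  # [['d', 'e']]
print(len(minsets))                     # 6
```

In this toy graph every MinSet contains one word from the 3-cycle and one from the 2-cycle, so even seven words admit six different MinSets. Finding a minimum feedback vertex set is NP-hard in general, so exhaustive search of this kind does not scale to a real dictionary.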

Text
DictpaperFIN-12-01-16.pdf - Accepted Manuscript

More information

Accepted/In Press date: 18 May 2016
e-pub ahead of print date: 18 July 2016
Published date: September 2016
Keywords: symbol grounding, dictionaries, mental lexicon, language evolution, graph theory, semantics, representation
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 370845
URI: http://eprints.soton.ac.uk/id/eprint/370845
PURE UUID: ccaf5480-6291-4382-8dbc-75266c62a58c
ORCID for Stevan Harnad: orcid.org/0000-0001-6153-1129

Catalogue record

Date deposited: 08 Nov 2014 01:58
Last modified: 15 Mar 2024 02:48

Contributors

Author: Philippe Vincent-Lamarre
Author: Alexandre Blondin Massé
Author: Marcos Lopes
Author: Mélanie Lord
Author: Odile Marcotte
Author: Stevan Harnad
