Patterns in the English language: phonological networks, percolation and assembly models
In this paper we provide a quantitative framework for the study of phonological networks (PNs) for the English language by carrying out principled
comparisons to null models, either based on site percolation, randomization
techniques, or network growth models. In contrast to previous work, we mainly
focus on null models that reproduce lower order characteristics of the
empirical data. We find that artificial networks matching connectivity
properties of the English PN are exceedingly rare: this leads to the hypothesis
that the word repertoire might have been assembled over time by preferentially
introducing new words which are small modifications of old words. Our null
models are able to explain the "power-law-like" part of the degree
distributions and generally retrieve qualitative features of the PN such as
high clustering, high assortativity coefficient, and small-world
characteristics. However, the detailed comparison to expectations from null
models also points out significant differences, suggesting the presence of
additional constraints in word assembly. Key constraints we identify are the
avoidance of large degrees, the avoidance of triadic closure, and the avoidance
of large non-percolating clusters.
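The record does not say how a phonological network is built. The sketch below assumes the construction commonly used in the literature: words are nodes, and two words are linked when their phonemic transcriptions differ by exactly one phoneme substitution, insertion or deletion. The toy word list, the ARPAbet-style transcriptions and all function names are hypothetical and chosen for illustration only. This is not the authors' code or data.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> tuple of phoneme symbols (illustrative only).
TOY_WORDS = {
    "cat": ("K", "AE", "T"),
    "bat": ("B", "AE", "T"),
    "hat": ("HH", "AE", "T"),
    "at": ("AE", "T"),
    "cab": ("K", "AE", "B"),
    "cut": ("K", "AH", "T"),
    "dog": ("D", "AO", "G"),
}


def edit_distance_one(a, b):
    """Return True if a and b differ by one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra phoneme in b at position i


def build_network(words):
    """Adjacency sets of the network: link words at phoneme edit distance one."""
    adj = {w: set() for w in words}
    for u, v in combinations(words, 2):
        if edit_distance_one(words[u], words[v]):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


adj = build_network(TOY_WORDS)
degrees = {w: len(nbrs) for w, nbrs in adj.items()}
print(degrees)
print(average_clustering(adj))
```

Degree and clustering are two of the connectivity properties the abstract compares against null models. A null model in this spirit would apply the same construction to randomly generated or randomized transcriptions and compare the resulting statistics.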
Stella, Massimo
37822c93-2522-4bc0-b840-ca32c75efbd7
Brede, Markus
bbd03865-8e0b-4372-b9d7-cd549631f3f7
8 May 2015
Stella, Massimo and Brede, Markus (2015) Patterns in the English language: phonological networks, percolation and assembly models. Journal of Statistical Mechanics: Theory and Experiment, 2015 (5), [P05006]. (doi:10.1088/1742-5468/2015/05/P05006).
More information
Accepted/In Press date: 9 March 2015
e-pub ahead of print date: 8 May 2015
Published date: 8 May 2015
Additional Information:
Uses data from 'WordData' by Wolfram Research.
Keywords:
complex networks, language, computational linguistics
Organisations:
Agents, Interactions & Complexity
Identifiers
Local EPrints ID: 375486
URI: http://eprints.soton.ac.uk/id/eprint/375486
PURE UUID: 6ac72080-4ae6-4b3b-847c-9b6e2a5f0671
Catalogue record
Date deposited: 27 Mar 2015 14:08
Last modified: 14 Mar 2024 19:27
Contributors
Author:
Massimo Stella
Author:
Markus Brede