University of Southampton Institutional Repository

Data for: The unstoppable glottal: Tracking rapid change in an iconic British variable

Syracuse University Qualitative Data Repository
Smith, Jennifer
30db41ba-805c-4537-8927-b86f0a1e2f6f
Holmes-Elliott, Sophie
00469f81-b165-4477-b9ee-d10796a2b9a1

Smith, Jennifer and Holmes-Elliott, Sophie (2018) Data for: The unstoppable glottal: Tracking rapid change in an iconic British variable. Syracuse University Qualitative Data Repository doi:10.5064/f6ope7mj [Dataset]

Record type: Dataset

Abstract

This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the publisher's website. We have concentrated on a number of key stages of sociolinguistic research, with specific reference to the collection, processing and statistical analysis of data.

Recordings of speech data: we are linguists working on speech data, yet we rely on written data to convey the core materials we work with. We therefore include examples of actual speech recordings to provide concrete support for our claim that the data we are working with diverges significantly from mainstream norms.

Data preparation and coding

Transcription – example of protocol in action: the transcription of speech data must satisfy two, often competing, criteria: it has to be 1) an accurate reflection of what was actually said and 2) transparent and accessible for analysis. Achieving this is no easy feat, so we include the full transcription protocol here to highlight the complexities of representing speech data in written format: what changes, what does not, and why.

Coding and annotation – from sound file to transcript to coded data: this phase of the research is often relegated to one or two lines in a journal article. Our own paper illustrates this, stating only that ‘we extracted approx. 100 tokens per speaker per insider/outsider interview’. In this annotation we show how this is actually done, demonstrating how we isolate the linguistic variable, moving from the original recording to sound-aligned transcribed data, and how this annotation prepares for the eventual extraction of the variable context under analysis.

Coding schema: the coding schema arises from two different sources: 1) what has been found in previous research; and 2) observation of the current data. As such, there are multiple possibilities for what governs the observed variability. The initial coding schema sets out to test these multiple possibilities. Occam’s Razor is then applied to these categories in sifting the data for the best fit, resulting in the leaner, more interpretable coding schema presented in the final article. We have included in this annotation the original, more elaborated categories to highlight the behind-the-scenes work that takes place in making sense of the data. We also include sound files of the actual variants used. This allows the user to hear the different environments set out in the final coding schema as used in the object of study: spontaneous speech data.

Statistical analysis – the program used: a challenge of statistical analysis is that the field constantly evolves. This annotation is a case in point: the version of the program we used is now deprecated and no longer supported. The new version is more than a superficial change to the graphical interface; it represents a completely different approach to the way the models are built (stepping up based on p-values as opposed to stepping down from fully saturated models). The wider implication is that analyses may not be fully replicable, particularly as software becomes obsolete, so we provide further information on the program used to highlight this potential problem.

Statistical analysis – procedure: the description of the statistical analysis that appears in the final journal article is usually a ‘final model’ outlined in linear fashion, but in reality the model results from many iterations in which many different models are run and cross-referenced. The final model is a trade-off between accuracy and elegance: we aim for the ‘best fit’ but also the simplest, most straightforward computation. As we outline, in this case we decided to model each generation separately, as this provided a clearer route to answering our research questions. However, other analysts may argue that a fully saturated model representing all the interactions together is more accurate. Including this annotation provides further rationale for the model(s) we eventually used in the article.
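To make the coding-and-annotation step above concrete, here is a minimal sketch of capping extraction at roughly 100 tokens per speaker. Everything in it is assumed for illustration: the (speaker, word, start, end) transcript format, the crude orthographic test for a candidate (t) context, and the extract_tokens helper are all hypothetical; the project's actual extraction works from sound-aligned transcripts and the audio itself, not spelling alone.

```python
import random

def extract_tokens(transcript, per_speaker=100, seed=1):
    """Collect candidate (t) tokens and cap them per speaker.

    `transcript` is assumed to be an iterable of (speaker, word, start, end)
    tuples, as might be exported from a sound-aligned transcription tool.
    """
    rng = random.Random(seed)
    by_speaker = {}
    for speaker, word, start, end in transcript:
        # Crude orthographic proxy for the variable context: a non-initial
        # <t> in the word (e.g. "water", "but"). Real coding would consult
        # the aligned audio, not spelling alone.
        if "t" in word[1:].lower():
            by_speaker.setdefault(speaker, []).append((word, start, end))
    # Randomly sample up to `per_speaker` tokens for each speaker.
    return {
        spk: rng.sample(toks, min(per_speaker, len(toks)))
        for spk, toks in by_speaker.items()
    }

# Toy usage:
transcript = [
    ("spk1", "water", 0.0, 0.4), ("spk1", "bottle", 0.5, 0.9),
    ("spk1", "see", 1.0, 1.2), ("spk2", "but", 0.0, 0.2),
]
print(extract_tokens(transcript, per_speaker=2))
```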
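The two model-building directions contrasted above (stepping up from a null model on p-values versus stepping down from a fully saturated one) can likewise be illustrated with likelihood-ratio tests over nested models. This sketch is not the authors' procedure or their software: it uses statsmodels logistic regression on invented column names (glottal, age, style) and simulated data purely to show how the same nested-model comparison drives selection in both directions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

def lr_test(reduced, full):
    """Likelihood-ratio test for two nested fitted logit models."""
    stat = 2 * (full.llf - reduced.llf)          # difference in log-likelihood
    df_diff = full.df_model - reduced.df_model   # extra parameters in `full`
    return chi2.sf(stat, df_diff)                # p-value

# Simulated data: the glottal variant is more likely for younger speakers.
rng = np.random.default_rng(1)
n = 400
data = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "style": rng.choice(["insider", "outsider"], n),
})
p = 1 / (1 + np.exp((data["age"] - 45) / 15))
data["glottal"] = rng.binomial(1, p)

# Step-down: start saturated, test whether dropping a term worsens the fit.
full = smf.logit("glottal ~ age + style", data=data).fit(disp=0)
no_style = smf.logit("glottal ~ age", data=data).fit(disp=0)
print("drop style, p =", lr_test(no_style, full))

# Step-up: start from the null model, test whether adding a term improves it.
null = smf.logit("glottal ~ 1", data=data).fit(disp=0)
print("add age, p =", lr_test(null, no_style))
```

Either direction decides on the same p-value from the same pair of nested models; the two approaches differ only in the starting point, and with correlated predictors they can arrive at different final models, which is one reason the change of software matters for replicability.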

This record has no associated files available for download.

More information

Published date: 1 January 2018

Identifiers

Local EPrints ID: 449126
URI: http://eprints.soton.ac.uk/id/eprint/449126
PURE UUID: a783b029-dbdd-4ed0-a9fd-9ea51e387f80

Catalogue record

Date deposited: 17 May 2021 16:34
Last modified: 19 Jan 2024 18:15

Contributors

Contributor: Jennifer Smith
Contributor: Sophie Holmes-Elliott

