Finding Hay in a Haystack: Analysing Automated Content Extraction in the Manifestos of Far Right Lone Actor Violent Extremists
University of Southampton
Gillbard, Edward
0a02ea91-853c-4965-b8b1-8ba986ccc7ad
Webber, Craig
35851bbe-83e6-4c9b-9dd2-cdf1f60c245d
Gillbard, Edward (2021) Finding Hay in a Haystack: Analysing Automated Content Extraction in the Manifestos of Far Right Lone Actor Violent Extremists. University of Southampton, Doctoral Thesis, 219pp.
Record type: Thesis (Doctoral)
Abstract
In the years leading up to the Covid-19 pandemic, there was a marked increase in attacks perpetrated by far-right lone actor violent extremists. Alongside these attacks, publishing a manifesto online to accompany the attack has become an increasingly common method of disseminating the perpetrator's ideology. Naturally, the question becomes whether it is possible to identify such content quickly and accurately, and thereby thwart such attacks. Computer-assisted content analysis tools such as Linguistic Inquiry and Word Count (LIWC) have been used in previous research to identify categories of words that are important when distinguishing extremist content from non-extremist data. This study argues for the social identity approach to defining extremism suggested in previous research. This becomes the foundation for an investigation into whether the manifestos of far-right lone actor violent extremists support the application of various models from social identity theory to the understanding of extremism. In doing so, this study finds that content in these manifestos shows support for Smith’s (2000) model of prejudice of group-based emotion and the category dominance model of crossed categorisation. However, whilst Hogg’s (2007) uncertainty-identity theory is suggested as an additional theory suitable for furthering the understanding of extremism, this study finds no evidence either in support of or in opposition to it. In carrying out these investigations, this study encounters numerous instances where LIWC misidentifies the meaning of a word because the word is not used in the context LIWC expects. Further querying of the LIWC results shows that, when using the LIWC2015 standard dictionary, LIWC fails to recognise a high proportion of content relevant to the context of the manifesto.
This is suggested to be due to the structuralist and deterministic view of language that LIWC takes: the context in which a word is used, and thus its meaning, is pre-determined by LIWC. The Natural Language Toolkit (NLTK) is used to identify nouns in the manifesto data and is found to perform better than LIWC at extracting contextual information. However, NLTK also identifies a large amount of information deemed irrelevant to the context of the manifesto data. This irrelevant information is essentially noise in which the contextual information is often lost. Without prior knowledge of the author's ideology and information regarding the accompanying attacks, relevant information is difficult to identify amongst the noise; in some cases, the relevant information simply is not included in the manifesto data. These results strongly suggest that computer-assisted content analysis tools such as LIWC are not suited to the analysis of manifestos of far-right lone actor violent extremists, and perhaps of extremist content more generally. This study contributes both a theoretical and a methodological critique of the application of computer-assisted content analysis tools to the analysis of extremist content, particularly far-right lone actor violent extremist manifestos. Alongside this critique, this study presents evidence in support of the social identity approach to defining extremism, and of the application of social identity theory to furthering the understanding of extremism.
Text
Final Thesis - eg16g15
- Version of Record
Text
Gillbard Permission to deposit thesis - form
- Version of Record
Restricted to Repository staff only
More information
Submitted date: September 2021
Identifiers
Local EPrints ID: 456992
URI: http://eprints.soton.ac.uk/id/eprint/456992
PURE UUID: fe9c8ce3-6244-4be8-b724-6d0d3bac9cb9
Catalogue record
Date deposited: 19 May 2022 16:33
Last modified: 17 Mar 2024 02:51
Contributors
Author: Edward Gillbard