Finding Hay in a Haystack: Analysing Automated Content Extraction in the Manifestos of Far Right Lone Actor Violent Extremists

Gillbard, Edward (2021) Finding Hay in a Haystack: Analysing Automated Content Extraction in the Manifestos of Far Right Lone Actor Violent Extremists. University of Southampton, Doctoral Thesis, 219pp.

Record type: Thesis (Doctoral)

Abstract

In the years leading up to the Covid-19 pandemic, there was a marked increase in attacks perpetrated by far-right lone actor violent extremists. Alongside these attacks, publishing a manifesto online to accompany the attack has become an increasingly common method of disseminating the perpetrator's ideology. Naturally, the question becomes whether it is possible to identify such content quickly and accurately and so thwart such attacks. Computer-assisted content analysis tools such as Linguistic Inquiry and Word Count (LIWC) have been used in previous research to identify categories of words that are important when distinguishing extremist content from non-extremist data. This study argues for the social identity approach to defining extremism suggested in previous research. This becomes the foundation for an investigation into whether the manifestos of far-right lone actor violent extremists support the application of various models from social identity theory to the understanding of extremism. In doing so, this study finds that content in these manifestos shows support for Smith’s (2000) model of prejudice of group-based emotion and the category dominance model of crossed categorisation. However, whilst Hogg’s (2007) uncertainty-identity theory is suggested as an additional theory suitable for furthering the understanding of extremism, this study finds no evidence either for or against it. In carrying out these investigations, this study encounters numerous instances where LIWC misidentifies the meaning of a word because the word is not used in the context LIWC expects. Further querying of the results of the LIWC analysis shows that, when using the LIWC2015 standard dictionary, LIWC fails to recognise a high proportion of content relevant to the context of the manifesto.
This is attributed to the structuralist and deterministic view of language which LIWC takes, in that the context in which words are used, and thus the meaning of each word, is pre-determined by LIWC. The Natural Language Toolkit (NLTK) is used to identify nouns in the manifesto data and is found to perform better than LIWC in terms of extracting contextual information. However, NLTK also identifies a large amount of information deemed to be irrelevant to the context of the manifesto data. This irrelevant information is essentially noise in which the contextual information is often lost. Without the benefit of prior knowledge of the ideology of the author and information regarding the accompanying attacks, relevant information is difficult to identify amongst the noise; in some cases, the relevant information simply is not included in the manifesto data. These results strongly suggest that computer-assisted content analysis tools such as LIWC are not suited to the analysis of manifestos of far-right lone actor violent extremists, and perhaps extremist content more generally. This study contributes both a theoretical and a methodological critique of the application of computer-assisted content analysis tools to the analysis of extremist content, particularly far-right lone actor violent extremist manifestos. Alongside this critique, this study presents evidence in support of the social identity approach to defining extremism, and of the application of social identity theory to furthering the understanding of extremism.

Text

Final Thesis - eg16g15 - Version of Record

Available under License University of Southampton Thesis Licence.
