The University of Southampton
University of Southampton Institutional Repository

Information extraction from the long tail: A socio-technical AI approach for criminology investigations into the online illegal plant trade


Middleton, Stuart, Lavorgna, Anita, Neumann, Geoffrey K and Whitehead, David (2020) Information extraction from the long tail: A socio-technical AI approach for criminology investigations into the online illegal plant trade. 1st International Workshop on Socio-technical AI Systems for Defence, Cybercrime and Cybersecurity: Hosted by WebSci'20: 12th ACM Web Science Conference 2020, Southampton, United Kingdom. 07 Jul 2020. pp. 82-88.

Record type: Conference or Workshop Item (Paper)

Abstract

In today’s online forums and marketplaces, cybercrime activity can often be found lurking in plain sight behind legitimate posts. Most popular criminology techniques are either manually intensive, and so do not scale well, or focus on statistical summaries across websites and can miss infrequent behaviour patterns. We present an interdisciplinary (computer science, criminology and conservation science) socio-technical artificial intelligence (AI) approach to information extraction from the long tail of online forums around internet-facilitated illegal trades of endangered species. Our methodology is highly iterative: entities of interest (e.g. endangered plant species, suspects, locations) identified by a criminologist are used to direct computer science tools, including crawling, searching and information extraction, over many steps until an acceptable intelligence package is produced. We evaluate our approach using two case study experiments, each based on a one-week criminology investigation (aided by conservation science experts), and compare named entity (NE) directed graph visualization with Latent Dirichlet Allocation (LDA) topic modelling. NE directed graph visualization consistently outperforms topic modelling for discovering connected entities in the long tail of online forums and marketplaces.

Text: WebSci-2020-STAIDCC-middleton-accepted - Accepted Manuscript

More information

Published date: 7 July 2020
Venue - Dates: 1st International Workshop on Socio-technical AI Systems for Defence, Cybercrime and Cybersecurity: Hosted by WebSci'20: 12th ACM Web Science Conference 2020, Southampton, United Kingdom, 2020-07-07 - 2020-07-07
Keywords: Artificial Intelligence, CITES, Criminology, Illegal Wildlife Trade, Information Extraction, Natural Language Processing, Socio-technical

Identifiers

Local EPrints ID: 441265
URI: http://eprints.soton.ac.uk/id/eprint/441265
PURE UUID: 5a6d8cb6-ae6e-415f-a23a-cfe5a5784b55
ORCID for Stuart Middleton: orcid.org/0000-0001-8305-8176
ORCID for Anita Lavorgna: orcid.org/0000-0001-8484-1613

Catalogue record

Date deposited: 08 Jun 2020 16:31
Last modified: 15 Sep 2021 02:05


Contributors

Author: Stuart Middleton
Author: Anita Lavorgna
Author: Geoffrey K Neumann
Author: David Whitehead



Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.
