Information extraction from the long tail: A socio-technical AI approach for criminology investigations into the online illegal plant trade
In today’s online forums and marketplaces cybercrime activity can often be found lurking in plain sight behind legitimate posts. Most popular criminology techniques are either manually intensive, and so do not scale well, or focus on statistical summaries across websites and can miss infrequent behaviour patterns. We present an inter-disciplinary (computer science, criminology and conservation science) socio-technical artificial intelligence (AI) approach to information extraction from the long tail of online forums around internet-facilitated illegal trades of endangered species. Our methodology is highly iterative, taking entities of interest (e.g. endangered plant species, suspects, locations) identified by a criminologist and using them to direct computer science tools including crawling, searching and information extraction over many steps until an acceptable resulting intelligence package is achieved. We evaluate our approach using two case study experiments, each based on a one-week duration criminology investigation (aided by conservation science experts) and evaluate both named entity (NE) directed graph visualization and Latent Dirichlet Allocation (LDA) topic modelling. NE directed graph visualization consistently outperforms topic modelling for discovering connected entities in the long tail of online forums and marketplaces.
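The iterative, entity-directed search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy posts, the lexicon lookup standing in for a named entity tagger, and the helper names `build_graph` and `expand` are all assumptions made for illustration. It links each post author to the entities of interest mentioned in their posts, then grows the set of connected entities from a seed over a fixed number of steps.

```python
from collections import defaultdict

# Hypothetical toy data: (post author, post text). Not taken from the paper.
POSTS = [
    ("seller_a", "Selling Ariocarpus seeds, shipping from Prague"),
    ("seller_b", "Ariocarpus and Saussurea costus root available"),
    ("buyer_c", "Looking for orchids in Prague"),
]

# Hypothetical lexicon of entities of interest; a stand-in for a real NE tagger.
LEXICON = {"ariocarpus", "saussurea costus", "prague"}


def build_graph(posts, lexicon):
    """Directed edges: post author -> each lexicon entity mentioned in the post."""
    graph = defaultdict(set)
    for author, text in posts:
        lowered = text.lower()
        for entity in lexicon:
            if entity in lowered:  # naive substring match, for illustration only
                graph[author].add(entity)
    return graph


def expand(graph, seeds, steps):
    """Grow the set of connected authors and entities, one hop per step."""
    found = set(seeds)
    for _ in range(steps):
        linked = set()
        for author, entities in graph.items():
            if entities & found:  # author mentions something already found
                linked.add(author)
                linked |= entities
        if linked <= found:  # nothing new discovered, stop early
            break
        found |= linked
    return found


graph = build_graph(POSTS, LEXICON)
one_step = expand(graph, {"ariocarpus"}, steps=1)
two_steps = expand(graph, {"ariocarpus"}, steps=2)
```

In this toy data, a single step from the seed reaches the two sellers and the entities they mention; a second step is needed to reach `buyer_c`, who is connected only through a shared location. In the paper's workflow the choice of which entities to follow at each step is made by a criminologist rather than automatically.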
Artificial Intelligence, CITES, Criminology, Illegal Wildlife Trade, Information Extraction, Natural Language Processing, Socio-technical
82-88
Middleton, Stuart
Lavorgna, Anita
Neumann, Geoffrey K
Whitehead, David
6 July 2020
Middleton, Stuart, Lavorgna, Anita, Neumann, Geoffrey K and Whitehead, David (2020) Information extraction from the long tail: A socio-technical AI approach for criminology investigations into the online illegal plant trade. 1st International Workshop on Socio-technical AI Systems for Defence, Cybercrime and Cybersecurity: Hosted by WebSci'20: 12TH ACM WEB SCIENCE CONFERENCE 2020, Southampton, United Kingdom. 07 Jul 2020. (doi:10.1145/3394332.3402838).
Record type: Conference or Workshop Item (Paper)
Text: WebSci-2020-STAIDCC-middleton-accepted (Accepted Manuscript)
More information
Published date: 6 July 2020
Additional Information:
Funding Information:
This work was supported by the Economic and Social Research Council (ES/R003254/1) and UK Defence and Security Accelerator, a part of the Ministry of Defence (ACC2005442).
Publisher Copyright:
© 2020 Association for Computing Machinery.
Venue - Dates:
1st International Workshop on Socio-technical AI Systems for Defence, Cybercrime and Cybersecurity: Hosted by WebSci'20: 12TH ACM WEB SCIENCE CONFERENCE 2020, Southampton, United Kingdom, 2020-07-07 - 2020-07-07
Identifiers
Local EPrints ID: 441265
URI: http://eprints.soton.ac.uk/id/eprint/441265
PURE UUID: 5a6d8cb6-ae6e-415f-a23a-cfe5a5784b55
Catalogue record
Date deposited: 08 Jun 2020 16:31
Last modified: 17 Mar 2024 03:39
Contributors
Author: Stuart Middleton
Author: Anita Lavorgna
Author: Geoffrey K Neumann
Author: David Whitehead