The University of Southampton
University of Southampton Institutional Repository

BiBERT-AV: enhancing authorship verification through Siamese networks with pre-trained BERT and Bi-LSTM

Almutairi, Amirah, Kang, BooJoong and Hashimy, Nawfal Al (2024) BiBERT-AV: enhancing authorship verification through Siamese networks with pre-trained BERT and Bi-LSTM. Wang, Guojun, Wang, Haozhe, Min, Geyong, Georgalas, Nektarios and Meng, Weizhi (eds.) In Ubiquitous Security: Third International Conference, UbiSec 2023, Exeter, UK, November 1–3, 2023, Revised Selected Papers. vol. 2034, Springer Singapore. pp. 17-30. (doi:10.1007/978-981-97-1274-8_2).

Record type: Conference or Workshop Item (Paper)

Abstract

Authorship verification, the task of determining whether two texts were written by the same author, is a challenging problem in natural language processing (NLP). It is crucial in security and forensics, where it helps identify authors and combat fake news. Recent advances in neural network models have shown promising results in improving the accuracy of authorship verification. This paper presents BiBERT-AV, a novel authorship verification model that employs Siamese networks with pre-trained BERT and Bi-LSTM layers, and evaluates the advantages of transformer-based models over existing methods that rely on domain knowledge and feature engineering. The performance of BiBERT-AV is compared against existing methods for authorship verification. The results demonstrate that BiBERT-AV offers an effective solution based solely on the writing style of the author, outperforming the baselines and state-of-the-art methods. The model also offers a viable alternative to existing methods that rely heavily on domain knowledge and laborious feature engineering, which often demand significant time and expertise. Notably, BiBERT-AV consistently achieves a notable level of accuracy even when the number of authors is expanded to a larger group, in contrast to the limitations exhibited by the baseline model used in existing research studies. Overall, this study provides valuable insights into the application of Siamese networks with pre-trained BERT and Bi-LSTM layers for authorship verification and establishes the superiority of the proposed models over existing methods in this domain. It contributes to the advancement of NLP research and has implications for several real-world applications.

Text
BiBERT_AV__Enhancing_Authorship_Verification_through_Siamese_Networks_with_Pre_trained_BERT_and_Bi_LSTM - Version of Record
Restricted to Repository staff only

More information

e-pub ahead of print date: 13 March 2024
Published date: 13 March 2024
Keywords: Authorship verification, BERT, Forensics, Security, Siamese networks, Transformer, bi-LSTM

Identifiers

Local EPrints ID: 490641
URI: http://eprints.soton.ac.uk/id/eprint/490641
ISSN: 1865-0929
PURE UUID: d3543cfb-4aa2-4ac6-b633-714137fe1031
ORCID for Amirah Almutairi: orcid.org/0000-0002-2194-7936
ORCID for BooJoong Kang: orcid.org/0000-0001-5984-9867
ORCID for Nawfal Al Hashimy: orcid.org/0000-0002-1129-5217

Catalogue record

Date deposited: 31 May 2024 16:51
Last modified: 06 Jun 2024 02:10

Contributors

Author: Amirah Almutairi
Author: BooJoong Kang
Author: Nawfal Al Hashimy
Editor: Guojun Wang
Editor: Haozhe Wang
Editor: Geyong Min
Editor: Nektarios Georgalas
Editor: Weizhi Meng
