Representation transfer and data cleaning in multi-views for text simplification
He, Wei, Farrahi, Katayoun, Chen, Bin, Peng, Bohua and Villavicencio, Aline (2023) Representation transfer and data cleaning in multi-views for text simplification. Pattern Recognition Letters, 177, 40-46. (doi:10.1016/j.patrec.2023.11.011).
Abstract
Representation transfer is a widely used technique in natural language processing. We propose multi-view methods for cleaning WikiLarge, the dominant text simplification (TS) dataset, removing errors that impair model training and fine-tuning; the results show that our methods effectively refine the dataset. We also propose transferring pre-trained text representations from a task similar to TS (e.g., text summarization) through a continued-fine-tuning strategy, which improves the performance of pre-trained models on TS while speeding up training and easing convergence. In addition, we propose a new decoding strategy for simple text generation that produces simpler, more comprehensible text with controllable lexical simplicity. Experimental results show that our method performs well on many evaluation metrics.
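The abstract describes the continued-fine-tuning transfer only at a high level. As a rough, hypothetical sketch of the general idea (not the authors' code: the checkpoint name facebook/bart-large-cnn, the column names complex/simple, and all hyperparameters are illustrative assumptions), one could start from a summarization-fine-tuned BART checkpoint and continue fine-tuning it on complex-to-simple sentence pairs with the Hugging Face transformers trainer:

# Hypothetical sketch of continued fine-tuning: begin from a model
# already fine-tuned on summarization, then fine-tune further on
# complex -> simple sentence pairs. Not the authors' implementation.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Public BART checkpoint fine-tuned for summarization (CNN/DailyMail);
# it stands in for whatever summarization representation is transferred.
checkpoint = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Toy stand-in for the cleaned WikiLarge complex/simple pairs.
pairs = Dataset.from_dict({
    "complex": ["The committee deliberated extensively before reaching a verdict."],
    "simple": ["The committee talked for a long time before deciding."],
})

def preprocess(batch):
    # Tokenize source (complex) sentences and target (simple) labels.
    features = tokenizer(batch["complex"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["simple"], truncation=True, max_length=128)
    features["labels"] = labels["input_ids"]
    return features

train_dataset = pairs.map(preprocess, batched=True, remove_columns=["complex", "simple"])

args = Seq2SeqTrainingArguments(
    output_dir="bart-ts-continued",
    learning_rate=3e-5,            # illustrative hyperparameters only
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

Similarly, the controllable-simplicity decoding is only named in the abstract; one generic (again hypothetical, not the paper's method) way to realize lexical control at decode time is to bias generation logits toward a list of simple words with a custom LogitsProcessor, where the bonus alpha acts as the controllability knob:

# Hypothetical decode-time lexical control: add a constant bonus to the
# logits of tokens from a "simple word" list at every generation step.
from transformers import LogitsProcessor, LogitsProcessorList

class SimplicityBias(LogitsProcessor):
    def __init__(self, simple_token_ids, alpha=2.0):
        self.simple_token_ids = simple_token_ids
        self.alpha = alpha  # larger alpha -> stronger pull toward simple words

    def __call__(self, input_ids, scores):
        # scores: (batch, vocab) logits for the next token.
        scores[:, self.simple_token_ids] += self.alpha
        return scores

# Bias a few hand-picked simple words (leading spaces give the
# mid-sentence BPE tokens for BART's tokenizer).
simple_ids = [i for ids in tokenizer(
    [" talk", " help", " use"], add_special_tokens=False
).input_ids for i in ids]

inputs = tokenizer("The committee deliberated extensively.", return_tensors="pt")
out = model.generate(
    **inputs,
    logits_processor=LogitsProcessorList([SimplicityBias(simple_ids)]),
    max_new_tokens=32,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))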
Text: 1-s2.0-S0167865523003215-main (Version of Record)
More information
Accepted/In Press date: 8 November 2023
e-pub ahead of print date: 10 November 2023
Published date: 5 December 2023
Keywords:
Data cleaning, Decoding, Pre-trained language model, Sentence representation, Text simplification
Identifiers
Local EPrints ID: 489975
URI: http://eprints.soton.ac.uk/id/eprint/489975
ISSN: 0167-8655
PURE UUID: 4f302fb0-f017-4d92-a374-9af511e107a0
Catalogue record
Date deposited: 09 May 2024 16:32
Last modified: 10 May 2024 01:51
Contributors
Author: Wei He
Author: Katayoun Farrahi
Author: Bin Chen
Author: Bohua Peng
Author: Aline Villavicencio