IDN-Sum: a new dataset for interactive digital narrative extractive text summarisation
T Revi, Ashwathy
Middleton, Stuart
Millard, David
17 October 2022
T Revi, Ashwathy, Middleton, Stuart and Millard, David (2022) IDN-Sum: a new dataset for interactive digital narrative extractive text summarisation. COLING-2022: Workshop on Creative Text Summarization, Gyeongju, Republic of Korea. 17 - 18 Oct 2022. 12 pp. (https://aclanthology.org/2022.creativesumm-1.1/).
Record type: Conference or Workshop Item (Paper)
Abstract
Summarizing Interactive Digital Narratives (IDN) presents unique challenges to existing text summarization models, especially around capturing interactive elements in addition to important plot points. In this paper, we describe the first IDN dataset (IDN-Sum) designed specifically for training and testing IDN text summarization algorithms. Our dataset is generated using random playthroughs of 8 IDN episodes, taken from 2 different IDN games, and consists of 10,000 documents. Playthrough documents are annotated through automatic alignment with fan-sourced summaries using a commonly used alignment algorithm. We also report and discuss results from experiments applying common baseline extractive text summarization algorithms to this dataset. Qualitative analysis of the results reveals shortcomings in common annotation approaches and evaluation methods when applied to narrative and interactive narrative datasets. The dataset is released as open source for future researchers to train and test their own approaches for IDN text.
Text: IDN-Sum paper - Accepted Manuscript
More information
Accepted/In Press date: 15 September 2022
Published date: 17 October 2022
Venue - Dates: COLING-2022: Workshop on Creative Text Summarization, Gyeongju, Republic of Korea, 2022-10-17 - 2022-10-18
Keywords: interactive narratives, text summarisation, extractive summarisation, natural language processing, NLP
Identifiers
Local EPrints ID: 470630
URI: http://eprints.soton.ac.uk/id/eprint/470630
DOI: https://aclanthology.org/2022.creativesumm-1.1/
PURE UUID: 5bcf47f8-de36-489a-945c-ac1691a27917
Catalogue record
Date deposited: 14 Oct 2022 17:09
Last modified: 17 Mar 2024 03:58
Contributors
Author: Ashwathy T Revi
Author: David Millard