Learning and evaluation of topics via distributional semantics
University of Southampton
Augustin, Alexandry
dca1be1e-909c-471a-ba63-19da670b095a
December 2020
Hare, Jonathon
65ba2cda-eaaf-4767-a325-cd845504e5a9
Augustin, Alexandry (2020) Learning and evaluation of topics via distributional semantics. University of Southampton, Doctoral Thesis, 176pp.
Record type: Thesis (Doctoral)
Abstract
Written language is a means of communication: it not only shapes our thoughts but also helps us convey information. As the amount of digital text available keeps growing, it becomes increasingly difficult to locate and keep track of specific information of interest. This observation has fuelled the search for sophisticated representations of written text and for methods of learning meaning. In particular, topic identification has grown in importance in recent years as an approach to summarising, organising and understanding text. Underpinning modern topic identification methods is the framework of distributional semantics, which is based on the assumption that meaning is associated with use and, in particular, that meaning can be learned by examining the contexts in which words occur. Motivated by this, this thesis examines the broad field of topic identification in text learned via state-of-the-art distributional semantics models. We provide new answers to the complex question of how meaning is used to derive abstract concepts such as topics, and how non-expert humans evaluate such abstract concepts when they are generated by artificial processes. In more detail, we address three key problems. First, we tackle the problem of evaluating the output of topic models (a particular kind of topic identification method) on large text corpora by leveraging non-expert annotators to assess the relevance of topics to a set of documents. Second, we develop a new method to assist in the interpretation of topics by providing additional context. In particular, our solution learns topics as collections of sentences extracted from a large corpus of unstructured documents. Finally, we identify and track the topic of text collected over time. In particular, we look at text-based dialogues, which often consist of short utterances covering a variety of topics.
More information
Published date: December 2020
Identifiers
Local EPrints ID: 447272
URI: http://eprints.soton.ac.uk/id/eprint/447272
PURE UUID: a0c87c00-9b01-4a52-8463-2aa809cec885
Catalogue record
Date deposited: 08 Mar 2021 17:31
Last modified: 17 Mar 2024 03:05
Contributors
Author: Alexandry Augustin
Thesis advisor: Jonathon Hare