Learning and evaluation of topics via distributional semantics

Written language is a means of communication: it not only shapes our thoughts but also helps us convey information. As the amount of digital text available keeps growing, it becomes increasingly difficult to locate and keep track of specific information of interest. This observation has fuelled the search for sophisticated representations of written text and for methods for learning meaning. In particular, topic identification has grown in importance in recent years as an approach to summarise, organise and understand text. Underpinning modern topic identification methods is the framework of distributional semantics, which is based on the assumption that meaning is associated with use and, in particular, that meaning can be learned by examining the contexts in which words occur. Motivated by this, this thesis examines the broad field of topic identification in text, with topics learned via state-of-the-art distributional semantics models. As such, we provide new answers to the complex question of how meaning is used to derive abstract concepts like topics, and how non-expert humans evaluate such abstract concepts when they are generated by artificial processes. In more detail, we address three key problems. First, we tackle the problem of evaluating the output of topic models (a particular kind of topic identification method) on large text corpora by leveraging non-expert annotators to assess the relevance of topics to a set of documents. Second, we develop a new method to assist in the interpretation of topics by providing additional context: our solution learns topics as collections of sentences extracted from a large corpus of unstructured documents. Finally, we identify and track the topic of text collected over time, focusing on text-based dialogues, which often consist of short utterances covering a variety of topics.

University of Southampton

Augustin, Alexandry

dca1be1e-909c-471a-ba63-19da670b095a

December 2020

Hare, Jonathon

65ba2cda-eaaf-4767-a325-cd845504e5a9

Augustin, Alexandry (2020) Learning and evaluation of topics via distributional semantics. University of Southampton, Doctoral Thesis, 176pp.

Record type: Thesis (Doctoral)

Text

Alexandry Augustin Thesis

Available under License University of Southampton Thesis Licence.

More information

Published date: December 2020

Learn more about the School of Electronics and Computer Science

Identifiers

Local EPrints ID: 447272

URI: http://eprints.soton.ac.uk/id/eprint/447272

PURE UUID: a0c87c00-9b01-4a52-8463-2aa809cec885

ORCID for Alexandry Augustin:

orcid.org/0000-0003-0285-9444

ORCID for Jonathon Hare:

orcid.org/0000-0003-2921-4283

Catalogue record

Date deposited: 08 Mar 2021 17:31

Last modified: 17 Mar 2024 03:05

Contributors

Author: Alexandry Augustin

Thesis advisor: Jonathon Hare
