University of Southampton Institutional Repository

Fractal approach for determining the optimal number of topics in the field of topic modeling


Ignatenko, Vera, Koltcov, Sergej, Staab, Steffen and Boukhers, Zeyd (2019) Fractal approach for determining the optimal number of topics in the field of topic modeling. Journal of Physics: Conference Series, 1163 (conference 1), 1-7, [012025]. (doi:10.1088/1742-6596/1163/1/012025).

Record type: Article

Abstract

In this paper we apply the multifractal formalism to analyse the statistical behaviour of topic models as the number of topics varies. Our analysis reveals two self-similar regions and one transition region in the density-of-states function, viewed as a function of the number of topics. Because a function that can be expressed through the density of states has previously been used successfully to determine the optimal number of topics, we test whether the density-of-states function itself can serve the same purpose. We provide numerical results for three topic models (PLSA, ARTM, and LDA with Gibbs sampling) on two marked-up collections containing texts in two different languages. Our experiments show that the "true" number of topics, as determined by human mark-up, lies in the transition region.
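
This record does not define the density-of-states function, so the following is only an illustrative sketch under stated assumptions. It assumes the density of states is the fraction of word-topic probabilities that exceed the uniform level 1/W (W being the vocabulary size), and it looks for self-similar regions as stretches where the slope of log(density) against log(number of topics) stays roughly constant. The function names, the 1/W threshold, and the synthetic random "topics" are hypothetical stand-ins, not the authors' code or data.

```python
import math
import random


def density_of_states(phi):
    """Fraction of word-topic probabilities above the uniform level 1/W.

    phi: list of T topics, each a list of W probabilities p(word | topic).
    The 1/W threshold is an assumption of this sketch, not taken from the record.
    """
    num_topics = len(phi)
    num_words = len(phi[0])
    threshold = 1.0 / num_words
    high = sum(1 for topic in phi for p in topic if p > threshold)
    return high / (num_words * num_topics)


def local_slopes(topic_counts, densities):
    """Finite-difference slopes of log(density) against log(T).

    A run of nearly equal slopes suggests a self-similar (power-law) region;
    a stretch where the slope changes suggests a transition region.
    """
    slopes = []
    for i in range(1, len(topic_counts)):
        d_log_rho = math.log(densities[i]) - math.log(densities[i - 1])
        d_log_t = math.log(topic_counts[i]) - math.log(topic_counts[i - 1])
        slopes.append(d_log_rho / d_log_t)
    return slopes


def random_topics(num_topics, num_words, rng):
    """Synthetic stand-in for a trained topic model: random normalized rows."""
    topics = []
    for _ in range(num_topics):
        weights = [rng.expovariate(1.0) for _ in range(num_words)]
        total = sum(weights)
        topics.append([w / total for w in weights])
    return topics


# Scan over the number of topics, as one would with models retrained for each T.
rng = random.Random(0)
topic_counts = [2, 4, 8, 16, 32]
densities = [density_of_states(random_topics(t, 200, rng)) for t in topic_counts]
slopes = local_slopes(topic_counts, densities)
```

With real models, `random_topics` would be replaced by the topic-word matrices obtained from PLSA, ARTM, or LDA at each number of topics; the synthetic rows here carry no structure and are not expected to reproduce the regions described in the abstract.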

Text
Ignatenko 2019 J. Phys.: Conf. Ser. 1163 012025 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 19 December 2018
Published date: 29 March 2019
Venue - Dates: International Conference on Computer Simulation in Physics and Beyond, Moscow, Russian Federation, 2018-09-24 - 2018-09-27

Identifiers

Local EPrints ID: 430626
URI: http://eprints.soton.ac.uk/id/eprint/430626
ISSN: 1742-6596
PURE UUID: 48c7b3c6-4c23-4acc-8d1b-f3f69eb6df2a
ORCID for Steffen Staab: orcid.org/0000-0002-0780-4154

Catalogue record

Date deposited: 07 May 2019 16:30
Last modified: 16 Mar 2024 04:22

Contributors

Author: Vera Ignatenko
Author: Sergej Koltcov
Author: Steffen Staab
Author: Zeyd Boukhers
