Fractal approach for determining the optimal number of topics in the field of topic modeling
1-7
Ignatenko, Vera
04abdc49-5dbd-4fa6-8226-cc6b608352c8
Koltcov, Sergej
cfe59ca1-6008-4e5e-afde-541203f5e1e1
Staab, Steffen
bf48d51b-bd11-4d58-8e1c-4e6e03b30c49
Boukhers, Zeyd
0768f27b-2434-442a-bf16-00264e90b3cd
29 March 2019
Ignatenko, Vera, Koltcov, Sergej, Staab, Steffen and Boukhers, Zeyd
(2019)
Fractal approach for determining the optimal number of topics in the field of topic modeling.
Journal of Physics: Conference Series, 1163 (conference 1), [012025].
(doi:10.1088/1742-6596/1163/1/012025).
Abstract
In this paper we apply the multifractal formalism to analyse the statistical behaviour of topic models as the number of topics varies. Our analysis reveals two self-similar regions and one transition region in the density-of-states function plotted against the number of topics. Since a function expressible through the density-of-states has previously been used successfully to determine the optimal number of topics, we test whether the density-of-states function itself can serve the same purpose. We provide numerical results for three topic models (PLSA, ARTM, and LDA with Gibbs sampling) on two marked-up collections containing texts in two different languages. Our experiments show that the "true" number of topics, as determined by the human mark-up, falls within the transition region.
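The abstract does not give the formula for the density-of-states, so the sketch below rests on an assumption: that it is the fraction of entries in the word-topic matrix (shape W words × T topics) whose probability exceeds the uniform level 1/W, i.e. rho = N / (W * T). The helper names `density_of_states` and `local_slopes` are hypothetical. To look for self-similar (power-law) regions, the sketch computes local slopes of log(rho) against log(T): where rho scales as a power of T these slopes stay roughly constant, and a change in slope suggests a transition. This is an illustrative heuristic, not necessarily the procedure used in the paper.

```python
import numpy as np


def density_of_states(phi: np.ndarray) -> float:
    """Hypothetical helper: share of word-topic probabilities above 1/W.

    phi has shape (W, T); column t is the word distribution of topic t.
    ASSUMPTION: rho = N / (W * T), where N counts entries phi[w, t] > 1/W.
    """
    W, T = phi.shape
    n_above = np.count_nonzero(phi > 1.0 / W)  # entries above the uniform level
    return n_above / (W * T)


def local_slopes(topic_counts, rho_values) -> np.ndarray:
    """Slopes of log(rho) vs log(T) between consecutive topic counts.

    Roughly constant slopes indicate power-law (self-similar) behaviour;
    a change in slope hints at a transition region.
    """
    log_t = np.log(np.asarray(topic_counts, dtype=float))
    log_rho = np.log(np.asarray(rho_values, dtype=float))
    return np.diff(log_rho) / np.diff(log_t)


# Usage on synthetic data: random Dirichlet topics stand in for trained models.
rng = np.random.default_rng(0)
W = 1000
topic_counts = [10, 20, 40, 80]
rho = []
for T in topic_counts:
    phi = rng.dirichlet(np.full(W, 0.1), size=T).T  # shape (W, T), columns sum to 1
    rho.append(density_of_states(phi))
print(local_slopes(topic_counts, rho))
```

With a trained gensim `LdaModel`, `model.get_topics()` returns a (topics × words) array, so its transpose would be passed as `phi`. In practice one model would be trained per value of T, and the slope profile would be inspected for the two flat stretches and the bend between them.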
Text
Ignatenko 2019 J. Phys.: Conf. Ser. 1163 012025
- Version of Record
More information
Accepted/In Press date: 19 December 2018
Published date: 29 March 2019
Venue - Dates:
International Conference on Computer Simulation in Physics and Beyond, Moscow, Russian Federation, 2018-09-24 - 2018-09-27
Identifiers
Local EPrints ID: 430626
URI: http://eprints.soton.ac.uk/id/eprint/430626
ISSN: 1742-6596
PURE UUID: 48c7b3c6-4c23-4acc-8d1b-f3f69eb6df2a
Catalogue record
Date deposited: 07 May 2019 16:30
Last modified: 16 Mar 2024 04:22
Contributors
Author: Vera Ignatenko
Author: Sergej Koltcov
Author: Steffen Staab
Author: Zeyd Boukhers