TopSpin: TOPic discovery via sparse principal component INterference
We propose a novel topic discovery algorithm for unlabeled images based on the bag-of-words (BoW) framework. We first extract a dictionary of visual words and subsequently for each image compute a visual word occurrence histogram. We view these histograms as rows of a large matrix from which we extract sparse principal components (PCs). Each PC identifies a sparse combination of visual words which co-occur frequently in some images but seldom appear in others. Each sparse PC corresponds to a topic, and images whose interference with the PC is high belong to that topic, revealing the common parts possessed by the images. We propose to solve the associated sparse PCA problems using an Alternating Maximization (AM) method, which we modify for the purpose of efficiently extracting multiple PCs in a deflation scheme. Our approach attacks the maximization problem in SPCA directly and is scalable to high-dimensional data. Experiments on automatic topic discovery and category prediction demonstrate encouraging performance of our approach. Our SPCA solver is publicly available.
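The pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' solver: it assumes a generic alternating maximization scheme for a cardinality-constrained sparse PCA problem (at most `s` nonzero visual words per PC), a simple projection deflation between components, and raw (uncentered) histograms. The function names `sparse_pc` and `discover_topics`, the toy histograms, and the score threshold are all hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def sparse_pc(rows, s, iters=50):
    """Return a unit vector with at most s nonzeros that approximately
    maximizes ||A x||_2, where A is the matrix whose rows are `rows`."""
    n_words = len(rows[0])
    # Deterministic start: unit vector on the column with the largest norm.
    col_sq = [sum(r[j] ** 2 for r in rows) for j in range(n_words)]
    x = [0.0] * n_words
    x[col_sq.index(max(col_sq))] = 1.0
    for _ in range(iters):
        # y-step: y = A x / ||A x||
        y = [dot(r, x) for r in rows]
        norm_y = math.sqrt(dot(y, y))
        if norm_y == 0.0:  # nothing left to explain (e.g. after deflation)
            break
        y = [v / norm_y for v in y]
        # x-step: keep the s largest-magnitude entries of A^T y, then normalize.
        z = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(n_words)]
        keep = sorted(range(n_words), key=lambda j: abs(z[j]), reverse=True)[:s]
        norm_x = math.sqrt(sum(z[j] ** 2 for j in keep))
        x = [0.0] * n_words
        for j in keep:
            x[j] = z[j] / norm_x
    return x


def discover_topics(histograms, n_topics, s):
    """Extract n_topics sparse PCs one by one; return (pc, scores) pairs.
    scores[i] = |a_i . pc| measures how strongly image i matches the topic."""
    residual = [[float(v) for v in r] for r in histograms]
    topics = []
    for _ in range(n_topics):
        x = sparse_pc(residual, s)
        scores = [abs(dot(r, x)) for r in histograms]
        topics.append((x, scores))
        # Assumed deflation: remove the component along x from every row.
        residual = [[a - dot(r, x) * b for a, b in zip(r, x)] for r in residual]
    return topics


# Toy visual-word histograms: rows are images, columns are visual words.
histograms = [
    [5, 4, 0, 0, 0, 1],
    [4, 5, 0, 0, 0, 0],
    [6, 5, 0, 0, 0, 1],
    [0, 0, 0, 3, 4, 0],
    [0, 0, 0, 4, 3, 1],
    [0, 0, 1, 3, 3, 0],
]

topics = discover_topics(histograms, n_topics=2, s=2)
for k, (pc, scores) in enumerate(topics):
    words = [j for j, v in enumerate(pc) if v != 0.0]
    images = [i for i, sc in enumerate(scores) if sc > 1.0]
    print(f"topic {k}: visual words {words}, images {images}")
```

In this toy setting each sparse PC selects a pair of co-occurring visual words, and the images with a large score against that PC are assigned to the corresponding topic. The cardinality bound `s` and the score threshold would need tuning on real data.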
Keywords: Bag-of-words, Hidden topic, Sparse PCA, Topic discovery
Pages: 157-180
Takáč, Martin
4fb42b43-5b23-4430-8047-a52664119823
Ahipaşaoğlu, Selin Damla
d69f1b80-5c05-4d50-82df-c13b87b02687
Cheung, Ngai Man
3b80c96a-1465-4e78-bd47-2415bab2a57a
Richtárik, Peter
6fba6051-a2f1-4602-8962-d3b12647d6ce
2019
Takáč, Martin, Ahipaşaoğlu, Selin Damla, Cheung, Ngai Man and Richtárik, Peter (2019) TopSpin: TOPic discovery via sparse principal component INterference. Pintér, János D. and Terlaky, Tamás (eds.) In Modeling and Optimization: Theory and Applications - MOPTA 2017, Selected Contributions. vol. 279, Springer New York, NY. pp. 157-180. (doi:10.1007/978-3-030-12119-8_8).
Record type: Conference or Workshop Item (Paper)
More information
e-pub ahead of print date: 15 February 2019
Published date: 2019
Venue - Dates: Modeling and Optimization: Theory and Applications Conference, MOPTA 2017, Bethlehem, United States, 2017-08-16 - 2017-08-18
Identifiers
Local EPrints ID: 502664
URI: http://eprints.soton.ac.uk/id/eprint/502664
ISSN: 2194-1009
PURE UUID: b979cf35-6308-4ded-b848-83a3f58e0f43
Catalogue record
Date deposited: 03 Jul 2025 17:03
Last modified: 04 Jul 2025 02:10
Contributors
Author: Martin Takáč
Author: Selin Damla Ahipaşaoğlu
Author: Ngai Man Cheung
Author: Peter Richtárik
Editor: János D. Pintér
Editor: Tamás Terlaky