University of Southampton Institutional Repository

TopSpin: TOPic discovery via sparse principal component INterference


Takáč, Martin, Ahipaşaoğlu, Selin Damla, Cheung, Ngai Man and Richtárik, Peter (2019) TopSpin: TOPic discovery via sparse principal component INterference. Pintér, János D. and Terlaky, Tamás (eds.) In Modeling and Optimization: Theory and Applications - MOPTA 2017, Selected Contributions. vol. 279, Springer New York, NY. pp. 157-180. (doi:10.1007/978-3-030-12119-8_8).

Record type: Conference or Workshop Item (Paper)

Abstract

We propose a novel topic discovery algorithm for unlabeled images based on the bag-of-words (BoW) framework. We first extract a dictionary of visual words and then compute, for each image, a visual-word occurrence histogram. We view these histograms as the rows of a large matrix from which we extract sparse principal components (PCs). Each PC identifies a sparse combination of visual words that co-occur frequently in some images but seldom appear in others. Each sparse PC corresponds to a topic, and the images whose interference with the PC is high belong to that topic, revealing the parts those images have in common. We propose to solve the associated sparse PCA (SPCA) problems using an Alternating Maximization (AM) method, which we modify to extract multiple PCs efficiently in a deflation scheme. Our approach attacks the maximization problem in SPCA directly and scales to high-dimensional data. Experiments on automatic topic discovery and category prediction demonstrate encouraging performance of our approach. Our SPCA solver is publicly available.
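The pipeline in the abstract (histogram matrix, sparse PCs by alternating maximization, deflation, topic membership by interference) can be illustrated with a minimal NumPy sketch. It rests on assumptions this record does not confirm: the SPCA formulation is taken to be max ||Ax||_2 subject to ||x||_2 = 1 and ||x||_0 <= s, solved by alternating y <- Ax/||Ax|| and x <- T_s(A^T y)/||T_s(A^T y)||, where T_s keeps the s largest-magnitude entries; AM is started from every standard basis vector; deflation is the simple projection A <- A(I - xx^T); and an image's "interference" with a PC x is taken to be its score (Hx)_i, thresholded at a hypothetical fraction tau of the maximum. The names truncate, am_from, sparse_pc, discover_topics and tau are illustrative only, and this is not the authors' published solver or their modified deflation scheme.

```python
import numpy as np


def truncate(v, s):
    """Keep the s largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out


def am_from(A, x, s, iters=500):
    """Alternating maximization of y^T A x over ||y||_2 = 1 and
    ||x||_2 = 1, ||x||_0 <= s, started from x. Returns (x, ||A x||)."""
    for _ in range(iters):
        y = A @ x
        ny = np.linalg.norm(y)
        if ny == 0.0:
            return x, 0.0
        y /= ny                      # best y for the current x
        z = truncate(A.T @ y, s)     # best x for the current y, up to scale
        x_new = z / np.linalg.norm(z)
        if np.allclose(x_new, x):    # fixed point reached
            x = x_new
            break
        x = x_new
    return x, float(np.linalg.norm(A @ x))


def sparse_pc(A, s):
    """Best sparse PC over the deterministic starts x = e_j, one per column
    (an assumed multi-start strategy; AM only finds local maxima)."""
    best_x, best_val = None, -1.0
    for j in range(A.shape[1]):
        x0 = np.zeros(A.shape[1])
        x0[j] = 1.0
        x, val = am_from(A, x0, s)
        if val > best_val:
            best_x, best_val = x, val
    if best_x.sum() < 0:             # resolve the sign ambiguity of a PC
        best_x = -best_x
    return best_x


def discover_topics(H, k, s, tau=0.5):
    """H: images x visual-words histogram matrix. Returns k sparse PCs and,
    for each PC, the images whose score H @ x is >= tau * its maximum."""
    A = H.astype(float)
    pcs, members = [], []
    for _ in range(k):
        x = sparse_pc(A, s)
        pcs.append(x)
        score = H @ x                # assumed "interference" of each image
        members.append(set(np.flatnonzero(score >= tau * score.max()).tolist()))
        A = A - np.outer(A @ x, x)   # projection deflation: A (I - x x^T)
    return pcs, members
```

With H a NumPy array of per-image visual-word counts, `pcs, members = discover_topics(H, k=2, s=2)` returns two unit-norm PCs, each with at most two nonzero word weights, together with the image indices assigned to each. Trying every basis vector as a start trades runtime for robustness: a support that mixes words from two unrelated topics can be a fixed point of the AM iteration, so the sketch keeps the start with the largest ||Ax||.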

This record has no associated files available for download.

More information

e-pub ahead of print date: 15 February 2019
Published date: 2019
Venue - Dates: Modeling and Optimization: Theory and Applications Conference, MOPTA 2017, Bethlehem, United States, 2017-08-16 to 2017-08-18
Keywords: Bag-of-words, Hidden topic, Sparse PCA, Topic discovery

Identifiers

Local EPrints ID: 502664
URI: http://eprints.soton.ac.uk/id/eprint/502664
ISSN: 2194-1009
PURE UUID: b979cf35-6308-4ded-b848-83a3f58e0f43
ORCID for Selin Damla Ahipaşaoğlu: orcid.org/0000-0003-1371-315X

Catalogue record

Date deposited: 03 Jul 2025 17:03
Last modified: 04 Jul 2025 02:10

Contributors

Author: Martin Takáč
Author: Selin Damla Ahipaşaoğlu
Author: Ngai Man Cheung
Author: Peter Richtárik
Editor: János D. Pintér
Editor: Tamás Terlaky
