Detecting non-Gaussian geographical topics in tagged photo collections
Nowadays, large collections of photos are tagged with GPS coordinates. The modelling of such large geo-tagged corpora is an important problem in data mining and information retrieval, and involves the use of geographical information to detect topics with a spatial component. In this paper, we propose a novel geographical topic model which captures dependencies between geographical regions to support the detection of topics with complex, non-Gaussian distributed spatial structures. The model is based on a multi-Dirichlet process (MDP), a novel generalisation of the hierarchical Dirichlet process extended to support multiple base distributions. We therefore call our method the MDP-based geographical topic model (MGTM). We show how to use an MDP to dynamically smooth topic distributions between groups of spatially adjacent documents. In systematic quantitative and qualitative evaluations using independent datasets from prior related work, we show that such a model can exploit the adjacency of regions and leads to a significant improvement in the quality of topics compared to the state of the art in geographical topic modelling.
603-612
Association for Computing Machinery
Kling, Christoph Carl
3e66e481-5a13-444f-87e2-772268e9fdb1
Kunegis, Jérôme
066b7173-f5a6-4a0e-9656-873af0821799
Sizov, Sergej
ecc519ba-5393-4290-8441-b7e2a780444e
Staab, Steffen
bf48d51b-bd11-4d58-8e1c-4e6e03b30c49
February 2014
Kling, Christoph Carl, Kunegis, Jérôme, Sizov, Sergej and Staab, Steffen (2014) Detecting non-Gaussian geographical topics in tagged photo collections. In WSDM '14 Proceedings of the 7th ACM international conference on Web search and data mining. Association for Computing Machinery. (doi:10.1145/2556195.2556218).
Record type:
Conference or Workshop Item
(Paper)
Text
wsdm051-klingATS
- Version of Record
Restricted to Repository staff only
More information
e-pub ahead of print date: 24 February 2014
Published date: February 2014
Venue - Dates:
7th ACM International Conference on Web Search and Data Mining (WSDM '14), New York City, United States, 2014-02-24 - 2014-02-28
Identifiers
Local EPrints ID: 413597
URI: http://eprints.soton.ac.uk/id/eprint/413597
PURE UUID: 69dde4f2-c0e2-43aa-897c-119aa1e95f8c
Catalogue record
Date deposited: 30 Aug 2017 16:31
Last modified: 16 Mar 2024 04:22
Contributors
Author:
Christoph Carl Kling
Author:
Jérôme Kunegis
Author:
Sergej Sizov
Author:
Steffen Staab