University of Southampton Institutional Repository

Location extraction from social media: geoparsing, location disambiguation and geotagging

Middleton, Stuart
Kordopatis-Zilos, Giorgos
Papadopoulos, Symeon
Kompatsiaris, Yiannis

Middleton, Stuart, Kordopatis-Zilos, Giorgos, Papadopoulos, Symeon and Kompatsiaris, Yiannis (2018) Location extraction from social media: geoparsing, location disambiguation and geotagging. ACM Transactions on Information Systems, 36 (4), 1-27, [40]. (doi:10.1145/3202662).

Record type: Article

Abstract

Location extraction, also called toponym extraction, is a field covering geoparsing, extracting spatial representations from location mentions in text, and geotagging, assigning spatial coordinates to content items. This paper evaluates five ‘best of class’ location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. Third party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and Geonames gazetteer approach and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flickr posts, to compare all approaches. We also perform a qualitative evaluation recalling top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model was best (F1@1km 0.49) for the geotagging evaluation. The map-database was best (R@20 0.60+) in the qualitative evaluation. We report on strengths, weaknesses and a detailed failure analysis for the approaches and suggest concrete areas for further research.
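
The abstract reports scores such as F1@1km and R@20 without defining them. The sketch below shows one common way such metrics are computed; it is an illustrative reading, not the authors' evaluation code. The function names (`haversine_km`, `f1_at_radius`, `recall_at_n`) and the choice to count a prediction as correct when it falls within the radius of the true coordinate are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def f1_at_radius(predictions, truths, radius_km=1.0):
    """F1 where a prediction is a hit if it lies within radius_km of the truth.

    predictions: list of (lat, lon) tuples, or None when no location was assigned.
    truths:      list of (lat, lon) tuples, aligned with predictions.
    Assumption: precision is over attempted items, recall over all items.
    """
    hits = sum(
        1 for p, t in zip(predictions, truths)
        if p is not None and haversine_km(p[0], p[1], t[0], t[1]) <= radius_km
    )
    attempted = sum(1 for p in predictions if p is not None)
    precision = hits / attempted if attempted else 0.0
    recall = hits / len(truths) if truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_at_n(ranked_locations, relevant_locations, n=20):
    """Fraction of relevant locations that appear in the top-n ranked list."""
    relevant = set(relevant_locations)
    if not relevant:
        return 0.0
    return len(set(ranked_locations[:n]) & relevant) / len(relevant)
```

For example, `recall_at_n(ranked, relevant, n=20)` would correspond to an R@20 style score over a ranked list of location mentions, under the assumptions stated above.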

More information

Accepted/In Press date: 27 March 2018
e-pub ahead of print date: 15 June 2018

Identifiers

Local EPrints ID: 419443
URI: http://eprints.soton.ac.uk/id/eprint/419443
ISSN: 1046-8188
PURE UUID: 84f38428-d8cf-4eda-a9bd-bfacf0619d22
ORCID for Stuart Middleton: orcid.org/0000-0001-8305-8176

Catalogue record

Date deposited: 12 Apr 2018 16:30
Last modified: 15 Sep 2021 04:55
