The University of Southampton
University of Southampton Institutional Repository

Location extraction from social media: geoparsing, location disambiguation and geotagging


Middleton, Stuart, Kordopatis-Zilos, Giorgos, Papadopoulos, Symeon and Kompatsiaris, Yiannis (2019) Location extraction from social media: geoparsing, location disambiguation and geotagging. 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France. 22 - 24 Jul 2019.

Record type: Conference or Workshop Item (Poster)

Abstract

Location extraction, also called toponym extraction, is a field covering geoparsing (extracting spatial representations from location mentions in text) and geotagging (assigning spatial coordinates to content items). This paper evaluates five ‘best of class’ location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. Third-party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and GeoNames gazetteer approach, and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flickr posts, to compare all approaches. We also perform a qualitative evaluation recalling the top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model was best (F1@1km 0.49) for the geotagging evaluation. The map-database approach was best (R@20 0.60+) in the qualitative evaluation. We report on strengths, weaknesses and a detailed failure analysis for the approaches and suggest concrete areas for further research.
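
The abstract reports a distance-thresholded score (F1@1km) without defining it on this page. As a rough illustration only, the sketch below assumes that a predicted coordinate counts as correct when it lies within the given radius of the ground-truth coordinate, that precision is taken over items for which a prediction was made, and that recall is taken over all items. The function names `haversine_km` and `f1_at_radius` are hypothetical and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def f1_at_radius(predicted, truth, radius_km=1.0):
    """F1 under the assumed definition described above.

    predicted: list of (lat, lon) tuples, or None where no prediction was made.
    truth:     list of (lat, lon) ground-truth tuples, same length as predicted.
    """
    # Only items with a prediction count towards precision (assumption).
    attempted = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    correct = sum(1 for p, t in attempted if haversine_km(p, t) <= radius_km)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with three items where one prediction falls within 1 km, one is far away and one is missing, precision is 1/2 and recall is 1/3 under these assumptions. The evaluation in the paper itself may define the metric differently.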

Text
SIGIR-2019-poster-v2 - Author's Original
Download (434kB)

More information

Published date: 22 July 2019
Additional Information: SIGIR 2019 poster describing work originally published in a TOIS 2018 paper
Venue - Dates: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019-07-22 - 2019-07-24
Keywords: Location Extraction, Toponym Extraction, Information Extraction, Geoparsing, Geocoding, Geotagging, Location, Toponym, Disambiguation, Social Media

Identifiers

Local EPrints ID: 432728
URI: http://eprints.soton.ac.uk/id/eprint/432728
PURE UUID: 9820ea1b-8762-4f0d-a6f2-da0d4c466900
ORCID for Stuart Middleton: orcid.org/0000-0001-8305-8176

Catalogue record

Date deposited: 25 Jul 2019 16:30
Last modified: 16 Mar 2024 03:18

Contributors

Author: Stuart Middleton
Author: Giorgos Kordopatis-Zilos
Author: Symeon Papadopoulos
Author: Yiannis Kompatsiaris


