Location extraction from social media: geoparsing, location disambiguation and geotagging
Location extraction, also called toponym extraction, is a field covering geoparsing, extracting spatial representations from location mentions in text, and geotagging, assigning spatial coordinates to content items. This paper evaluates five ‘best of class’ location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. Third party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and Geonames gazetteer approach and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flickr posts, to compare all approaches. We also perform a qualitative evaluation recalling top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model was best (F1@1km 0.49) for the geotagging evaluation. The map-database was best (R@20 0.60+) in the qualitative evaluation. We report on strengths, weaknesses and a detailed failure analysis for the approaches and suggest concrete areas for further research.
Location Extraction, Toponym Extraction, Information Extraction, Geoparsing, Geocoding, Geotagging, Location, Toponym, Disambiguation, Social Media
Middleton, Stuart
404b62ba-d77e-476b-9775-32645b04473f
Kordopatis-Zilos, Giorgos
a69aa09a-56bc-4b34-9f06-b149f2baab1c
Papadopoulos, Symeon
818a6f28-8102-45b4-8e95-53be585ec20a
Kompatsiaris, Yiannis
364cc081-661c-4f71-b6e0-025b02c25592
22 July 2019
Middleton, Stuart, Kordopatis-Zilos, Giorgos, Papadopoulos, Symeon and Kompatsiaris, Yiannis
(2019)
Location extraction from social media: geoparsing, location disambiguation and geotagging.
42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France.
22 - 24 Jul 2019.
Record type:
Conference or Workshop Item
(Poster)
Text
SIGIR-2019-poster-v2
- Author's Original
More information
Published date: 22 July 2019
Additional Information:
SIGIR 2019 poster describing work originally published in a TOIS 2018 paper
Venue - Dates:
42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019-07-22 - 2019-07-24
Identifiers
Local EPrints ID: 432728
URI: http://eprints.soton.ac.uk/id/eprint/432728
PURE UUID: 9820ea1b-8762-4f0d-a6f2-da0d4c466900
Catalogue record
Date deposited: 25 Jul 2019 16:30
Last modified: 16 Mar 2024 03:18
Contributors
Author:
Giorgos Kordopatis-Zilos
Author:
Symeon Papadopoulos
Author:
Yiannis Kompatsiaris