The University of Southampton
University of Southampton Institutional Repository

What’s new? Analysing language-specific Wikipedia entity contexts to support entity-centric news retrieval


Zhou, Yiwei, Demidova, Elena and Cristea, Alexandra I. (2017) What’s new? Analysing language-specific Wikipedia entity contexts to support entity-centric news retrieval. [in special issue: Keyword Search and Big Data] Transactions on Computational Collective Intelligence, 210-231. (doi:10.1007/978-3-319-59268-8_10).

Record type: Article

Abstract

The representation of influential entities on the web, such as celebrities and multinational corporations, can vary across languages, reflecting language-specific entity aspects as well as divergent views on these entities in different communities. An important source of multilingual background knowledge about influential entities is Wikipedia, an online community-created encyclopaedia with more than 280 language editions. Such language-specific information could be applied in entity-centric information retrieval applications, in which users issue very simple queries, mostly just entity names, to find relevant documents. In this article we focus on the problem of creating language-specific entity contexts to support entity-centric, language-specific information retrieval applications. First, we discuss alternative ways such contexts can be built, including Graph-based and Article-based approaches. Second, we analyse the similarities and differences between these contexts in a case study covering 220 entities and five Wikipedia language editions. Third, we propose a context-based entity-centric information retrieval model that maps documents to an aspect space, and apply language-specific entity contexts to perform query expansion. Last, we perform a case study to demonstrate the impact of this model in a news retrieval application. Our study illustrates that the proposed model can effectively improve the recall of entity-centric information retrieval while maintaining high precision, and that it provides language-specific results.
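
The general idea of context-based query expansion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the entity name "acme", the aspect weights, the scoring rule, the threshold, and the function names `score` and `retrieve` are all hypothetical and chosen only for illustration. The sketch assumes that a language-specific context is a small set of weighted aspect terms, and that a document can match either the entity name itself or those aspects.

```python
# Toy sketch of entity-centric retrieval with query expansion.
# All data and weights below are hypothetical, not taken from the article.

DOCS = [
    "acme opens new factory",
    "anvil maker reports record sales",
    "amboss hersteller meldet rekordumsatz",
    "football results from saturday",
]

# Assumed language-specific contexts: aspect term -> weight.
CONTEXTS = {
    "en": {"anvil": 0.75, "factory": 0.25},
    "de": {"amboss": 0.75, "fabrik": 0.25},
}


def score(document, entity, context):
    """Score a document by entity mention plus weighted aspect matches."""
    tokens = set(document.lower().split())
    base = 1.0 if entity.lower() in tokens else 0.0
    expansion = sum(w for aspect, w in context.items() if aspect in tokens)
    return base + expansion


def retrieve(documents, entity, context, threshold=0.5):
    """Return documents scoring at least `threshold`, best first."""
    scored = [(score(d, entity, context), d) for d in documents]
    ranked = sorted(scored, key=lambda pair: -pair[0])  # stable sort
    return [d for s, d in ranked if s >= threshold]


plain = retrieve(DOCS, "acme", {})                 # entity name only
english = retrieve(DOCS, "acme", CONTEXTS["en"])   # expanded, English context
german = retrieve(DOCS, "acme", CONTEXTS["de"])    # expanded, German context
```

In this toy setting the query on the entity name alone finds only the document that mentions "acme", whereas each expanded query also finds a document that describes the entity through one of its aspects without naming it, and the additional document differs between the two language contexts.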

Text
ikc_entity_search.pdf - Accepted Manuscript
Download (416kB)

More information

Accepted/In Press date: 7 November 2016
e-pub ahead of print date: 15 June 2017
Published date: June 2017
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 402789
URI: http://eprints.soton.ac.uk/id/eprint/402789
ISSN: 2190-9288
PURE UUID: 018f446e-72cc-498c-a969-11d04813aba9

Catalogue record

Date deposited: 15 Nov 2016 15:07
Last modified: 15 Mar 2024 06:04

Contributors

Author: Yiwei Zhou
Author: Elena Demidova
Author: Alexandra I. Cristea
