The University of Southampton
University of Southampton Institutional Repository

MultiWiki: interlingual text passage alignment in Wikipedia

Gottschalk, Simon and Demidova, Elena (2017) MultiWiki: interlingual text passage alignment in Wikipedia. ACM Transactions on the Web, 11 (1), 1-31. (doi:10.1145/3004296).

Record type: Article

Abstract

In this article, we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that enable the identification and interlinking of text passages that are written in different languages and contain overlapping information. Interlingual text passage alignment can help Wikipedia editors and readers better understand the language-specific context of entities, provide valuable insights into cultural differences, and build a basis for qualitative analysis of the articles. An important challenge in this context is the trade-off between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can provide a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a user study using the German, Russian and English Wikipedias as examples and collect a user-annotated benchmark. We then propose MultiWiki, a method that adopts an integrated approach to text passage alignment using semantic similarity measures and greedy algorithms, and that achieves precise results with respect to the user-defined alignment. A MultiWiki demonstration is publicly available and currently supports four language pairs.

Text: tweb_gottschalk_demidova_multiwiki.pdf (Accepted Manuscript, 590kB)

More information

Accepted/In Press date: 23 November 2016
e-pub ahead of print date: 10 April 2017
Published date: April 2017
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 403386
URI: http://eprints.soton.ac.uk/id/eprint/403386
PURE UUID: 5334eac1-041e-478b-b5e4-12fc54dcd911

Catalogue record

Date deposited: 30 Nov 2016 14:45
Last modified: 15 Mar 2024 06:06

Contributors

Author: Simon Gottschalk
Author: Elena Demidova
