MultiWiki: interlingual text passage alignment in Wikipedia
In this article we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that identify and interlink text passages that are written in different languages and contain overlapping information. Interlingual text passage alignment can help Wikipedia editors and readers better understand the language-specific context of entities, provide valuable insights into cultural differences, and build a basis for qualitative analysis of the articles. An important challenge in this context is the trade-off between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can give a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a user study using the German, Russian and English Wikipedias as examples and collect a user-annotated benchmark. We then propose MultiWiki, a method that takes an integrated approach to text passage alignment using semantic similarity measures and greedy algorithms and achieves precise results with respect to the user-defined alignment. A demonstration of MultiWiki is publicly available and currently supports four language pairs.
1-31
Gottschalk, Simon
a2ef54de-11d6-4085-8f1c-7eb56aacba1e
Demidova, Elena
8af7dea2-8dc6-40da-98b4-ea4a6593f2af
April 2017
Gottschalk, Simon and Demidova, Elena (2017) MultiWiki: interlingual text passage alignment in Wikipedia. ACM Transactions on the Web, 11 (1). (doi:10.1145/3004296).
Text
tweb_gottschalk_demidova_multiwiki.pdf
- Accepted Manuscript
More information
Accepted/In Press date: 23 November 2016
e-pub ahead of print date: 10 April 2017
Published date: April 2017
Organisations:
Web & Internet Science
Identifiers
Local EPrints ID: 403386
URI: http://eprints.soton.ac.uk/id/eprint/403386
PURE UUID: 5334eac1-041e-478b-b5e4-12fc54dcd911
Catalogue record
Date deposited: 30 Nov 2016 14:45
Last modified: 15 Mar 2024 06:06
Contributors
Author:
Simon Gottschalk
Author:
Elena Demidova