Distributed human computation framework for linked data co-reference resolution
Distributed Human Computation (DHC) is a technique for solving computational problems by drawing on the collaborative effort of a large number of humans. It also offers an approach to AI-complete problems such as natural language processing. The Semantic Web, with its roots in AI, is envisioned as a decentralised world-wide information space for sharing machine-readable data with minimal integration costs. Many research problems in the Semantic Web are considered AI-complete. One example is co-reference resolution, which involves determining whether different URIs refer to the same entity. It is considered a significant hurdle to the realisation of large-scale Semantic Web applications. In this paper, we propose a framework for building a DHC system on top of the Linked Data Cloud to solve various computational problems. To demonstrate the concept, we focus on co-reference resolution in the Semantic Web when integrating distributed datasets. The traditional way to solve this problem is to design machine-learning algorithms. However, these are often computationally expensive and error-prone, and they do not scale. We designed a DHC system named iamResearcher, which solves the author identity co-reference problem for scientific publications when integrating distributed bibliographic datasets. In our system, we aggregated 6 million bibliographic records from various publication repositories. Users can sign up to the system to audit and align their own publications, thus solving the co-reference problem in a distributed manner. The aggregated results are published to the Linked Data Cloud.
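The user-driven alignment described in the abstract can be illustrated with a minimal sketch. It is not the iamResearcher implementation: the function name, the `example.org` URIs, and the choice of `owl:sameAs` in N-Triples form as the published link type are all assumptions made for illustration. The idea shown is only that when one signed-up user claims author records from several repositories as their own, those author URIs can be asserted to refer to the same entity.

```python
from collections import defaultdict

# owl:sameAs is the standard OWL property for identity links; using it here
# is an assumption, the abstract does not name the link type that is published.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(claims):
    """Turn (user_id, author_uri) claims into N-Triples identity links.

    Each claim says that a signed-up user confirmed an author record
    (identified by a URI) as their own. All URIs claimed by the same user
    are linked to one canonical URI (the lexicographically smallest one).
    """
    by_user = defaultdict(set)
    for user_id, author_uri in claims:
        by_user[user_id].add(author_uri)

    triples = []
    for uris in by_user.values():
        ordered = sorted(uris)
        canonical = ordered[0]
        for other in ordered[1:]:
            triples.append(f"<{canonical}> <{OWL_SAME_AS}> <{other}> .")
    return sorted(triples)


if __name__ == "__main__":
    # Hypothetical claims: user u1 confirms two records from different repositories.
    demo_claims = [
        ("u1", "http://example.org/repoB/author/7"),
        ("u1", "http://example.org/repoA/author/42"),
        ("u2", "http://example.org/repoA/author/9"),
    ]
    for line in same_as_triples(demo_claims):
        print(line)
```

A user who has claimed only one record produces no link, since there is nothing to align yet.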
Linked Data, DHC, Crowd-sourcing, Co-reference
32-46
Yang, Yang
Singh, Priyanka
Yao, Jiadi
Au Yeung, Ching Man
Zareian, Amir
Wang, Xiaowei
Cai, Zhonglun
Salvadores, Manuel
Gibbins, Nicholas
Hall, Wendy
Shadbolt, Nigel
29 May 2011
Yang, Yang, Singh, Priyanka, Yao, Jiadi, Au Yeung, Ching Man, Zareian, Amir, Wang, Xiaowei, Cai, Zhonglun, Salvadores, Manuel, Gibbins, Nicholas, Hall, Wendy and Shadbolt, Nigel (2011) Distributed human computation framework for linked data co-reference resolution. 8th Extended Semantic Web Conference, Lecture Notes in Computer Science (LNCS), Volume 6643, Heraklion, Greece. 29 May - 02 Jun 2011.
Record type: Conference or Workshop Item (Paper)
Text: paper_10.pdf - Version of Record
Slideshow: ESWC.pptx - Other
More information
Published date: 29 May 2011
Additional Information:
Event Dates: 29th May - 2nd June 2011
Venue - Dates:
8th Extended Semantic Web Conference, Lecture Notes in Computer Science (LNCS), Volume 6643, Heraklion, Greece, 2011-05-29 - 2011-06-02
Organisations:
Web & Internet Science
Identifiers
Local EPrints ID: 272060
URI: http://eprints.soton.ac.uk/id/eprint/272060
PURE UUID: fdcbbc7e-dd38-4db9-bf05-e0715302afb8
Catalogue record
Date deposited: 23 Feb 2011 23:06
Last modified: 15 Mar 2024 03:00
Contributors
Author: Yang Yang
Author: Priyanka Singh
Author: Jiadi Yao
Author: Ching Man Au Yeung
Author: Amir Zareian
Author: Xiaowei Wang
Author: Zhonglun Cai
Author: Manuel Salvadores
Author: Nicholas Gibbins
Author: Nigel Shadbolt