The University of Southampton
University of Southampton Institutional Repository

Enhancing answer completeness of SPARQL queries via crowdsourcing


Acosta, Maribel, Simperl, Elena, Floeck, Fabian and Vidal, Maria-Esther (2017) Enhancing answer completeness of SPARQL queries via crowdsourcing. Web Semantics, 45, 41-62. (doi:10.1016/j.websem.2017.07.001).

Record type: Article

Abstract

Linked Open Data initiatives have encouraged the publication of large RDF datasets into the Linking Open Data (LOD) cloud, including DBpedia, YAGO, and GeoNames. Despite the size of LOD datasets and the development of (semi-)automatic methods to create and link LOD data, these datasets may still be incomplete, which negatively affects the accuracy of Linked Data processing techniques. We improve query answer completeness by capturing knowledge collected from the crowd, and propose a novel hybrid query processing engine that brings together machine and human computation to execute SPARQL queries. Our system, HARE, implements these hybrid query processing techniques. HARE encompasses several features: (1) a completeness model for RDF that exploits the characteristics of RDF to estimate the completeness of an RDF dataset; (2) a crowd knowledge base that captures crowd answers about missing values in the RDF dataset; (3) a query engine that combines on-the-fly crowd knowledge with estimates provided by the RDF completeness model to decide which sub-queries of a SPARQL query should be executed against the dataset and which via crowd computing, in order to enhance query answer completeness; and (4) a microtask manager that exploits the semantics encoded in the dataset's RDF properties to crowdsource SPARQL sub-queries as microtasks and update the crowd knowledge base with the results from the crowd. The effectiveness and efficiency of HARE are empirically studied on a collection of 50 SPARQL queries against the DBpedia dataset. Experimental results clearly show that our solution accurately enhances answer completeness.
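
The division of labour described in the abstract can be illustrated with a minimal Python sketch. It is not HARE's implementation: the per-predicate completeness scores, the 0.5 threshold, and the names TriplePattern, CrowdKnowledgeBase, and plan_subqueries are all hypothetical, chosen only to show how a completeness estimate might route sub-queries either to the dataset or to the crowd, and how crowd answers might be stored for later reuse.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TriplePattern:
    """One triple pattern of a SPARQL query (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class CrowdKnowledgeBase:
    """Stores values supplied by the crowd, keyed by (subject, predicate)."""
    answers: dict = field(default_factory=dict)

    def record(self, subject: str, predicate: str, value: str) -> None:
        self.answers.setdefault((subject, predicate), set()).add(value)

    def lookup(self, subject: str, predicate: str) -> set:
        return self.answers.get((subject, predicate), set())


def plan_subqueries(patterns, completeness, threshold=0.5):
    """Split triple patterns into dataset-only ones and crowd candidates.

    `completeness` maps a predicate to an assumed score in [0, 1];
    predicates without a score are treated as complete (an assumption).
    """
    dataset_only, crowd_candidates = [], []
    for pattern in patterns:
        score = completeness.get(pattern.predicate, 1.0)
        if score < threshold:
            crowd_candidates.append(pattern)
        else:
            dataset_only.append(pattern)
    return dataset_only, crowd_candidates
```

In this sketch a low score marks a predicate whose values are likely missing from the dataset, so the corresponding sub-query becomes a candidate microtask; answers returned by workers would then be written to the knowledge base with record and consulted with lookup before new microtasks are issued.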


More information

Accepted/In Press date: 7 July 2017
e-pub ahead of print date: 29 July 2017
Published date: August 2017
Keywords: RDF data, Crowdsourcing, SPARQL query, Crowd knowledge, Query execution, Hybrid system, Microtasks, Completeness model

Identifiers

Local EPrints ID: 413106
URI: http://eprints.soton.ac.uk/id/eprint/413106
ISSN: 1570-8268
PURE UUID: 6e625e2c-0457-4766-be6f-6074cf9bca72
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 15 Aug 2017 16:30
Last modified: 16 Mar 2024 05:38

Contributors

Author: Maribel Acosta
Author: Elena Simperl
Author: Fabian Floeck
Author: Maria-Esther Vidal
