Impact analysis of data placement strategies on query efforts in distributed RDF stores
In recent years, scalable RDF stores in the cloud have been developed, in which graph data is distributed over compute and storage nodes to scale query processing effort and memory needs. One main challenge in these RDF stores is the data placement strategy, which can be formalized in terms of graph covers. These graph covers determine whether (a) the triples are distributed evenly over all storage nodes (storage balance), (b) different query results may be computed on several compute nodes in parallel (vertical parallelization), and (c) individual query results can be produced from triples assigned to only a few storage nodes, ideally one (horizontal containment). We analyze the impact of the three most commonly used graph cover strategies in these terms and find that balancing the query workload reduces query execution time more than reducing data transfer over the network does. To this end, we present our novel benchmark and open source evaluation platform Koral.
21-48
Janke, Daniel
4a5d4f28-8add-435f-a223-e2a19d423012
Staab, Steffen
bf48d51b-bd11-4d58-8e1c-4e6e03b30c49
Thimm, Matthias
ec2f5286-032e-47c9-b169-811a2fa5d3f8
1 May 2018
Janke, Daniel, Staab, Steffen and Thimm, Matthias (2018) Impact analysis of data placement strategies on query efforts in distributed RDF stores. Journal of Web Semantics, 50, 21-48. (doi:10.1016/j.websem.2018.02.002).
Text: tex_journalWebSemantics2017_main (Accepted Manuscript)
More information
Accepted/In Press date: 16 February 2018
e-pub ahead of print date: 21 February 2018
Published date: 1 May 2018
Identifiers
Local EPrints ID: 418131
URI: http://eprints.soton.ac.uk/id/eprint/418131
ISSN: 1570-8268
PURE UUID: 47683a98-67d2-47cf-b3e1-89a204b2bcfb
Catalogue record
Date deposited: 22 Feb 2018 17:30
Last modified: 16 Mar 2024 06:14
Contributors
Author: Daniel Janke
Author: Steffen Staab
Author: Matthias Thimm