Using low latency storage to improve RDF store performance
Owens, Alisdair (2011) Using low latency storage to improve RDF store performance. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 307pp.
Record type: Thesis (Doctoral)
Abstract
Resource Description Framework (RDF) is a flexible, increasingly popular data model that allows for simple representation of arbitrarily structured information. This flexibility allows it to act as an effective underlying data model for the growing Semantic Web. Unfortunately, it remains a challenge to store and query RDF data in a performant manner, with existing stores struggling to meet the needs of demanding applications: particularly low latency, human-interactive systems. This is a result of fundamental properties of RDF data: RDF’s small statement size tends to engender large joins with a lot of random I/O, and its limited structure impedes the generation of compact, relevant statistics for query optimisation.
This thesis posits that the problem of performant RDF storage can be effectively mitigated using in-memory storage, thanks to RAM’s extremely high throughput and rapid random I/O relative to disk. RAM is rapidly reducing in cost, and is finally reaching the stage where it is becoming a practical medium for the storage of substantial databases, particularly given the relatively small size at which RDF datasets become challenging for disk-backed systems.
In-memory storage brings with it its own challenges. The relatively high cost of RAM necessitates a very compact representation, and the changing relationship between memory and CPU (particularly increasing RAM access latency) benefits designs that are aware of that relationship. This thesis presents an investigation into creating CPU-friendly data structures, along with a deep study of the common characteristics of popular RDF datasets. Together, these are used to inform the creation of a new data structure called the Adaptive Hierarchical RDF Index (AHRI), an in-memory, RDF-specific structure that outperforms traditional storage mechanisms in nearly every respect.
AHRI is validated with a comprehensive evaluation against other commonly used in-memory data structures, along with a real-world test against a memory-backed store and a fast disk-based store allowed to cache its data in RAM. The results show that AHRI outperforms these systems with regard to both space consumption and read/write behaviour. The document subsequently describes future work that should provide substantial further improvements, making the use of RAM for RDF storage even more compelling.
Text: ao-thesis.pdf (Other)
More information
Published date: April 2011
Keywords: rdf, dbms, database, in-memory, cache, branch prediction, data structure, ahri
Organisations: University of Southampton
Identifiers
Local EPrints ID: 185969
URI: http://eprints.soton.ac.uk/id/eprint/185969
PURE UUID: 18b7f9e5-e17b-46d9-b909-88fff92d2186
Catalogue record
Date deposited: 24 May 2011 09:02
Last modified: 15 Mar 2024 03:16
Contributors
Author: Alisdair Owens
Thesis advisor: mc schraefel
Thesis advisor: Nicholas Gibbins