The University of Southampton
University of Southampton Institutional Repository

Using low latency storage to improve RDF store performance

Owens, Alisdair (2011) Using low latency storage to improve RDF store performance. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 307pp.

Record type: Thesis (Doctoral)

Abstract

Resource Description Framework (RDF) is a flexible, increasingly popular data model that allows for simple representation of arbitrarily structured information. This flexibility allows it to act as an effective underlying data model for the growing Semantic Web. Unfortunately, it remains a challenge to store and query RDF data in a performant manner, with existing stores struggling to meet the needs of demanding applications, particularly low-latency, human-interactive systems. This is a result of fundamental properties of RDF data: RDF’s small statement size tends to engender large joins with a lot of random I/O, and its limited structure impedes the generation of compact, relevant statistics for query optimisation.

This thesis posits that the problem of performant RDF storage can be effectively mitigated using in-memory storage, thanks to RAM’s extremely high throughput and rapid random I/O relative to disk. RAM is rapidly reducing in cost, and is finally reaching the stage where it is becoming a practical medium for the storage of substantial databases, particularly given the relatively small size at which RDF datasets become challenging for disk-backed systems.

In-memory storage brings with it its own challenges. The relatively high cost of RAM necessitates a very compact representation, and the changing relationship between memory and CPU (particularly increasing RAM access latency) benefits designs that are aware of that relationship. This thesis presents an investigation into creating CPU-friendly data structures, along with a deep study of the common characteristics of popular RDF datasets. Together, these are used to inform the creation of a new data structure called the Adaptive Hierarchical RDF Index (AHRI), an in-memory, RDF-specific structure that outperforms traditional storage mechanisms in nearly every respect.

AHRI is validated with a comprehensive evaluation against other commonly used in-memory data structures, along with a real-world test against a memory-backed store and a fast disk-based store allowed to cache its data in RAM. The results show that AHRI outperforms these systems with regard to both space consumption and read/write behaviour. The document subsequently describes future work that should provide substantial further improvements, making the use of RAM for RDF storage even more compelling.

Text
ao-thesis.pdf - Other
Download (6MB)

More information

Published date: April 2011
Keywords: rdf, dbms, database, in-memory, cache, branch prediction, data structure, ahri
Organisations: University of Southampton

Identifiers

Local EPrints ID: 185969
URI: http://eprints.soton.ac.uk/id/eprint/185969
PURE UUID: 18b7f9e5-e17b-46d9-b909-88fff92d2186
ORCID for mc schraefel: orcid.org/0000-0002-9061-7957
ORCID for Nicholas Gibbins: orcid.org/0000-0002-6140-9956

Catalogue record

Date deposited: 24 May 2011 09:02
Last modified: 15 Mar 2024 03:16

Contributors

Author: Alisdair Owens
Thesis advisor: mc schraefel
Thesis advisor: Nicholas Gibbins
