Using low latency storage to improve RDF store performance

Resource Description Framework (RDF) is a flexible, increasingly popular data model that allows for simple representation of arbitrarily structured information. This flexibility allows it to act as an effective underlying data model for the growing Semantic Web. Unfortunately, it remains a challenge to store and query RDF data in a performant manner, with existing stores struggling to meet the needs of demanding applications, particularly low-latency, human-interactive systems. This is a result of fundamental properties of RDF data: RDF’s small statement size tends to engender large joins with a lot of random I/O, and its limited structure impedes the generation of compact, relevant statistics for query optimisation.

This thesis posits that the problem of performant RDF storage can be effectively mitigated using in-memory storage, thanks to RAM’s extremely high throughput and rapid random I/O relative to disk. RAM is rapidly reducing in cost, and is finally reaching the stage where it is becoming a practical medium for the storage of substantial databases, particularly given the relatively small size at which RDF datasets become challenging for disk-backed systems.

In-memory storage brings with it its own challenges. The relatively high cost of RAM necessitates a very compact representation, and the changing relationship between memory and CPU (particularly increasing RAM access latency) benefits designs that are aware of that relationship. This thesis presents an investigation into creating CPU-friendly data structures, along with a deep study of the common characteristics of popular RDF datasets. Together, these are used to inform the creation of a new data structure called the Adaptive Hierarchical RDF Index (AHRI), an in-memory, RDF-specific structure that outperforms traditional storage mechanisms in nearly every respect.

AHRI is validated with a comprehensive evaluation against other commonly used in-memory data structures, along with a real-world test against a memory-backed store and a fast disk-based store allowed to cache its data in RAM. The results show that AHRI outperforms these systems with regard to both space consumption and read/write behaviour. The document subsequently describes future work that should provide substantial further improvements, making the use of RAM for RDF storage even more compelling.

Owens, Alisdair (2011) Using low latency storage to improve RDF store performance. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 307pp.

Record type: Thesis (Doctoral)

More information

Published date: April 2011

Keywords: rdf, dbms, database, in-memory, cache, branch prediction, data structure, ahri

Organisations: University of Southampton

Identifiers

Local EPrints ID: 185969

URI: http://eprints.soton.ac.uk/id/eprint/185969

PURE UUID: 18b7f9e5-e17b-46d9-b909-88fff92d2186

ORCID for mc schraefel: orcid.org/0000-0002-9061-7957

ORCID for Nicholas Gibbins: orcid.org/0000-0002-6140-9956

Catalogue record

Date deposited: 24 May 2011 09:02

Last modified: 15 Mar 2024 03:16

Contributors

Author: Alisdair Owens

Thesis advisor: mc schraefel

Thesis advisor: Nicholas Gibbins
