Using low latency storage to improve RDF store performance
 
Resource Description Framework (RDF) is a flexible, increasingly popular data model that allows simple representation of arbitrarily structured information. This flexibility makes it an effective underlying data model for the growing Semantic Web. Unfortunately, storing and querying RDF data performantly remains a challenge, and existing stores struggle to meet the needs of demanding applications, particularly low-latency, human-interactive systems. This difficulty stems from fundamental properties of RDF data: RDF’s small statement size tends to engender large joins with a lot of random I/O, and its limited structure impedes the generation of compact, relevant statistics for query optimisation.
This thesis posits that the problem of performant RDF storage can be effectively mitigated using in-memory storage, thanks to RAM’s extremely high throughput and rapid random I/O relative to disk. RAM is falling rapidly in cost and is finally becoming a practical medium for storing substantial databases, particularly given the relatively small size at which RDF datasets become challenging for disk-backed systems.
In-memory storage brings its own challenges. The relatively high cost of RAM necessitates a very compact representation, and the changing relationship between memory and CPU (particularly increasing RAM access latency relative to CPU speed) favours designs that are aware of that relationship. This thesis presents an investigation into creating CPU-friendly data structures, along with a deep study of the common characteristics of popular RDF datasets. Together, these inform the creation of a new data structure called the Adaptive Hierarchical RDF Index (AHRI), an in-memory, RDF-specific structure that outperforms traditional storage mechanisms in nearly every respect.
AHRI is validated with a comprehensive evaluation against other commonly used in-memory data structures, along with a real-world test against a memory-backed store and a fast disk-based store allowed to cache its data in RAM. The results show that AHRI outperforms these systems with regard to both space consumption and read/write behaviour. The document subsequently describes future work that should provide substantial further improvements, making the use of RAM for RDF storage even more compelling.
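The abstract attributes RDF's poor join behaviour to its small statement size: because each triple records a single fact, even a query for two properties of one resource must match two triple patterns and join them on the shared subject. The following minimal Python sketch illustrates this under stated assumptions: the dataset, the `ex:`/`foaf:` terms and the predicate-keyed index `by_predicate` are hypothetical, and the hash join shown is a generic technique, not AHRI or any particular store's query engine.

```python
from collections import defaultdict

# Hypothetical toy dataset: every RDF statement is one (subject, predicate, object) triple,
# so each fact about a resource lives in a separate, small statement.
TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:carol", "foaf:mbox", "mailto:carol@example.org"),
]

# Hypothetical index keyed by predicate: predicate -> list of (subject, object) pairs.
# This layout is an illustrative assumption, not AHRI's design.
by_predicate = defaultdict(list)
for s, p, o in TRIPLES:
    by_predicate[p].append((s, o))


def name_and_mbox():
    """Evaluate the pattern { ?s foaf:name ?name . ?s foaf:mbox ?mbox }.

    Each triple pattern is matched on its own, then the two result sets
    are joined on the shared variable ?s using a hash join.
    """
    # Build phase: group the foaf:mbox matches by subject.
    mbox_by_subject = defaultdict(list)
    for s, mbox in by_predicate["foaf:mbox"]:
        mbox_by_subject[s].append(mbox)

    # Probe phase: each foaf:name match looks up its subject in the hash table.
    results = []
    for s, name in by_predicate["foaf:name"]:
        for mbox in mbox_by_subject.get(s, []):
            results.append((s, name, mbox))
    return results


print(name_and_mbox())
# prints [('ex:alice', '"Alice"', 'mailto:alice@example.org')]
```

Only `ex:alice` has both properties, so the join yields a single row. At realistic scale each pattern may match a very large number of triples, and a disk-backed store evaluating such joins tends to perform many lookups at scattered locations, which is the random I/O the abstract cites as a key cost.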
Keywords: rdf, dbms, database, in-memory, cache, branch prediction, data structure, ahri

Owens, Alisdair
a7080aac-de2d-49ef-bb13-5f0a23304981
April 2011
schraefel, mc
ac304659-1692-47f6-b892-15113b8c929f
Gibbins, Nicholas
98efd447-4aa7-411c-86d1-955a612eceac

Owens, Alisdair (2011) Using low latency storage to improve RDF store performance. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 307pp.

Record type: Thesis (Doctoral)

More information
Published date: April 2011
Organisations: University of Southampton

Identifiers
Local EPrints ID: 185969
URI: http://eprints.soton.ac.uk/id/eprint/185969
PURE UUID: 18b7f9e5-e17b-46d9-b909-88fff92d2186

  Catalogue record
  Date deposited: 24 May 2011 09:02
  Last modified: 15 Mar 2024 03:16
Contributors
Author: Alisdair Owens
Thesis advisor: mc schraefel
Thesis advisor: Nicholas Gibbins