The University of Southampton
University of Southampton Institutional Repository

PAPAYA: a library for performance analysis of SQL-based RDF processing systems


Ragab, Mohamed, Adidarma, Adam Satria and Tommasini, Riccardo (2024) PAPAYA: a library for performance analysis of SQL-based RDF processing systems. Semantic Web. (doi:10.3233/SW-243582).

Record type: Article

Abstract

Prescriptive Performance Analysis (PPA) has been shown to be more useful than traditional descriptive and diagnostic analyses for making sense of the performance of Big Data (BD) frameworks. In practice, processing large (RDF) graphs on top of relational BD systems raises several design decisions that cannot be made automatically, e.g., the choice of schema, partitioning technique, and storage format. PPA, and ranking functions in particular, yields actionable insights from performance data, making it easier for practitioners to choose the best way to deploy BD frameworks, especially for graph processing. However, the amount of experimental work required to implement PPA is still huge. In this paper, we present PAPAYA, a library for implementing PPA that (1) prepares RDF graph data for a processing pipeline over relational BD systems, (2) enables automatic ranking of performance in a user-defined solution space of experimental dimensions, and (3) allows flexible user-defined extensions in terms of systems to test and ranking methods. We showcase PAPAYA on a set of experiments based on the SparkSQL framework. PAPAYA simplifies the performance analytics of BD systems for processing large (RDF) graphs. We provide PAPAYA as a public open-source library under an MIT license, intended as a catalyst for new research on prescriptive analytical techniques for BD applications.
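
To make the idea of ranking configurations in a solution space more concrete, the following is a minimal sketch of one possible rank-based score. It is not PAPAYA's API or the authors' ranking function: the function name `rank_score`, the configuration labels, and the runtimes are all hypothetical and chosen purely for illustration. Each configuration (here a schema/partitioning pair) earns points according to the rank it achieves on each query, and the total is normalised to the range 0 to 1.

```python
def rank_score(runtimes):
    """Score configurations by how well they rank across queries.

    runtimes: dict mapping a configuration to a list of per-query
    runtimes (all lists in the same query order, lower is better).
    For each query the d configurations are ordered by runtime
    (rank 1 = fastest); a configuration at rank r earns d - r points.
    Totals are divided by the maximum achievable, n_queries * (d - 1).
    Ties are broken by insertion order (a simplifying assumption).
    """
    configs = list(runtimes)
    d = len(configs)
    if d < 2:
        raise ValueError("need at least two configurations to rank")
    n_queries = len(runtimes[configs[0]])
    points = {c: 0 for c in configs}
    for q in range(n_queries):
        ordered = sorted(configs, key=lambda c: runtimes[c][q])
        for r, c in enumerate(ordered, start=1):
            points[c] += d - r
    return {c: points[c] / (n_queries * (d - 1)) for c in configs}


# Made-up runtimes (seconds) for three queries; labels are hypothetical.
example_runtimes = {
    ("schema_a", "part_x"): [1.0, 2.0, 3.0],
    ("schema_a", "part_y"): [2.0, 1.0, 4.0],
    ("schema_b", "part_x"): [3.0, 3.0, 1.0],
    ("schema_b", "part_y"): [4.0, 4.0, 2.0],
}

scores = rank_score(example_runtimes)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

A score of this kind collapses many per-query measurements into one number per configuration, which is what makes a choice among schema, partitioning, and storage options actionable. Other ranking criteria could be substituted by changing only the scoring function.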

Text
sw-prepress_sw--1--1-sw243582_sw--1-sw243582 - Version of Record
Available under License Creative Commons Attribution.

More information

e-pub ahead of print date: 5 April 2024

Identifiers

Local EPrints ID: 495086
URI: http://eprints.soton.ac.uk/id/eprint/495086
ISSN: 1570-0844
PURE UUID: 01af166b-b621-4176-93fc-15d42c913061

Catalogue record

Date deposited: 29 Oct 2024 17:32
Last modified: 29 Oct 2024 17:35

Contributors

Author: Mohamed Ragab
Author: Adam Satria Adidarma
Author: Riccardo Tommasini
