The University of Southampton
University of Southampton Institutional Repository

On the analysis of big data indexing execution strategies

Siddiqa, Aisha, Karim, Ahmad, Saba, Tanzila and Chang, Victor (2017) On the analysis of big data indexing execution strategies. Journal of Intelligent & Fuzzy Systems, 32 (5), 3259-3271. (doi:10.3233/JIFS-169269).

Record type: Article

Abstract

Efficient response to search queries is crucial for data analysts who need timely results from big data spread across heterogeneous machines. A number of big-data processing frameworks are currently available in which search operations are performed in a distributed and parallel manner. Implementing an indexing mechanism results in a noticeable reduction of overall query processing time, so there is a pressing need to assess the feasibility and impact of indexing on query execution performance. This paper investigates the performance of state-of-the-art clustered indexing approaches over the Hadoop framework, which is the de facto standard for big data processing. Moreover, the study presents a comparative analysis of non-clustered indexing overhead, in terms of the time and space taken by the indexing process, for data sets of varying volume with increasing Index Hit Ratio. Furthermore, the experiments evaluate the performance of search operations in terms of data access and retrieval time for queries that use indexes. The obtained results were then validated using Petri net mathematical modeling. Multiple data sets were used in the experiments to demonstrate the impact of growing data volume on indexing and on data search and retrieval performance. The results and highlighted challenges should guide researchers toward better use of indexing mechanisms for data retrieval from big data. Additionally, this study advocates selecting a non-clustered indexing solution to obtain optimized search performance over big data.

Text
Analysis of Big Data Indexing Execution Strategies - Accepted Manuscript

More information

Accepted/In Press date: 3 September 2016
e-pub ahead of print date: 24 April 2017
Published date: 2017
Keywords: big data, indexing, big data processing, data retrieval
Organisations: Electronic & Software Systems

Identifiers

Local EPrints ID: 399946
URI: http://eprints.soton.ac.uk/id/eprint/399946
ISSN: 1064-1246
PURE UUID: ad66c6eb-681c-4692-8705-148b82b13d5f

Catalogue record

Date deposited: 04 Sep 2016 11:08
Last modified: 15 Mar 2024 02:04

Contributors

Author: Aisha Siddiqa
Author: Ahmad Karim
Author: Tanzila Saba
Author: Victor Chang
