On the analysis of big data indexing execution strategies
Pages: 3259-3271
Siddiqa, Aisha
4be8b0e4-9be1-4368-9c0c-00727a86bab1
Karim, Ahmad
2a648c9a-d709-4592-b02e-1141e2d6c0cc
Saba, Tanzila
55b30edd-0690-4061-a26b-172532fb0341
Chang, Victor
a7c75287-b649-4a63-a26c-6af6f26525a4
2017
Siddiqa, Aisha, Karim, Ahmad, Saba, Tanzila and Chang, Victor (2017) On the analysis of big data indexing execution strategies. Journal of Intelligent & Fuzzy Systems, 32 (5), 3259-3271. (doi:10.3233/JIFS-169269).
Abstract
Efficient response to search queries is crucial for data analysts who need timely results from big data spread across heterogeneous machines. A number of big data processing frameworks are currently available in which search operations are performed in a distributed and parallel manner. Implementing an indexing mechanism, however, noticeably reduces overall query processing time, so there is a need to assess the feasibility and impact of indexing on query execution performance. This paper investigates the performance of state-of-the-art clustered indexing approaches on the Hadoop framework, the de facto standard for big data processing. Moreover, the study presents a comparative analysis of non-clustered indexing overhead, measured as the time and space consumed by the indexing process for data sets of varying volume with increasing Index Hit Ratio. Furthermore, the experiments evaluate the performance of search operations in terms of data access and retrieval time for queries that use indexes. We then validated the obtained results using Petri net mathematical modeling. We used multiple data sets in our experiments to demonstrate the impact of growing data volume on indexing and on data search and retrieval performance. The results and the challenges highlighted can guide researchers toward better use of indexing mechanisms for data retrieval from big data. Additionally, this study advocates selecting a non-clustered indexing solution to obtain optimized search performance over big data.
Text: Analysis of Big Data Indexing Execution Strategies - Accepted Manuscript
More information
Accepted/In Press date: 3 September 2016
e-pub ahead of print date: 24 April 2017
Published date: 2017
Keywords:
big data, indexing, big data processing, data retrieval
Organisations:
Electronic & Software Systems
Identifiers
Local EPrints ID: 399946
URI: http://eprints.soton.ac.uk/id/eprint/399946
ISSN: 1064-1246
PURE UUID: ad66c6eb-681c-4692-8705-148b82b13d5f
Catalogue record
Date deposited: 04 Sep 2016 11:08
Last modified: 15 Mar 2024 02:04
Contributors
Author: Aisha Siddiqa
Author: Ahmad Karim
Author: Tanzila Saba
Author: Victor Chang