High frequency batch-oriented computations over large sliding time windows
Today’s business workflows are very likely to include batch computations that periodically analyze subsets of data within specific time ranges to provide strategic information for stakeholders and other interested parties. The frequency of these batch computations provides an effective measure of the freshness of the data analytics available to decision makers. Nevertheless, the amount of data to process in a batch is typically so large that a computation can take a very long time. Since a new batch usually starts when the previous one has completed, the frequency of such batches can be very low.
In this paper we propose a model for batch processing based on overlapping sliding time windows that allows the frequency of batches to be increased. The model is well suited to scenarios (e.g., financial, security, etc.) characterized by large data volumes, observation windows on the order of hours (or days), and frequent updates (on the order of seconds). The model introduces multiple metrics whose aim is to reduce the latency between the end of a computation time window and the availability of results, thus increasing the frequency of the batches. These metrics specifically take into account the organization of input data to minimize its impact on such latency. The model is then instantiated on the well-known Hadoop platform, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated.
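To make the idea of overlapping sliding time windows concrete, the following is a minimal sketch and not the model or the Hadoop implementation from the paper. The function names and the parameter values (a 1-hour window refreshed every 10 seconds) are assumptions chosen only to match the orders of magnitude mentioned in the abstract. It enumerates consecutive windows and computes how much of each window's time range is shared with the next one.

```python
from datetime import datetime, timedelta


def sliding_windows(start, window, slide, count):
    """Yield (window_start, window_end) pairs for `count` consecutive windows."""
    for i in range(count):
        window_start = start + i * slide
        yield window_start, window_start + window


def overlap_fraction(window, slide):
    """Fraction of a window's time range shared with the next window."""
    if slide >= window:
        return 0.0  # windows do not overlap at all
    return (window - slide) / window


# Hypothetical parameters: a 1-hour observation window refreshed every 10 seconds.
WINDOW = timedelta(hours=1)
SLIDE = timedelta(seconds=10)

windows = list(sliding_windows(datetime(2015, 2, 1), WINDOW, SLIDE, count=3))
for begin, end in windows:
    print(begin.isoformat(), "->", end.isoformat())
print(f"shared with next window: {overlap_fraction(WINDOW, SLIDE):.4f}")
```

With a slide much shorter than the window, consecutive windows cover almost the same time range, which suggests why the way input data is organized can matter for the latency of each batch.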
1-11
Aniello, Leonardo
9846e2e4-1303-4b8b-9092-5d8e9bb514c3
Querzoni, Leonardo
c0eee656-74e7-419d-876c-3cad808683d6
Baldoni, Roberto
6ea5e1cc-92fe-4b9d-9ed3-0b7970553965
February 2015
Aniello, Leonardo, Querzoni, Leonardo and Baldoni, Roberto (2015) High frequency batch-oriented computations over large sliding time windows. Future Generation Computer Systems, 43-44, 1-11. (doi:10.1016/j.future.2014.09.008).
More information
Accepted/In Press date: 19 September 2014
e-pub ahead of print date: 10 October 2014
Published date: February 2015
Identifiers
Local EPrints ID: 431302
URI: http://eprints.soton.ac.uk/id/eprint/431302
ISSN: 0167-739X
PURE UUID: 04d0c621-db68-4bee-8637-aba815d72bef
Catalogue record
Date deposited: 29 May 2019 16:30
Last modified: 16 Mar 2024 04:32
Contributors
Author:
Leonardo Aniello
Author:
Leonardo Querzoni
Author:
Roberto Baldoni