The University of Southampton
University of Southampton Institutional Repository

High frequency batch-oriented computations over large sliding time windows

Aniello, Leonardo, Querzoni, Leonardo and Baldoni, Roberto (2015) High frequency batch-oriented computations over large sliding time windows. Future Generation Computer Systems, 43-44, 1-11. (doi:10.1016/j.future.2014.09.008).

Record type: Article

Abstract

Today’s business workflows are very likely to include batch computations that periodically analyze subsets of data within specific time ranges to provide strategic information for stakeholders and other interested parties. The frequency of these batch computations is an effective measure of how fresh the analytics available to decision makers are. However, the amount of data to process in a batch is typically so large that a single computation can take a very long time. Since a new batch usually starts only when the previous one has completed, the frequency of such batches can be very low.

In this paper we propose a model for batch processing based on overlapping sliding time windows that makes it possible to increase the frequency of batches. The model is well suited to scenarios (e.g., financial, security) characterized by large data volumes, observation windows on the order of hours (or days), and frequent updates (on the order of seconds). The model introduces multiple metrics aimed at reducing the latency between the end of a computation time window and the availability of results, thereby increasing the frequency of the batches. These metrics specifically take into account the organization of input data in order to minimize its impact on that latency. The model is then instantiated on the well-known Hadoop platform, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated.
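To make the idea of overlapping sliding time windows concrete, the sketch below counts events per window by first aggregating each event once into a slide-sized sub-interval (called a "pane" here) and then combining panes into windows. This is a generic illustration and not the model, metrics, or Hadoop data-arrangement strategies from the paper, none of which are detailed in this record. The function names, the choice of a count as the aggregate, and the assumption that the window length is a multiple of the slide are all hypothetical.

```python
from collections import defaultdict


def pane_index(timestamp, slide):
    """Index of the slide-sized pane that contains the timestamp."""
    return timestamp // slide


def sliding_window_counts(events, window, slide):
    """Count events per overlapping window using pre-aggregated panes.

    events: iterable of non-negative integer timestamps (seconds)
    window: window length in seconds (assumed to be a multiple of slide)
    slide:  interval between consecutive window ends, in seconds
    Returns a list of (window_start, window_end, count), end exclusive.
    """
    if window % slide != 0:
        raise ValueError("window must be a multiple of slide")
    panes_per_window = window // slide

    # Step 1: aggregate every event exactly once, into its pane.
    pane_counts = defaultdict(int)
    for ts in events:
        pane_counts[pane_index(ts, slide)] += 1
    if not pane_counts:
        return []

    first, last = min(pane_counts), max(pane_counts)
    results = []
    # Step 2: each window combines panes_per_window partial results,
    # so consecutive windows share all but one pane.
    for end_pane in range(first + panes_per_window - 1, last + 1):
        start_pane = end_pane - panes_per_window + 1
        count = sum(pane_counts.get(p, 0)
                    for p in range(start_pane, end_pane + 1))
        results.append((start_pane * slide, (end_pane + 1) * slide, count))
    return results


if __name__ == "__main__":
    # 20-second windows that advance every 10 seconds.
    for start, end, count in sliding_window_counts(
            [0, 5, 12, 18, 25, 31], window=20, slide=10):
        print(f"[{start}, {end}) -> {count}")
```

Because each window reuses the partial results of the panes it shares with its predecessor, only the newest pane has to be processed when a window closes. That is one common way to shorten the delay between the end of a window and the availability of its result.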

Text
rolling - Author's Original

More information

Accepted/In Press date: 19 September 2014
e-pub ahead of print date: 10 October 2014
Published date: February 2015

Identifiers

Local EPrints ID: 431302
URI: http://eprints.soton.ac.uk/id/eprint/431302
ISSN: 0167-739X
PURE UUID: 04d0c621-db68-4bee-8637-aba815d72bef
ORCID for Leonardo Aniello: orcid.org/0000-0003-2886-8445

Catalogue record

Date deposited: 29 May 2019 16:30
Last modified: 16 Mar 2024 04:32

Contributors

Author: Leonardo Aniello
Author: Leonardo Querzoni
Author: Roberto Baldoni
