The University of Southampton
University of Southampton Institutional Repository

Nirvana: A non-intrusive black-box monitoring framework for rack-level fault detection

IEEE
Ciccotelli, Claudio
Aniello, Leonardo
Lombardi, Federico
Montanari, Luca
Querzoni, Leonardo
Baldoni, Roberto

Ciccotelli, Claudio, Aniello, Leonardo, Lombardi, Federico, Montanari, Luca, Querzoni, Leonardo and Baldoni, Roberto (2015) Nirvana: A non-intrusive black-box monitoring framework for rack-level fault detection. In 2015 IEEE 21st Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE. (doi:10.1109/PRDC.2015.22).

Record type: Conference or Workshop Item (Paper)

Abstract

Many organizations today still manage mid-sized or large in-house data centers that require very expensive maintenance efforts, including fault detection. Common monitoring frameworks used to quickly detect faults are complex to deploy and maintain, expensive, and intrusive, as they require the installation of probes on the monitored hardware and software to collect raw data. Such intrusiveness can be problematic, as it imposes installation and management overhead and may interfere with security and privacy policies. In this paper we introduce NIRVANA, a novel monitoring system for fault detection that works at rack level and is (i) non-intrusive, i.e., it does not require the installation of software probes on the hosts to be monitored, and (ii) black-box, i.e., agnostic with respect to the monitored applications. At the core of our solution lies the observation that aggregated features that can be monitored at rack level in a non-intrusive and black-box way show predictable behaviors while the system works in both fault-free and faulty states; it is therefore possible to detect and identify faults by monitoring and analyzing any perturbations to these behaviors. An extensive experimental evaluation shows that non-intrusiveness does not significantly hamper the fault detection capabilities of the monitoring system, thus validating our approach.

Text
NIRVANA: A Non-Intrusive Black-Box Monitoring Framework for Rack-level Fault Detection
Restricted to Repository staff only

More information

Published date: 2015

Identifiers

Local EPrints ID: 431130
URI: http://eprints.soton.ac.uk/id/eprint/431130
PURE UUID: df3297b1-2821-44e3-8a95-14e64ec10cbc
ORCID for Leonardo Aniello: orcid.org/0000-0003-2886-8445
ORCID for Federico Lombardi: orcid.org/0000-0001-6463-8722

Catalogue record

Date deposited: 24 May 2019 16:30
Last modified: 16 Mar 2024 04:32

Contributors

Author: Claudio Ciccotelli
Author: Leonardo Aniello
Author: Federico Lombardi
Author: Luca Montanari
Author: Leonardo Querzoni
Author: Roberto Baldoni
