Ecoscape: fault tolerance benchmark for adaptive remediation strategies in real-time edge ML
Edge computing offers significant advantages for real-time data processing tasks, such as object recognition, by reducing network latency and bandwidth usage. However, edge environments are susceptible to various types of fault. A remediator is an automated software component designed to adjust the configuration parameters of a software service dynamically. Its primary function is to maintain the service's operational state within predefined Service Level Objectives by applying corrective actions in response to deviations from these objectives. Remediators can be implemented based on the Kubernetes container orchestration tool by implementing remediation strategies such as rescheduling or adjusting application parameters. However, currently, there is no method to compare these remediation strategies fairly. This paper introduces Ecoscape, a comprehensive benchmark designed to evaluate the performance of remediation strategies in fault-prone environments. Using Chaos Engineering techniques, Ecoscape simulates realistic fault scenarios and provides a quantifiable score to assess the efficacy of different remediation approaches. In addition, it is configurable to support domain-specific Service Level Objectives. We demonstrate the capabilities of Ecoscape in edge machine learning inference, offering a clear framework to optimize fault tolerance in these systems without needing a physical edge testbed.
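The remediator concept described in the abstract can be illustrated with a minimal sketch. This is not Ecoscape's code or API: every name below (LatencySLO, ServiceConfig, lower_resolution, remediate, slo_compliance) is hypothetical, the latency objective and the resolution parameter are assumed examples, and the compliance ratio is only one plausible way to score a strategy, not the score defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LatencySLO:
    """Hypothetical SLO: observed latency must not exceed max_latency_ms."""
    max_latency_ms: float

    def is_met(self, observed_latency_ms: float) -> bool:
        return observed_latency_ms <= self.max_latency_ms


@dataclass
class ServiceConfig:
    """Hypothetical tunable parameter of an inference service."""
    input_resolution: int  # pixels per side fed to the model


def lower_resolution(config: ServiceConfig) -> ServiceConfig:
    """Example corrective action: halve the resolution, never below 64."""
    return ServiceConfig(input_resolution=max(64, config.input_resolution // 2))


def remediate(
    slo: LatencySLO,
    config: ServiceConfig,
    observed_latency_ms: float,
    action: Callable[[ServiceConfig], ServiceConfig],
) -> ServiceConfig:
    """Return the config unchanged if the SLO holds, else apply the action."""
    if slo.is_met(observed_latency_ms):
        return config
    return action(config)


def slo_compliance(slo: LatencySLO, latencies_ms: Sequence[float]) -> float:
    """Fraction of samples that meet the SLO (illustrative score only)."""
    if not latencies_ms:
        return 0.0
    return sum(slo.is_met(x) for x in latencies_ms) / len(latencies_ms)
```

In this sketch, a remediation strategy is just the action passed to remediate, so two strategies could be compared by replaying the same latency trace under each and comparing their slo_compliance values.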
Autoscaling, Benchmark, Edge Computing, Fault Tolerance, Kubernetes, Machine Learning Inference, Real-Time, Remediation, Scheduling
275-280
Reiter, Hendrik
a357c35a-95af-4822-ada8-a1f0ba4a7f76
Hamid, Ahmad Rzgar
4d8c8ba0-02c6-4a87-af17-0995ab126a5b
Schlosser, Florian
ac09ea83-8581-4fb4-9e62-96dd9dc96bcf
Kjaergaard, Mikkel Baun
e21ad4bc-36f4-4af2-a4ae-9c44f1cd710c
Hasselbring, Wilhelm
ee89c5c9-a900-40b1-82c1-552268cd01bd
2025
Reiter, Hendrik, Hamid, Ahmad Rzgar, Schlosser, Florian, Kjaergaard, Mikkel Baun and Hasselbring, Wilhelm (2025) Ecoscape: fault tolerance benchmark for adaptive remediation strategies in real-time edge ML. Chang, Rong N., Chang, Carl K., Yang, Jingwei, Atukorala, Nimanthi, Chen, Dan, Helal, Sumi, Tarkoma, Sasu, He, Qiang, Kosar, Tevfik, Ardagna, Claudio, Feld, Sebastian, di Nitto, Elisabetta and Wimmer, Manuel (eds.) In Proceedings - 2025 IEEE International Conference on Quantum Software, QSW 2025. IEEE. pp. 275-280. (doi:10.1109/QSW67625.2025.00041).
Record type: Conference or Workshop Item (Paper)
This record has no associated files available for download.
More information
Published date: 2025
Additional Information: Publisher Copyright: © 2025 IEEE.
Venue - Dates: 2025 IEEE International Conference on Quantum Software, QSW 2025, Helsinki, Finland, 2025-07-07 - 2025-07-12
Identifiers
Local EPrints ID: 506638
URI: http://eprints.soton.ac.uk/id/eprint/506638
PURE UUID: 00898f96-a805-403b-9ba9-5fb0ccd66e92
Catalogue record
Date deposited: 12 Nov 2025 17:48
Last modified: 13 Nov 2025 03:10
Contributors
Author: Hendrik Reiter
Author: Ahmad Rzgar Hamid
Author: Florian Schlosser
Author: Mikkel Baun Kjaergaard
Author: Wilhelm Hasselbring
Editor: Rong N. Chang
Editor: Carl K. Chang
Editor: Jingwei Yang
Editor: Nimanthi Atukorala
Editor: Dan Chen
Editor: Sumi Helal
Editor: Sasu Tarkoma
Editor: Qiang He
Editor: Tevfik Kosar
Editor: Claudio Ardagna
Editor: Sebastian Feld
Editor: Elisabetta di Nitto
Editor: Manuel Wimmer