University of Southampton Institutional Repository

Novel time aggregation based algorithms for Markov decision processes


Leite, João Marcelo L.G., Marujo, Lino G. and Arruda, Edilson F. (2025) Novel time aggregation based algorithms for Markov decision processes. IEEE Transactions on Automatic Control. (doi:10.1109/TAC.2025.3583262).

Record type: Article

Abstract

We present two novel approaches to accelerate the convergence of a class of Markov decision problems (MDPs) with stationary state space components, which iterate on reduced state and action spaces and converge to an optimal policy. The time aggregation based algorithm (TABA) partitions the state space into disjoint sets, one for each possible combination of the non-stationary components, and iterates in the sets. The local search policy set iteration (LSPSI) algorithm introduces a novel modified policy evaluation procedure that seamlessly performs a local search over a very large set of candidate policies, by sampling a reduced subset of actions at each state's value function update. Both approaches, as well as a combination of them, are validated by means of a mining supply chain application with a large number of stationary state components and a large set of feasible actions. The experiments suggest that the proposed frameworks are very efficient for such a class of problems.
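The record itself contains no code, but the action-sampling idea summarised in the abstract (improving each state's value update over a small sampled subset of actions rather than the full action set) can be illustrated with a short sketch. The sketch below is an illustrative assumption, not the authors' implementation: the toy model (n_states, n_actions, transition tensor P, cost matrix c, discount gamma, sample size k) is invented, and the update is only a generic sampled-action, Gauss-Seidel-style value/policy sweep in the spirit of LSPSI.

import numpy as np

# Minimal illustrative sketch (not the paper's algorithm or code).
# At each state, the one-step lookahead is restricted to the current
# action plus a small random sample of alternatives, so each sweep
# performs a local search over a reduced action set.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, k = 50, 200, 0.95, 10  # k << n_actions

P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)      # P[a, s, :] is a transition distribution
c = rng.random((n_states, n_actions))  # c[s, a] is a one-stage cost

V = np.zeros(n_states)
policy = np.zeros(n_states, dtype=int)

for sweep in range(200):
    for s in range(n_states):
        sampled = rng.choice(n_actions, size=k, replace=False)
        candidates = np.unique(np.append(sampled, policy[s]))
        q = c[s, candidates] + gamma * P[candidates, s, :] @ V
        i = int(np.argmin(q))
        policy[s], V[s] = candidates[i], q[i]

Keeping the current action in the candidate set means each state's update can never be worse than retaining the incumbent policy, which is one simple way to realise a local search over policies while touching only k + 1 actions per state.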

Text
Final_Version_Sent_IEEE_TAC - Accepted Manuscript
Available under License Creative Commons Attribution.
Download (843kB)

More information

e-pub ahead of print date: 25 June 2025
Keywords: Markov decision processes, Stochastic systems, Supply Chains, Optimization under uncertainty

Identifiers

Local EPrints ID: 504062
URI: http://eprints.soton.ac.uk/id/eprint/504062
ISSN: 0018-9286
PURE UUID: 3b214e37-02ec-4847-bda4-f8f89e496674
ORCID for Edilson F. Arruda: orcid.org/0000-0002-9835-352X

Catalogue record

Date deposited: 22 Aug 2025 16:33
Last modified: 23 Aug 2025 02:19

Contributors

Author: João Marcelo L.G. Leite
Author: Lino G. Marujo
Author: Edilson F. Arruda

