Novel time aggregation based algorithms for Markov decision processes

We present two novel approaches to accelerate convergence for a class of Markov decision problems (MDPs) with stationary state space components. Both approaches iterate on reduced state and action spaces and converge to an optimal policy. The time aggregation based algorithm (TABA) partitions the state space into disjoint sets, one for each possible combination of the non-stationary components, and iterates within these sets. The local search policy set iteration (LSPSI) algorithm introduces a novel modified policy evaluation procedure that performs a local search over a very large set of candidate policies by sampling a reduced subset of actions at each state's value function update. Both approaches, as well as their combination, are validated on a mining supply chain application with a large number of stationary state components and a large set of feasible actions. The experiments suggest that the proposed frameworks are efficient for this class of problems.
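The idea of updating each state's value over only a sampled subset of actions can be illustrated with a generic sketch. This record does not give the details of LSPSI, so the code below is not the authors' algorithm: it is a plain discounted-cost value iteration in which every state update compares the incumbent action against `k` randomly sampled actions. The function names, the random test instance, and the parameters `gamma`, `k`, and `sweeps` are all assumptions made for illustration.

```python
import random


def random_mdp(n_states, n_actions, seed=0):
    """Build a small random MDP (hypothetical test instance).

    Returns P and c, where P[s][a] is a list of next-state probabilities
    and c[s][a] is a non-negative one-step cost.
    """
    rng = random.Random(seed)
    P, c = [], []
    for _ in range(n_states):
        P_s, c_s = [], []
        for _ in range(n_actions):
            weights = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(weights)
            P_s.append([w / total for w in weights])
            c_s.append(rng.random())
        P.append(P_s)
        c.append(c_s)
    return P, c


def sampled_action_value_iteration(P, c, gamma=0.9, k=2, sweeps=200, seed=0):
    """In-place value iteration that minimises over a sampled action subset."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(sweeps):
        for s in range(n_states):
            # Candidates: the incumbent action plus up to k sampled actions.
            sampled = rng.sample(range(n_actions), min(k, n_actions))
            candidates = sorted({policy[s], *sampled})
            best_a, best_q = policy[s], float("inf")
            for a in candidates:
                expected = sum(p * V[t] for t, p in enumerate(P[s][a]))
                q = c[s][a] + gamma * expected
                if q < best_q:
                    best_a, best_q = a, q
            policy[s], V[s] = best_a, best_q
    return V, policy


if __name__ == "__main__":
    P, c = random_mdp(n_states=4, n_actions=5, seed=1)
    V, policy = sampled_action_value_iteration(P, c, k=2)
    print(policy, [round(v, 3) for v in V])
```

Setting `k` to the number of actions makes every update consider all actions, which reduces the sketch to ordinary in-place value iteration. Smaller values of `k` make each sweep cheaper, at the price of possibly needing more sweeps before the best action at a state is sampled.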

Keywords: Markov decision processes, Stochastic systems, Supply Chains, Optimization under uncertainty

DOI: 10.1109/TAC.2025.3583262

ISSN: 0018-9286

Leite, João Marcelo L.G.

Marujo, Lino G.

Arruda, Edilson F.

25 June 2025

Leite, João Marcelo L.G., Marujo, Lino G. and Arruda, Edilson F. (2025) Novel time aggregation based algorithms for Markov decision processes. IEEE Transactions on Automatic Control. (doi:10.1109/TAC.2025.3583262).

Record type: Article

Text

Final_Version_Sent_IEEE_TAC - Accepted Manuscript

Available under License Creative Commons Attribution.

More information

e-pub ahead of print date: 25 June 2025

Published date: 25 June 2025

Identifiers

Local EPrints ID: 504062

URI: http://eprints.soton.ac.uk/id/eprint/504062

DOI: doi:10.1109/TAC.2025.3583262

ISSN: 0018-9286

PURE UUID: 3b214e37-02ec-4847-bda4-f8f89e496674

ORCID for Edilson F. Arruda: orcid.org/0000-0002-9835-352X

Catalogue record

Date deposited: 22 Aug 2025 16:33

Last modified: 18 Oct 2025 02:03

Contributors

Author: João Marcelo L.G. Leite

Author: Lino G. Marujo

Author: Edilson F. Arruda
