Novel time aggregation based algorithms for Markov decision processes
We present two novel approaches to accelerate convergence on a class of Markov decision problems (MDPs) with stationary state-space components; both iterate on reduced state and action spaces and converge to an optimal policy. The time aggregation based algorithm (TABA) partitions the state space into disjoint sets, one for each possible combination of the non-stationary components, and iterates over these sets. The local search policy set iteration (LSPSI) algorithm introduces a novel modified policy evaluation procedure that performs a local search over a very large set of candidate policies by sampling a reduced subset of actions at each state's value function update. Both approaches, as well as their combination, are validated on a mining supply chain application with a large number of stationary state components and a large set of feasible actions. The experiments suggest that the proposed frameworks are very efficient for this class of problems.
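The action-sampling idea described for LSPSI can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic, hypothetical variant of modified policy iteration, assuming a discounted, reward-maximizing MDP with tabular transition probabilities P[a][s][t] and rewards R[s][a]. Each evaluation sweep updates a state's value using only its current action plus k randomly sampled actions, and one full Bellman step over all actions per outer iteration serves as the stopping test. The function name sampled_action_mpi and all of its parameters are illustrative assumptions.

```python
import random


def sampled_action_mpi(P, R, gamma=0.9, k=2, sweeps=5, tol=1e-8, max_outer=1000, seed=0):
    """Hypothetical sketch (not LSPSI itself): modified policy iteration whose
    evaluation sweeps consider only the current action plus k sampled actions.

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[s][a]:    expected one-step reward (maximization).
    Returns (V, pi): value estimates and a greedy policy.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])

    def q(s, a, V):
        # One-step lookahead value of action a in state s under estimate V.
        return R[s][a] + gamma * sum(p * v for p, v in zip(P[a][s], V))

    # Start from a lower bound on V*, so every update is non-decreasing
    # and never exceeds V*.
    r_min = min(min(row) for row in R)
    V = [r_min / (1.0 - gamma)] * n_states
    pi = [0] * n_states

    for _ in range(max_outer):
        # Cheap local-search sweeps: in-place (Gauss-Seidel) updates that only
        # compare the incumbent action with a small random subset of actions.
        for _ in range(sweeps):
            for s in range(n_states):
                sampled = rng.sample(range(n_actions), min(k, n_actions))
                candidates = {pi[s]} | set(sampled)
                best = max(candidates, key=lambda a: q(s, a, V))
                pi[s], V[s] = best, q(s, best, V)

        # Full Bellman step over all actions; its residual is the stopping test.
        Q = [[q(s, a, V) for a in range(n_actions)] for s in range(n_states)]
        new_V = [max(row) for row in Q]
        residual = max(abs(nv - v) for nv, v in zip(new_V, V))
        pi = [row.index(max(row)) for row in Q]
        V = new_V
        if residual < tol:
            break
    return V, pi
```

For example, take a two-state chain with gamma = 0.9 in which state 0 can stay (reward 0) or move to state 1 (reward 0), and state 1 can stay (reward 1) or return to state 0 (reward 0). With k = 1, the sketch recovers V* of approximately [9, 10] and the policy [1, 0]. Initializing V at r_min/(1 - gamma) keeps every sampled update monotone and bounded above by V*, so the cheap partial sweeps cannot push the estimate past the optimum in this sketch.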
Markov decision processes, Stochastic systems, Supply Chains, Optimization under uncertainty
Leite, João Marcelo L.G.
dde72342-2292-4ef3-a5ba-77ee16722d67
Marujo, Lino G.
6179c7ff-0187-40b6-b04a-e2cff32580f5
Arruda, Edilson F.
8eb3bd83-e883-4bf3-bfbc-7887c5daa911
Leite, João Marcelo L.G., Marujo, Lino G. and Arruda, Edilson F. (2025) Novel time aggregation based algorithms for Markov decision processes. IEEE Transactions on Automatic Control. (doi:10.1109/TAC.2025.3583262).
Text: Final_Version_Sent_IEEE_TAC (Accepted Manuscript)
More information
e-pub ahead of print date: 25 June 2025
Identifiers
Local EPrints ID: 504062
URI: http://eprints.soton.ac.uk/id/eprint/504062
ISSN: 0018-9286
PURE UUID: 3b214e37-02ec-4847-bda4-f8f89e496674
Catalogue record
Date deposited: 22 Aug 2025 16:33
Last modified: 23 Aug 2025 02:19
Contributors
Author: João Marcelo L.G. Leite
Author: Lino G. Marujo
Author: Edilson F. Arruda