Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm
This paper introduces a two-phase approach to solve average cost Markov decision processes, which is based on state space embedding or time aggregation. In the first phase, time aggregation is applied for policy optimization in a prescribed subset of the state space, and a novel result is applied to expand the evaluation to the whole state space. This evaluation is then used in the second phase in a policy improvement step, and the two phases are then alternated until convergence is attained. Some numerical experiments illustrate the results.
Dynamic programming, Embedding, Markov decision processes, Stochastic optimal control, Time aggregation
697-705
Arruda, E. F.
Fragoso, M. D.
1 February 2015
Arruda, E. F. and Fragoso, M. D. (2015) Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm. European Journal of Operational Research, 240 (3), 697-705. (doi:10.1016/j.ejor.2014.08.023).
More information
Published date: 1 February 2015
Identifiers
Local EPrints ID: 446040
URI: http://eprints.soton.ac.uk/id/eprint/446040
ISSN: 0377-2217
PURE UUID: 1cf1528f-040e-4ac4-bc43-b831062ae521
Catalogue record
Date deposited: 19 Jan 2021 17:33
Last modified: 06 Jun 2024 02:09
Contributors
Author: E. F. Arruda
Author: M. D. Fragoso