A two-phase time aggregation algorithm for average cost Markov decision processes
This paper introduces a two-phase approach to solving average cost Markov decision processes, based on state space embedding, or time aggregation. In the first phase, time aggregation is applied to evaluate the current policy on a prescribed subset of the state space, and a novel result is used to extend this evaluation to the whole state space. In the second phase, this evaluation is used in a policy improvement step. The two phases are applied alternately until convergence is attained or a prescribed running time is exceeded.
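The abstract only outlines the method, so the sketch below illustrates the generic time-aggregation idea it builds on. It is not the authors' two-phase algorithm, and it does not implement their result for extending the evaluation to the whole state space. The sketch assumes that a fixed policy has already induced a finite, irreducible Markov chain with transition matrix P and per-stage costs c. It evaluates that policy's average cost using only the embedded chain observed on a prescribed subset S1. Each visit to S1 starts a segment that ends at the next visit to S1. The average cost then equals the expected segment cost divided by the expected segment length, both weighted by the embedded chain's stationary distribution (a renewal-reward argument). All function names are hypothetical, and exact Fraction arithmetic is used so that the aggregated and direct results can be compared for equality.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination.

    Entries are converted to Fraction; the first nonzero entry in each
    column is used as the pivot (assumes A is nonsingular).
    """
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def stationary(P):
    """Stationary distribution of an irreducible chain: pi P = pi, sum(pi) = 1."""
    n = len(P)
    # Rows of (P^T - I). One row is redundant, so replace it by the normalization.
    A = [[P[j][i] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    A[-1] = [1] * n
    return solve(A, [0] * (n - 1) + [1])


def average_cost_direct(P, c):
    """Average cost of the fixed policy over the full chain: eta = sum_i pi_i c_i."""
    return sum(p * ci for p, ci in zip(stationary(P), c))


def average_cost_time_aggregated(P, c, S1):
    """Average cost computed from the embedded chain on the subset S1 only.

    eta = E[segment cost] / E[segment length], with both expectations taken
    under the stationary distribution of the embedded chain on S1.
    """
    S2 = [j for j in range(len(P)) if j not in S1]
    # I - P22 restricted to S2; invertible because S1 is reachable from every state.
    I_P22 = [[(1 if a == b else 0) - P[a][b] for b in S2] for a in S2]
    cost_to_S1 = solve(I_P22, [c[j] for j in S2])  # expected cost from each S2 state until S1 is hit
    time_to_S1 = solve(I_P22, [1] * len(S2))       # expected steps from each S2 state until S1 is hit
    # entry_prob[k][j]: probability that, starting from S2[j], S1 is first entered at S1[k].
    entry_prob = [solve(I_P22, [P[j][k] for j in S2]) for k in S1]

    # Expected cost R and length T of a segment that starts at each state of S1.
    R = [c[i] + sum(P[i][j] * v for j, v in zip(S2, cost_to_S1)) for i in S1]
    T = [1 + sum(P[i][j] * t for j, t in zip(S2, time_to_S1)) for i in S1]
    # Transition matrix of the embedded chain observed only on S1.
    P_emb = [[P[i][k] + sum(P[i][j] * w for j, w in zip(S2, entry_prob[col]))
              for col, k in enumerate(S1)] for i in S1]
    pi = stationary(P_emb)
    return sum(p * r for p, r in zip(pi, R)) / sum(p * t for p, t in zip(pi, T))


# Usage on a small 4-state cyclic chain, aggregating on S1 = {0, 2}.
h = Fraction(1, 2)
P = [[h, h, 0, 0],
     [0, h, h, 0],
     [0, 0, h, h],
     [h, 0, 0, h]]
c = [1, 2, 3, 4]
print(average_cost_direct(P, c))                   # 5/2
print(average_cost_time_aggregated(P, c, [0, 2]))  # 5/2
```

Both calls return the same average cost. The aggregated version, however, only needs the stationary distribution of a chain on S1, and this reduction in the size of the evaluation problem is what motivates applying time aggregation in the first phase.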
Keywords: Dynamic Programming, Embedding, Markov Decision Processes, Stochastic Optimal Control, Time Aggregation
Pages: 1615-1620
Authors: Arruda, Edilson F.; Fragoso, Marcelo D.
Published date: 1 January 2012
Arruda, Edilson F. and Fragoso, Marcelo D. (2012) A two-phase time aggregation algorithm for average cost Markov decision processes. In 2012 American Control Conference, ACC 2012. IEEE. (doi:10.1109/acc.2012.6315187).
Record type: Conference or Workshop Item (Paper)
This record has no associated files available for download.
More information
Venue - Dates: 2012 American Control Conference, ACC 2012, Montreal, QC, Canada, 2012-06-27 to 2012-06-29
Identifiers
Local EPrints ID: 445897
URI: http://eprints.soton.ac.uk/id/eprint/445897
ISSN: 0743-1619
PURE UUID: 77b3f861-b3d1-4d8e-a972-0d0b8bcf27e5
Catalogue record
Date deposited: 13 Jan 2021 17:31
Last modified: 17 Mar 2024 04:04
Contributors
Author: Edilson F. Arruda
Author: Marcelo D. Fragoso