University of Southampton Institutional Repository

A two-phase time aggregation algorithm for average cost Markov decision processes



Arruda, Edilson F.
8eb3bd83-e883-4bf3-bfbc-7887c5daa911
Fragoso, Marcelo D.
7f484139-de97-4458-aa6b-dc3249811a08

Arruda, Edilson F. and Fragoso, Marcelo D. (2012) A two-phase time aggregation algorithm for average cost Markov decision processes. In 2012 American Control Conference, ACC 2012. Institute of Electrical and Electronics Engineers Inc. pp. 1615-1620. (doi:10.1109/acc.2012.6315187).

Record type: Conference or Workshop Item (Paper)

Abstract

This paper introduces a two-phase approach to solve average cost Markov decision processes, which is based on state space embedding or time aggregation. In the first phase, time aggregation is applied for policy evaluation in a prescribed subset of the state space, and a novel result is applied to expand the evaluation to the whole state space. This evaluation is then used in the second phase in a policy improvement step, and the two phases are then sequentially applied until convergence is attained or a prescribed running time is exceeded.
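The evaluation-then-improvement loop the abstract describes can be illustrated with a minimal sketch of standard average-cost policy iteration. Note the assumptions: the paper's distinguishing time-aggregation step (evaluating on a prescribed subset of the state space and extending the result to the whole space) is replaced here by a direct full-space evaluation, and the small example MDP, together with all function names, is illustrative rather than taken from the paper.

```python
import numpy as np

def evaluate(P, c, policy):
    """Phase 1 (simplified): solve the average-cost Poisson equation
    h = c_pi - g + P_pi h for a fixed policy, normalising h[0] = 0.
    Returns (gain g, bias vector h)."""
    n = c.shape[0]
    idx = np.arange(n)
    P_pi = P[policy, idx, :]   # transition matrix under the policy
    c_pi = c[idx, policy]      # one-step cost under the policy
    # Unknowns: x = (g, h[1], ..., h[n-1]); h[0] is pinned to 0.
    M = np.hstack([np.ones((n, 1)), (np.eye(n) - P_pi)[:, 1:]])
    x = np.linalg.solve(M, c_pi)
    return x[0], np.concatenate([[0.0], x[1:]])

def policy_iteration(P, c, max_iters=100):
    """Alternate the two phases until the policy is stable:
    evaluation (gain/bias) then greedy improvement."""
    n, A = c.shape
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iters):
        g, h = evaluate(P, c, policy)
        # Phase 2: greedy action minimising c(i,a) + sum_j P(j|i,a) h(j)
        new_policy = np.argmin(c.T + P @ h, axis=0)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, g

# Tiny 2-state, 2-action example: action 1 is cheap in state 0 and moves
# toward state 1, where action 0 is cheap and tends to stay put.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.1, 0.9], [0.9, 0.1]]])  # P[a, i, j]
c = np.array([[2.0, 1.0],
              [1.0, 2.0]])                # c[i, a]
policy, gain = policy_iteration(P, c)
print(policy, gain)  # optimal policy [1 0], long-run average cost 1.0
```

In the paper's scheme, the `evaluate` step above would instead work on an embedded chain over the prescribed subset and then extend the bias to the remaining states, which is what makes the method tractable on large state spaces.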

Full text not available from this repository.

More information

Published date: 1 January 2012
Venue - Dates: 2012 American Control Conference, ACC 2012, Montreal, QC, Canada, 2012-06-26 - 2012-06-28
Keywords: Dynamic Programming, Embedding, Markov Decision Processes, Stochastic Optimal Control, Time Aggregation

Identifiers

Local EPrints ID: 445897
URI: http://eprints.soton.ac.uk/id/eprint/445897
ISSN: 0743-1619
PURE UUID: 77b3f861-b3d1-4d8e-a972-0d0b8bcf27e5
ORCID for Edilson F. Arruda: orcid.org/0000-0002-9835-352X

Catalogue record

Date deposited: 13 Jan 2021 17:31
Last modified: 18 Feb 2021 17:42


