University of Southampton Institutional Repository

Toward an optimized value iteration algorithm for average cost Markov decision processes

Arruda, Edilson F., Ourique, Fabrício and Almudevar, Anthony (2010) Toward an optimized value iteration algorithm for average cost Markov decision processes. In 2010 49th IEEE Conference on Decision and Control, CDC 2010. pp. 930-934. (doi:10.1109/CDC.2010.5717895).

Record type: Conference or Workshop Item (Paper)

Abstract

In this paper we propose a technique to accelerate the convergence rate of the value iteration (VI) algorithm applied to discrete average cost Markov decision processes (MDP). The convergence rate is measured with respect to the total computational effort instead of the iteration counter. Such a rate definition makes it possible to compare different classes of algorithms, which employ distinct and possibly variable updating schemes. A partial information value iteration (PIVI) algorithm is proposed that updates an increasingly accurate approximate version of the original problem with a view toward saving computations at the early stages of the algorithm, when one is typically far from the optimal solution. The PIVI overall computational effort is compared with that of the classical VI algorithm for a broad set of parameters. The results suggest that a suitable choice of parameters can lead to significant computational savings in the process of finding the optimal solution for discrete MDP under the average cost criterion.
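
For orientation, the classical baseline that the abstract compares against can be sketched as follows. This is a minimal relative value iteration for a small average cost MDP, not the PIVI algorithm, whose details are not available from this record. The function name `relative_value_iteration`, the array layout of `P` and `c`, the `effort` counter (one unit per transition term processed) and the two-state example are all assumptions made for illustration and do not come from the paper.

```python
def relative_value_iteration(P, c, tol=1e-9, max_iter=10_000):
    """Classical relative VI for a unichain, aperiodic average cost MDP.

    P[a][s][t]: probability of moving from state s to t under action a.
    c[s][a]:    one-step cost of taking action a in state s.
    Returns (gain, h, policy, effort), where effort counts the transition
    terms processed (an assumed stand-in for "computational effort").
    """
    n_actions, n_states = len(P), len(c)
    h = [0.0] * n_states          # relative value function, h[0] pinned to 0
    effort = 0
    for _ in range(max_iter):
        Th, policy = [], []
        for s in range(n_states):
            best_q, best_a = None, None
            for a in range(n_actions):
                q = c[s][a] + sum(P[a][s][t] * h[t] for t in range(n_states))
                effort += n_states
                if best_q is None or q < best_q:
                    best_q, best_a = q, a
            Th.append(best_q)
            policy.append(best_a)
        # The optimal average cost lies between min and max of Th - h.
        diff = [Th[s] - h[s] for s in range(n_states)]
        lo, hi = min(diff), max(diff)
        # Subtract the value at reference state 0 to keep h bounded.
        h = [x - Th[0] for x in Th]
        if hi - lo < tol:
            return 0.5 * (lo + hi), h, policy, effort
    raise RuntimeError("no convergence within max_iter sweeps")


# Hypothetical two-state, two-action MDP (illustrative numbers only).
P = [
    [[0.5, 0.5], [0.5, 0.5]],   # action 0
    [[0.9, 0.1], [0.2, 0.8]],   # action 1
]
c = [[0.0, 1.0], [4.0, 5.0]]    # c[s][a]

gain, h, policy, effort = relative_value_iteration(P, c)
print(gain, policy, effort)
```

For this toy problem, enumerating the four stationary policies by hand gives a minimum average cost of 1.5, attained by taking action 1 in state 0 and action 0 in state 1, which is what the sketch converges to. Reporting `effort` rather than the number of sweeps mirrors the abstract's point that algorithms with different update schemes are comparable only on total computation.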

Full text not available from this repository.

More information

Published date: 1 December 2010
Venue - Dates: 2010 49th IEEE Conference on Decision and Control, CDC 2010, Atlanta, GA, United States, 2010-12-15 - 2010-12-17
Keywords: Average cost, Computational effort, Markov decision processes, Value iteration

Identifiers

Local EPrints ID: 445887
URI: http://eprints.soton.ac.uk/id/eprint/445887
ISSN: 0191-2216
PURE UUID: 8a6d433a-73b8-40fb-951f-72274f26002d
ORCID for Edilson F. Arruda: orcid.org/0000-0002-9835-352X

Catalogue record

Date deposited: 13 Jan 2021 17:31
Last modified: 18 Feb 2021 17:42
