University of Southampton Institutional Repository

Approximate dynamic programming via direct search in the space of value function approximations


This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span semi-norm of the Bellman residual in a convex value function approximation space. The novelty here is that the optimality of a point in the approximation architecture is characterized by means of convex optimization concepts, and necessary and sufficient conditions for local optimality are derived. To improve the convergence rate, the procedure combines the classical AVI algorithm direction (the Bellman residual) with a set of independent search directions. It has guaranteed convergence and satisfies, at least, the necessary optimality conditions over a prescribed set of directions. To illustrate the method, examples are presented that deal with a class of problems from the literature and a large state space queueing problem setting.
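The convexity claim in the abstract can be seen in one step: for a fixed policy with transition matrix P, cost vector g and discount factor alpha, the residual r(V) = g + alpha*P*V - V is affine in V, and the span semi-norm sp(x) = max(x) - min(x) is convex, so sp(r(V)) is convex. The sketch below is an illustration only, not the authors' algorithm: it checks that property numerically and minimizes the span of the residual over a linear architecture V = Phi @ theta with a plain compass (coordinate) direct search. The helper names (span, bellman_residual, objective, compass_search), the 3-state chain, the costs, alpha = 0.9 and the feature matrix Phi are all hypothetical choices made for the example.

```python
import numpy as np


def span(x):
    """Span semi-norm: max(x) - min(x); zero for constant vectors."""
    return float(np.max(x) - np.min(x))


def bellman_residual(V, P, g, alpha):
    """Residual T_mu V - V for a fixed policy mu, where T_mu V = g + alpha * P @ V."""
    return g + alpha * P @ V - V


def objective(theta, Phi, P, g, alpha):
    """Span of the Bellman residual for the linear approximation V = Phi @ theta."""
    return span(bellman_residual(Phi @ theta, P, g, alpha))


def compass_search(f, theta0, step=1.0, tol=1e-8, max_iter=10_000):
    """Basic compass search over the prescribed directions +/- e_i.

    Accepts the first direction that lowers f; halves the step when none
    does, and stops once the step falls below tol.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    fval = f(theta)
    k = theta.size
    directions = np.vstack([np.eye(k), -np.eye(k)])
    for _ in range(max_iter):
        improved = False
        for d in directions:
            cand = theta + step * d
            fc = f(cand)
            if fc < fval:
                theta, fval = cand, fc
                improved = True
                break
        if not improved:
            step /= 2.0
            if step < tol:
                break
    return theta, fval


# Hypothetical 3-state chain under a fixed policy (rows of P sum to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
g = np.array([1.0, 2.0, 3.0])
alpha = 0.9
# Hypothetical features: a constant and a linear "state index" feature.
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

theta, fval = compass_search(lambda t: objective(t, Phi, P, g, alpha), np.zeros(2))
print("theta =", theta, "span of residual =", fval)
```

Because the span objective is convex but nonsmooth, a search restricted to a fixed set of directions can stop at a kink where no listed direction gives descent. That is consistent with the abstract's caveat that the method satisfies, at least, the necessary optimality conditions over a prescribed set of directions. Note also that the constant feature never changes the span, since adding c to every entry of V shifts the residual by the constant (alpha - 1) * c.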

Keywords: Convex optimization, Direct search methods, Dynamic programming, Markov decision processes
ISSN: 0377-2217
Pages: 343-351
Authors: Arruda, E.F.; Fragoso, M.D.; Do Val, J.B.R.

Arruda, E.F., Fragoso, M.D. and Do Val, J.B.R. (2011) Approximate dynamic programming via direct search in the space of value function approximations. European Journal of Operational Research, 211 (2), 343-351. (doi:10.1016/j.ejor.2010.11.019).

Record type: Article

Full text not available from this repository.

More information

Accepted/In Press date: 13 November 2010
e-pub ahead of print date: 13 January 2011
Published date: 1 June 2011

Identifiers

Local EPrints ID: 444753
URI: http://eprints.soton.ac.uk/id/eprint/444753
PURE UUID: 3731aebb-6f72-4aa4-802e-cd03db44cc46
ORCID for E.F. Arruda: orcid.org/0000-0002-9835-352X

Catalogue record

Date deposited: 03 Nov 2020 17:32
Last modified: 18 Feb 2021 17:42

