The University of Southampton
University of Southampton Institutional Repository

Approximate dynamic programming based on expansive projections

We present a general method to obtain convergent approximate value iteration algorithms with function approximation. The result applies to any approximation architecture and generalizes existing results in the literature derived for particular approximation schemes. Additionally, we show how to obtain a convergent approximate mapping whose fixed point is the projection, onto the approximation space, of a fixed point of the exact dynamic programming mapping with respect to a suitable subset norm. This result relies on evaluating the difference between successive iterates in the selected subset norm, which yields convergent procedures for any approximation architecture.
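The loop the abstract describes (apply the dynamic programming mapping, project onto the approximation space, and compare successive iterates in a norm restricted to a subset of states) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. As a stand-in, the sketch assumes a state-aggregation architecture, whose averaging projection is non-expansive, so convergence is guaranteed for a discount factor below one. The function names, the 6-state chain, the costs, and the sampled subset are all hypothetical.

```python
def bellman_backup(values, state, costs, next_state, gamma):
    """Minimum over actions of immediate cost plus discounted next-state value."""
    return min(
        cost + gamma * values[nxt]
        for cost, nxt in zip(costs[state], next_state[state])
    )


def subset_norm(u, v, subset):
    """Largest absolute difference between u and v over the sampled states."""
    return max(abs(u[s] - v[s]) for s in subset)


def approximate_value_iteration(costs, next_state, group_of, subset,
                                gamma, tol=1e-8, max_iter=10_000):
    """Value iteration with a state-aggregation projection (an assumption,
    chosen here because averaging is non-expansive in the max norm)."""
    n_states = len(costs)
    n_groups = max(group_of) + 1
    theta = [0.0] * n_groups  # one parameter per group of states
    values = [theta[group_of[s]] for s in range(n_states)]
    for iteration in range(1, max_iter + 1):
        sums = [0.0] * n_groups
        counts = [0] * n_groups
        for s in subset:  # backups are evaluated on the sampled subset only
            g = group_of[s]
            sums[g] += bellman_backup(values, s, costs, next_state, gamma)
            counts[g] += 1
        # Projection step: average the backups within each group.
        # Groups without a sampled state keep their previous parameter.
        new_theta = [sums[g] / counts[g] if counts[g] else theta[g]
                     for g in range(n_groups)]
        new_values = [new_theta[group_of[s]] for s in range(n_states)]
        # Stopping rule: difference of successive iterates in the subset norm.
        gap = subset_norm(new_values, values, subset)
        theta, values = new_theta, new_values
        if gap < tol:
            return theta, values, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical 6-state chain: action 0 stays (cost 1), action 1 moves right
# (cost 2); state 5 is absorbing and cost-free.
costs = [[1.0, 2.0] for _ in range(5)] + [[0.0, 0.0]]
next_state = [[s, s + 1] for s in range(5)] + [[5, 5]]
group_of = [0, 0, 1, 1, 2, 2]  # three groups of two states each
subset = [0, 2, 3, 5]          # sampled states; every group is represented

theta, values, iterations = approximate_value_iteration(
    costs, next_state, group_of, subset, gamma=0.9
)
print(theta, iterations)
```

With a projection that can expand distances, such as an unconstrained least-squares fit, the same loop is not guaranteed to converge, which is the situation the paper addresses.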

Arruda, Edilson F. and Do Val, João B.R. (2006) Approximate dynamic programming based on expansive projections. In Proceedings of the 45th IEEE Conference on Decision and Control 2006, CDC. IEEE. pp. 5537-5542. (doi:10.1109/cdc.2006.376823).

Record type: Conference or Workshop Item (Paper)

This record has no associated files available for download.

More information

Published date: 1 January 2006
Venue - Dates: 45th IEEE Conference on Decision and Control 2006, CDC, San Diego, CA, United States, 2006-12-13 - 2006-12-15

Identifiers

Local EPrints ID: 445711
URI: http://eprints.soton.ac.uk/id/eprint/445711
ISSN: 0191-2216
PURE UUID: e8bae917-baee-42fd-afd4-47745815e2b7
ORCID for Edilson F. Arruda: orcid.org/0000-0002-9835-352X

Catalogue record

Date deposited: 06 Jan 2021 17:41
Last modified: 16 Apr 2024 01:59

Contributors

Author: Edilson F. Arruda
Author: João B.R. Do Val
