University of Southampton Institutional Repository

Parameter optimal iterative learning control design from model-based, data-driven to reinforcement learning



Zhang, Yueqing
8c1c1b35-d4bd-43d2-8871-b351bcbd2e7c
Chu, Bing
555a86a5-0198-4242-8525-3492349d4f0f
Shu, Zhan
ea5dc18c-d375-4db0-bbcc-dd0229f3a1cb

Zhang, Yueqing, Chu, Bing and Shu, Zhan (2022) Parameter optimal iterative learning control design from model-based, data-driven to reinforcement learning. IFAC-PapersOnLine, 55 (12), 494-499. (doi:10.1016/j.ifacol.2022.07.360).

Record type: Article

Abstract

Iterative learning control (ILC) is a high-performance control design method for systems operating in a repetitive fashion by learning from past experience. Our recent work shows that reinforcement learning (RL) shares many features with ILC and thus opens the door to new ILC algorithm designs. This paper continues the research by considering a parameter optimal iterative learning control (POILC) algorithm. It has a very simple structure and appealing convergence properties, but requires a model of the system. We first develop a data-driven POILC algorithm without using model information by performing an extra experiment on the plant. We then use a policy gradient RL algorithm to design a new model-free POILC algorithm. Both algorithms achieve the high-performance control target without using model information, but the convergence properties do differ. In particular, by increasing the number of function approximators in the latter, the RL-based model-free ILC can approach the performance of the model-based POILC. A numerical study is presented to compare the performance of different approaches and demonstrate the effectiveness of the proposed designs.
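To make the abstract concrete, the following is a minimal sketch of the classical model-based parameter-optimal ILC update in the spirit of Owens and Feng (2003), not the exact algorithm of this paper: the input is updated trial-to-trial as u_{k+1} = u_k + β_{k+1} e_k, where the scalar gain β_{k+1} minimises ||e_{k+1}||² + w·β_{k+1}², which guarantees the error norm never increases. The plant impulse response, reference signal, and weight w below are all hypothetical illustration choices.

```python
import numpy as np

def lifted_matrix(h, N):
    """Lower-triangular Toeplitz (lifted) system matrix built from
    the plant's first N Markov parameters h[0..N-1]."""
    G = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1):
            G[i, j] = h[i - j]
    return G

# Hypothetical first-order discrete-time plant: Markov parameters b*a^i.
N = 50
a, b = 0.9, 0.5
h = b * a ** np.arange(N)
G = lifted_matrix(h, N)

r = np.sin(np.linspace(0, 2 * np.pi, N))  # reference trajectory
u = np.zeros(N)                           # initial input (trial 0)
w = 1e-3                                  # weight penalising the gain
errs = []
for k in range(30):
    e = r - G @ u                         # tracking error on trial k
    errs.append(np.linalg.norm(e))
    Ge = G @ e
    # Optimal trial-varying gain: beta = e'Ge / (w + ||Ge||^2),
    # the minimiser of ||e - beta*Ge||^2 + w*beta^2.
    beta = (e @ Ge) / (w + Ge @ Ge)
    u = u + beta * e                      # POILC input update
```

Because β = 0 is always a feasible choice, the optimal β can never make the cost larger than ||e_k||², so the error norm sequence is monotonically non-increasing; the data-driven and RL-based variants in the paper aim to recover this behaviour without access to G.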

Text
1-s2.0-S2405896322007613-main - Version of Record
Download (503kB)

More information

e-pub ahead of print date: 4 August 2022
Published date: 2022
Additional Information: Funding Information: This work was partially supported by the ZZU-Southampton Collaborative Research Project 16306/01 and the China Scholarship Council (CSC). Copyright © 2022 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer review under responsibility of International Federation of Automatic Control. Publisher Copyright: © 2022 Elsevier B.V. All rights reserved.
Venue - Dates: 14th IFAC Workshop on Adaptive and Learning Control Systems, ALCOS 2022, Casablanca, Morocco, 2022-06-29 - 2022-07-01
Keywords: data-based control, iterative learning control, reinforcement learning control

Identifiers

Local EPrints ID: 471605
URI: http://eprints.soton.ac.uk/id/eprint/471605
ISSN: 2405-8963
PURE UUID: 22cb7a7a-8e7c-4e16-9891-be49367b3dd7
ORCID for Yueqing Zhang: orcid.org/0000-0003-2304-6151
ORCID for Bing Chu: orcid.org/0000-0002-2711-8717
ORCID for Zhan Shu: orcid.org/0000-0002-5933-254X

Catalogue record

Date deposited: 14 Nov 2022 18:09
Last modified: 18 Mar 2024 03:21


Contributors

Author: Yueqing Zhang
Author: Bing Chu
Author: Zhan Shu


