More information
e-pub ahead of print date: 4 August 2022
Published date: 2022
Additional Information:
Funding Information:
This work was partially supported by the ZZU-Southampton Collaborative Research Project 16306/01 and the China Scholarship Council (CSC).

Yueqing Zhang ∗, Bing Chu ∗, Zhan Shu ∗∗
∗ University of Southampton, Southampton, SO17 1BJ, UK (e-mail: {yz3n17, b.chu}@soton.ac.uk)
∗∗ University of Alberta, Edmonton, T6G 2H5, Canada (e-mail: zshu1@ualberta.ca)

Abstract: Iterative learning control (ILC) is a high-performance control design method for systems operating in a repetitive fashion by learning from past experience. Our recent work shows that reinforcement learning (RL) shares many similarities with ILC, which opens the door to new ILC algorithm designs. This paper continues the research by considering a parameter optimal iterative learning control (POILC) algorithm. It has a very simple structure and appealing convergence properties, but requires a model of the system. We first develop a data-driven POILC algorithm without using model information by performing an extra experiment on the plant. We then use a policy gradient RL algorithm to design a new model-free POILC algorithm. Both algorithms achieve the high-performance control target without using model information, but the convergence properties do differ. In particular, by increasing the number of function approximators in the latter, the RL-based model-free ILC can approach the performance of the model-based POILC. A numerical study is presented to compare the performance of different approaches and demonstrate the effectiveness of the proposed designs.

Copyright © 2022 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Iterative learning control, reinforcement learning control, data-based control.

1. INTRODUCTION

High-performance control systems working in a repetitive manner play a key role in a wide range of areas, e.g., manufacturing, health care, and robotics. For such problems, conventional control methods would have difficulties in achieving the high performance control targets, as they normally require a highly accurate model which can be expensive or even impossible to obtain.

To solve the above problem, iterative learning control (ILC) is proposed that enables high-performance tracking without an accurate model. ILC updates the input signal based on previous data (including input and error information) to meet the stringent requirements regarding the control accuracy, and thus avoids the use of accurate model information. This learning mechanism makes ILC efficient in diverse high-performance applications, including robotics (Norrlöf, 2002), jet printing (Park et al., 2007), etc. A number of ILC methods have been proposed and can be divided into two categories. Model-based ILC design uses explicit system dynamics to design the input updating law, examples of which include gradient-based ILC (Owens et al., 2009), inverse-based ILC (Harte et al., 2005), norm optimal ILC (Amann et al., 1996), parameter optimal ILC (Owens and Feng, 2003), etc. On the other hand, model-free design (or data-driven design) directly updates the input without using model information, by either explicitly or implicitly identifying or adapting system (controller) parameters from the data or by performing extra experiments on the plant (Janssens et al., 2012; Bolder et al., 2018). For a more detailed review of ILC techniques, please refer to Bristow et al. (2006); Owens (2015). It is worth pointing out that model-free ILC algorithms, which can achieve high tracking performance without using model information, tend to converge more slowly than model-based ILC designs. A model-free ILC algorithm with convergence comparable to model-based algorithms is still awaited.

Recently, reinforcement learning (RL) has received much research interest (Bertsekas, 2019). RL learns the best policy (control) from continuous or repeated interactions with the environment to maximise a performance index (called the return). In our earlier work (Zhang et al., 2019b), we show that RL shares many similarities with ILC and can be used to solve ILC problems. We show via simulation that RL-based ILC designs can achieve the high-performance tracking requirement but tend to converge more slowly than model-based ILC, opening new possibilities for novel ILC algorithm design using advanced RL designs. Another work (Poot et al., 2020) proposes an actor-critic ILC method and compares it with a basis function approach in the norm optimal framework, showing that it is capable of achieving the same feed-forward signal without using explicit model information. More recently, we develop a Q-learning based norm optimal ILC design (Zhang et al., 2022), whose convergence can be shown rigorously. However, it has a relatively complex structure and needs a substantial number of iterations to converge.

DOI: 10.1016/j.ifacol.2022.07.360
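For readers who want a concrete picture of the update described in the abstract, the short Python sketch below illustrates one possible POILC-style iteration u_{k+1} = u_k + beta_{k+1} e_k, with the scalar gain chosen to minimise a quadratic cost, together with a data-driven variant in which the model term G e_k is replaced by the outcome of one extra experiment on the plant. The plant matrix, the weight w, and the exact form of the extra experiment are illustrative assumptions, not the paper's precise formulation, and the policy-gradient RL design is not reproduced here.

import numpy as np

def poilc_step_model_based(u_k, e_k, G, w=1e-2):
    # One POILC-style iteration u_{k+1} = u_k + beta * e_k for the lifted
    # linear plant y = G u + d, with beta minimising ||e_{k+1}||^2 + w*beta^2.
    # Model-based: requires the plant matrix G.
    Ge = G @ e_k                              # predicted effect of the update
    beta = (e_k @ Ge) / (w + Ge @ Ge)         # closed-form optimal scalar gain
    return u_k + beta * e_k

def poilc_step_data_driven(u_k, e_k, run_trial, w=1e-2):
    # Model-free variant (illustrative): estimate G e_k from one extra
    # experiment, by perturbing the input with the current error and
    # measuring the change in output, so no model of G is needed.
    Ge = run_trial(u_k + e_k) - run_trial(u_k)
    beta = (e_k @ Ge) / (w + Ge @ Ge)
    return u_k + beta * e_k

if __name__ == "__main__":
    # Toy lifted plant: lower-triangular Toeplitz matrix built from a
    # hypothetical impulse response h (treated as unknown by the update).
    N = 50
    h = 0.5 ** np.arange(N)
    G = np.zeros((N, N))
    for i in range(N):
        G[i, : i + 1] = h[: i + 1][::-1]

    r = np.sin(np.linspace(0.0, 2.0 * np.pi, N))   # reference trajectory
    run_trial = lambda u: G @ u                    # stands in for the real plant

    u = np.zeros(N)
    for k in range(30):
        e = r - run_trial(u)                       # tracking error of trial k
        u = poilc_step_data_driven(u, e, run_trial)
    print("error norm after 30 trials:", np.linalg.norm(r - run_trial(u)))

Because the gain is the minimiser of a quadratic trial-to-trial cost, the tracking error norm is non-increasing over trials in this sketch; the data-driven version pays for model independence with one additional plant experiment per trial, which mirrors the trade-off discussed in the abstract.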
Publisher Copyright:
© 2022 Elsevier B.V. All rights reserved.
Venue - Dates:
14th IFAC Workshop on Adaptive and Learning Control Systems, ALCOS 2022, Casablanca, Morocco, 2022-06-29 - 2022-07-01
Keywords:
data-based control, Iterative learning control, reinforcement learning control
Identifiers
Local EPrints ID: 471605
URI: http://eprints.soton.ac.uk/id/eprint/471605
ISSN: 2405-8963
PURE UUID: 22cb7a7a-8e7c-4e16-9891-be49367b3dd7
Catalogue record
Date deposited: 14 Nov 2022 18:09
Last modified: 18 Mar 2024 03:21