Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork
Pages: 1296-1304
Oliehoek, Frans, Tang, Shi Yuan and Zhang, Jie (2021) Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork. Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011), Taipei. 02 - 06 May 2011. (doi:10.48448/ckqa-am79).
Record type:
Conference or Workshop Item (Paper)
Abstract
Cross-Entropy Method (CEM) is a gradient-free direct policy search method that offers greater stability and is insensitive to hyper-parameter tuning. CEM bears similarity to population-based evolutionary methods, but rather than using a population, it uses a distribution over candidate solutions (policies, in our case). Usually, a natural exponential family distribution such as a multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies, as the search becomes confined to a less representative subspace. We address this drawback by using an adversarially-trained hypernetwork, enabling a richer and more complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide our adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by 15.8% in terms of rewards while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.
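To make the abstract's description of CEM concrete, the following is a minimal sketch of the conventional Gaussian-parameterized CEM loop that the abstract identifies as the usual baseline: sample candidates from a distribution, keep the best-scoring "elite" fraction, and refit the distribution to the elites. It is not the authors' adversarial hypernetwork method. It assumes a diagonal (not full-covariance) Gaussian, and the function name `cem_diagonal_gaussian`, the elite fraction, the standard-deviation floor `min_std`, and the toy quadratic objective are all illustrative assumptions. In a policy-search setting, `score` would instead return the episode reward of a policy whose parameters are the sampled vector.

```python
import random
import statistics
from typing import Callable, List, Tuple


def cem_diagonal_gaussian(
    score: Callable[[List[float]], float],
    dim: int,
    n_samples: int = 50,
    elite_frac: float = 0.2,
    n_iters: int = 30,
    init_std: float = 2.0,
    min_std: float = 1e-3,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Maximise `score` with a diagonal-Gaussian Cross-Entropy Method (illustrative sketch)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Sample candidate parameter vectors from the current Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n_samples)]
        # Keep the highest-scoring candidates (the elite set).
        elites = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit mean and std per dimension to the elites (maximum-likelihood fit).
        columns = list(zip(*elites))
        mean = [statistics.fmean(c) for c in columns]
        # Assumed floor on std so the distribution never collapses to a point.
        std = [max(statistics.pstdev(c), min_std) for c in columns]
    return mean, std


if __name__ == "__main__":
    # Toy objective (hypothetical): a quadratic peaked at (3, 3).
    def toy_score(x: List[float]) -> float:
        return -sum((xi - 3.0) ** 2 for xi in x)

    mean, std = cem_diagonal_gaussian(toy_score, dim=2)
    print("mean:", mean, "std:", std)
```

On this toy objective the fitted mean should move toward (3, 3) as the standard deviations shrink. Because each dimension is modelled independently by a single Gaussian, the search distribution stays unimodal and axis-aligned, which illustrates the kind of restricted search subspace the abstract argues against.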
This record has no associated files available for download.
More information
Published date: 4 May 2021
Additional Information:
Funding Information:
SY.T. acknowledges support from the Alibaba Group and the Alibaba-NTU Singapore Joint Research Institute. F.A.O. received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 758824 -INFLUENCE).
Publisher Copyright:
© 2021 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
Venue - Dates:
Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011), Taipei, 2011-05-02 - 2011-05-06
Keywords:
Cross-Entropy Method, Generative Adversarial Networks, Hypernetworks, Reinforcement Learning
Identifiers
Local EPrints ID: 451466
URI: http://eprints.soton.ac.uk/id/eprint/451466
PURE UUID: 9cd5a9d1-5be1-4d98-8a39-c19962f50bd2
Catalogue record
Date deposited: 29 Sep 2021 19:06
Last modified: 16 Mar 2024 14:11
Contributors
Author:
Frans Oliehoek
Author:
Shi Yuan Tang
Author:
Jie Zhang