The University of Southampton
University of Southampton Institutional Repository

Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork


Oliehoek, Frans, Tang, Shi Yuan and Zhang, Jie (2021) Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork. Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011), Taipei. 02 - 06 May 2011. pp. 1296-1304. (doi:10.48448/ckqa-am79).

Record type: Conference or Workshop Item (Paper)

Abstract

The Cross-Entropy Method (CEM) is a gradient-free direct policy search method that offers greater stability and is insensitive to hyper-parameter tuning. CEM resembles population-based evolutionary methods, but rather than maintaining a population, it uses a distribution over candidate solutions (policies in our case). Usually, a natural exponential family distribution such as a multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies because the search becomes confined to a less representative subspace. We address this drawback by using an adversarially trained hypernetwork, which enables a richer and more complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide our adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by 15.8% in terms of rewards while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.
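
For readers unfamiliar with the Gaussian baseline the abstract refers to, the following is a minimal sketch of CEM with a diagonal Gaussian search distribution. It is not the authors' hypernetwork method. The function names, the default settings, and the toy quadratic objective (a stand-in for episode returns) are all assumptions made for illustration, and the diagonal covariance is a simplification of the multivariate Gaussian mentioned above.

```python
import random
import statistics


def gaussian_cem(score_fn, dim, iterations=40, pop_size=64, elite_frac=0.2, seed=0):
    """Diagonal-Gaussian CEM sketch: sample candidates, keep the elites, refit."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    n_elite = max(2, int(pop_size * elite_frac))
    for _ in range(iterations):
        # Sample candidate parameter vectors from the current search distribution.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop_size)
        ]
        # Keep the highest-scoring candidates (the elite set).
        elite = sorted(candidates, key=score_fn, reverse=True)[:n_elite]
        # Refit the Gaussian per dimension; the small floor keeps std above zero.
        columns = list(zip(*elite))
        mean = [statistics.fmean(col) for col in columns]
        std = [statistics.pstdev(col) + 1e-6 for col in columns]
    return mean, std


# Hypothetical objective: negative squared distance to a fixed target vector.
TARGET = [0.5, -0.3, 0.8]


def toy_score(theta):
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


best_mean, best_std = gaussian_cem(toy_score, dim=3)
```

In a reinforcement learning setting, `score_fn` would instead run a policy with the candidate parameters and return its episode reward. The refit step is where a single Gaussian restricts which policy distributions can be represented, which is the limitation the abstract sets out to address.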

This record has no associated files available for download.

More information

Published date: 4 May 2021
Additional Information: Funding Information: SY.T. acknowledges support from the Alibaba Group and the Alibaba-NTU Singapore Joint Research Institute. F.A.O. received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 758824 - INFLUENCE). Publisher Copyright: © 2021 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
Venue - Dates: Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011), Taipei, 2011-05-02 - 2011-05-06
Keywords: Cross-Entropy Method, Generative Adversarial Networks, Hypernetworks, Reinforcement Learning

Identifiers

Local EPrints ID: 451466
URI: http://eprints.soton.ac.uk/id/eprint/451466
PURE UUID: 9cd5a9d1-5be1-4d98-8a39-c19962f50bd2

Catalogue record

Date deposited: 29 Sep 2021 19:06
Last modified: 16 Mar 2024 14:11

Contributors

Author: Frans Oliehoek
Author: Shi Yuan Tang
Author: Jie Zhang
