Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork

The Cross-Entropy Method (CEM) is a gradient-free direct policy search method that is stable and relatively insensitive to hyper-parameter tuning. CEM resembles population-based evolutionary methods, but rather than maintaining a population, it maintains a distribution over candidate solutions (policies in our case). Usually, a natural exponential family distribution such as a multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies, as the search becomes confined to a less representative subspace. We address this drawback by using an adversarially trained hypernetwork, enabling a richer and more complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide our adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by 15.8% in terms of rewards while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.
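
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a generic Gaussian CEM loop, not the authors' implementation. It assumes a diagonal Gaussian over a flat parameter vector and a toy quadratic score in place of episode reward; the names `cem_search`, `toy_score`, and all default values are hypothetical choices made for illustration. The adversarial hypernetwork proposed in the paper, which replaces the Gaussian with a learned generator, is not shown.

```python
import numpy as np


def cem_search(score_fn, dim, n_iters=50, pop_size=64, elite_frac=0.2,
               init_std=2.0, seed=0):
    """Generic CEM with a diagonal Gaussian over parameter vectors (illustrative)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        # Sample candidate parameter vectors from the current distribution.
        samples = mean + std * rng.standard_normal((pop_size, dim))
        scores = np.array([score_fn(s) for s in samples])
        # Keep the highest-scoring candidates (the elites).
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the Gaussian to the elites; the small constant keeps
        # the standard deviation from collapsing to exactly zero.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3
    return mean


def toy_score(theta):
    # Hypothetical stand-in for episode reward: higher is better,
    # maximized at the fixed target vector below.
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)


best = cem_search(toy_score, dim=3)
print(best)
```

In a reinforcement learning setting, `score_fn` would instead run one or more episodes with the sampled policy parameters and return the total reward. Because the sampling distribution here is a single Gaussian, it can only represent one unimodal region of parameter space, which is the limitation the abstract describes.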

Cross-Entropy Method, Generative Adversarial Networks, Hypernetworks, Reinforcement Learning

10.48448/ckqa-am79

1296-1304

Oliehoek, Frans

73e15fe1-2398-455d-98a7-af885428dddc

Tang, Shi Yuan

7be09b47-3405-4b51-8971-e29a62e1bc8c

Zhang, Jie

6bad4e75-40e0-4ea3-866d-58c8018b225a

4 May 2021

Oliehoek, Frans, Tang, Shi Yuan and Zhang, Jie (2021) Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork. Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011), Taipei. 02 - 06 May 2011. pp. 1296-1304. (doi:10.48448/ckqa-am79).

Record type: Conference or Workshop Item (Paper)

More information

Published date: 4 May 2021

Additional Information: Funding Information: SY.T. acknowledges support from the Alibaba Group and the Alibaba-NTU Singapore Joint Research Institute. F.A.O. received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 758824 - INFLUENCE). Publisher Copyright: © 2021 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Venue - Dates: Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011), Taipei, 2011-05-02 - 2011-05-06
