# Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork

Oliehoek, Frans, Tang, Shi Yuan and Zhang, Jie (2021) Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork. Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011), Taipei. 01 - 05 May 2011. pp. 1296-1304.

## Abstract

The Cross-Entropy Method (CEM) is a gradient-free direct policy search method that offers greater stability and is insensitive to hyper-parameter tuning. CEM resembles population-based evolutionary methods but, rather than maintaining a population, it maintains a distribution over candidate solutions (policies in our case). Usually, a natural exponential family distribution such as a multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies, as the search becomes confined to a less representative subspace. We address this drawback by using an adversarially trained hypernetwork, enabling a richer and more complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide our adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by $15.8\%$ in terms of rewards while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.
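For readers unfamiliar with CEM, the following is a minimal sketch of the Gaussian baseline the abstract refers to, not the paper's hypernetwork method. It assumes a diagonal Gaussian over a flat parameter vector and a toy quadratic score standing in for episode return; the names `cem_gaussian` and `toy_return`, the elite fraction, and the standard-deviation floor are illustrative choices, not taken from the paper.

```python
import numpy as np


def cem_gaussian(score_fn, dim, n_iters=50, pop_size=64,
                 elite_frac=0.2, init_std=1.0, seed=0):
    """Diagonal-Gaussian CEM: sample, keep the elites, refit the Gaussian."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Draw candidate parameter vectors from the current distribution.
        samples = mean + std * rng.standard_normal((pop_size, dim))
        scores = np.array([score_fn(s) for s in samples])

        # Keep the highest-scoring candidates (the elite set).
        elite = samples[np.argsort(scores)[-n_elite:]]

        # Refit the Gaussian to the elites; the small floor (an assumption)
        # keeps the distribution from collapsing to a point.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3

    return mean, std


def toy_return(theta):
    """Hypothetical stand-in for episode return, maximized at `target`."""
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)


best_mean, best_std = cem_gaussian(toy_return, dim=3)
print(best_mean)
```

Because the search distribution here is a single Gaussian, every candidate is drawn around one mode. This is the restriction the abstract describes, which the authors address by replacing the Gaussian with an adversarially trained hypernetwork.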

Published date: 4 May 2021
Venue - Dates: Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011), Taipei, 2011-05-01 - 2011-05-05
Keywords: Cross-Entropy Method, Generative Adversarial Networks, Hypernetworks, Reinforcement Learning

