Tom Schaul - Natural Evolution Strategies

Natural Evolution Strategies

Natural Evolution Strategies (NES[3][9][10][17][19][21][27]) are well-grounded, evolution-strategy inspired black-box optimization algorithms, which instead of maintaining a population of search points, iteratively update a search distribution.

General procedure

The parameterized search distribution is used to produce a batch of search points, and the fitness function is evaluated at each such point. The distribution parameters (which include strategy parameters) allow the algorithm to adaptively capture the (local) structure of the fitness function. For example, in the case of a Gaussian distribution, this comprises the mean and the covariance matrix. From the samples, NES estimates a search gradient on the parameters towards higher expected fitness. NES then performs a gradient ascent step along the natural gradient, a second-order method which, unlike the plain gradient, renormalizes the update with respect to uncertainty. This step is crucial, since it prevents oscillations, premature convergence, and undesired effects stemming from a given parameterization. The entire process reiterates until a stopping criterion is met.

All members of the NES family operate based on the same principles. They differ in the type of distribution and the gradient approximation method used. Different search spaces require different search distributions; for example, in low dimensionality it can be highly beneficial to model the full covariance matrix (xNES). In high dimensions a more scalable alternative is to limit the covariance to the diagonal only (SNES). In addition, highly multi-modal search spaces may benefit from more heavy-tailed distributions. A last distinction arises between distributions where we can analytically compute the natural gradient, and more general distributions where we need to estimate it from samples.

Recommendations and implementations

	xNES	SNES
When to use?	Small-to-medium problem dimension (<100), especially powerful with highly correlated parameters.	Medium-to-large problem dimension, with approximately separable parameters.
Benchmarking results	BBOB 2012 [Pdf]	BBOB 2012 [Pdf]
Pseudo-code	[Pdf]	[Pdf]
Python	xnes.py by Tom Schaul (also xnes.py in PyBrain)	snes.py by Tom Schaul (also snes.py in PyBrain)
Matlab	xnes.m by Sun Yi	snes.m by Matt Luciw
Mathematica	nes.nb by Jan Koutník	nes.nb by Jan Koutník, or snes.nb by Giuseppe Cuccu
C++	xnes.cpp by Tobias Glasmachers
Lua		In the torch library (Clément Farabet)

Additional variants

Cauchy-NES ([Pseudocode]), using a multivariate Cauchy distribution that can improve the search for deceptive and multimodal functions.
Radial NES (rNES, [Pseudocode]; [Mathematica] by Giuseppe Cuccu), the simplest variant that adapts a single parameter. Lends itself to theoretical studies such as [27].
xNES with adaptation sampling (xNES-as, [Pseudocode]), a variant that adapts the learning rate online, which often improves performance (see this comparison).
(1+1)-NES ( [Pseudocode]), a hill-climber variant.
Multi-objective extension (MO-NES, [Pseudocode]) for multi-objective optimization.
xNES with a rank-one covariance matrix approximation (R1-NES, [Pseudocode]; [Matlab] by Sun Yi).
Distance-weighted xNES, with some substantial speed improvements (DX-NES, [Matlab] by Nobusumi Fukushima), for details see N. Fukushima, Y. Nagata, S. Kobayashi and I. Ono, presented at CEC 2011.

If not mentioned otherwise, all this code is open-source, free to use and licensed under the BSD software license.