Intelligence is not Artificial

(These are excerpts from my book "Intelligence is not Artificial")

### The Genetic Algorithm Renaissance

The field of mathematical optimization got started in earnest with the invention of linear programming. Contributions came from the Russian economist Leonid Kantorovich in 1939, from Frank Hitchcock at MIT in 1941 and from the Dutch economist Tjalling Koopmans at the University of Chicago in 1942, but it was George Dantzig, then still a student from UC Berkeley working an office of the Air Force during World War II, who invented the "simplex" method in 1946, just about in time to make it an ideal application of the newly invented electronic computer.

An optimization problem consists in finding the minimum or maximum of an "objective" function in a multidimensional space, given some constraints, a typical problem in planning. If the objective function and the constraints are linear, the optimization can be done with linear programming methods like simplex. Linear programming methods cannot be used when the objective function or the constraints or both are nonlinear. For example, gradient descent is a method of nonlinear optimization.

Genetic algorithms (or, better, evolutionary algorithms) are nonlinear optimization methods inspired by Darwinian evolution: let loose a population of algorithms in a space of possible solutions (the "search space") to find the best solution to a given problem, i.e. to autonomously "learn" how to solve a problem over consecutive generations using the Darwinian concepts of mutation, crossover and selection (the "survival of the fittest" process).

There is a long story of "black box" function optimization, starting with the Metropolis algorithm, devised in 1953 by Marshall Rosenbluth at the Los Alamos laboratories, and the "downhill simplex method" devised in 1965 by British statisticians John Nelder and Roger Mead ("A Simplex Method for Function Minimization", 1965).

Evolution Strategies (ES) for the optimization of nonlinear functions were introduced by Hans-Paul Schwefel ("Two-phase Nozzle and Hollow Core Jet Experiments," 1970) and Ingo Rechenberg at the Technical University of Berlin (his doctoral dissertation "Evolution Strategies", 1971).

At the same time that John Hopfield was finding a similarity between neural networks and thermodynamics, mathematicians found a connection between thermodynamics and function optimization (when the function is nonlinear). Scott Kirkpatrick, Dan Gelatt and Mario Vecchi at IBM labs in New York were looking for a method to optimize the process of electronic chip design and, resurrecting Rosenbluth's ideas, came up with simulated annealing ("Optimization by Simulated Annealing", 1983).

Juergen Schmidhuber employed genetic algorithms for meta-learning, i.e. learning how to learn, in his dissertation at the Technical University of Munich ("Evolutionary Principles in Self-referential Learning", 1987).

His thesis also pioneered the idea of meta-learning, of training a meta-learner so that it optimizes the so-called hyper-parameters of the learner, i.e. the parameters that determine how the learner learns. The hyper-parameters of a neural network or of an evolutionary system are usually decided by the engineer based on experience and intuition.

Other significant innovations in genetic algorithms were Darrell Whitley's "Genitor" algorithm at Colorado Stave University in 1988, in which the offspring replaces not the parents but rather the least fit member of the population; the cellular genetic algorithms of Bernard Manderick and Piet Spiessens of the Vrije Universiteit in Belgium ("Fine Grained Parallel Genetic Algorithms", 1989); David Goldberg's "tournament selection" at the University of Alabama ("A Note on Boltzmann tournament Selection", 1990); Larry Eshelman's CHC, a variant of ES, at Philips Laboratories in New York State (1991); and Inman Harvey’s SAGA ("Species Adaptation Genetic Algorithm") algorithm at University of Sussex (1992).

In the 1990s new evolution-based methods of optimization were born, such as "differential evolution", invented by Ken Price ("Differential Evolution", 1996) at UC Berkeley in order to solve an exotic problem (known as the "Chebychev Polynomial Fitting Problem") that had been posed to him by Rainer Storn; and "gene pool recombination", invented by Heinz Muehlenbein and Gerhard Paass in Germany ("Gene Pool Recombination in Genetic Algorithms", 1996). Notably, Andreas Ostermeier and Nikolaus Hansen at the Technical University of Berlin continued to refine Rechenberg's and Schwefel's evolution strategies and eventually achieved what is now the dominating algorithm of black-box optimization, the "Covariance Matrix Adaptation Evolution Strategy" or CMA-ES ("A Derandomized Approach to Self Adaptation of Evolution Strategies", 1994).

A framework for black-box optimization was developed at the Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA) by Daan Wierstra and Tom Schaul of Juergen Schmidhuber's group ("Natural Evolution Strategies", 2008). It was similar to the "estimation of distribution algorithms" (EDAs) developed by Heinz Muehlenbein and Gerhard Paass a decade earlier ("From Recombination of Genes to the Estimation of Distributions", 1996).