(See also "Demystifying Machine Intelligence", which contains these and other similar points)
Artificial Intelligence and Brute Force
In June 2012 a combined Google/Stanford research team, led by the Stanford
computer scientist Andrew Ng and the Google Fellow Jeff Dean, used an array
of 16,000 processors to create a neural network with more than one billion
connections, and let it loose on millions of YouTube videos to learn how to recognize cats.
The result is impressive, but it was achieved with no major conceptual
breakthrough. The discipline of Artificial Intelligence is still using
concepts introduced in the 1950s and 1960s, just tweaking them a bit.
The difference between then and now is that now A.I. scientists can use
thousands of powerful computers to get what they want. It is just brute
force with little or no sophistication.
Whether this is how the human mind does it is debatable.
Basically, we should be impressed that 16,000 of the fastest computers in the
world took a few years to recognize a cat, something that a kitten with
a still undeveloped brain can do in a split second...
I would be happy if the 16,000 computers could just simulate the 302-neuron
brain of the roundworm, whose no more than 5000 synapses nonetheless
recognize a lot of very interesting things with incredible accuracy.
Over the last two decades there has been virtually
no conceptual innovation in the field.
Brute force is the paradigm that dominates these days.
After all, by indexing millions of webpages Google is capable of providing
an answer to the vast majority of questions (even "how to" questions),
something that no expert system ever came close to achieving.
I wonder if the slow and cumbersome computers of the 1960s were a gift to the scientific community, because they forced computer scientists to come up with creative models instead of just letting machines crunch numbers until a solution emerges.
For the record (the first part is taken from my book "The Nature of Consciousness"):
- In 1982 the US physicist John Hopfield ("Neural Networks And Physical Systems With Emergent Collective Computational Abilities") revived the field by proving the second milestone theorem of neural networks. He developed a model inspired by the "spin glass" material, which resembles a one-layer neural network in which: weights are distributed in a symmetrical fashion; the learning rule is "Hebbian" (the rule that the strength of a connection is proportional to how frequently it is used, a rule originally proposed by the Canadian psychologist Donald Hebb); neurons are binary; and each neuron is connected to every other neuron. As they learn, Hopfield's nets develop configurations that are dynamically stable (or "ultrastable"). Their dynamics are dominated by a tendency towards a very high number of locally stable states, or "attractors". Every memory is a local "minimum" for an energy function similar to potential energy. Hopfield's argument, based on Physics, proved that, despite Minsky's critique, neural networks are feasible.
- Research on neural networks picked up again. In 1980 Kunihiko Fukushima built the "Neocognitron", based on a model of the visual system.
- In 1985 the British computer scientist Geoffrey Hinton and Terrence Sejnowski ("A Learning Algorithm for Boltzmann Machines") developed an algorithm for the "Boltzmann machine", based on Hopfield's network and on the technique of simulated annealing. In that machine, Hopfield's learning rule is replaced with the rule of annealing in metallurgy (start off the system at a very high "temperature" and then gradually drop the temperature to zero), which several mathematicians were proposing as a general-purpose optimization rule. In this model, therefore, units update their state based on a stochastic decision rule. The Boltzmann machine turned out to be even more stable than Hopfield's network, as, with a sufficiently slow annealing schedule, it ends in a global minimum (the lowest energy state).
- The "back-propagation" algorithm devised in 1986 by the US psychologist David Rumelhart with Geoffrey Hinton and Ronald Williams ("Learning Representations By Back-Propagating Errors"), a "gradient-descent" algorithm that is considerably faster than the Boltzmann machine, quickly became the most popular learning rule.
- The generalized "delta rule" was basically an adaptation of the Widrow-Hoff error-correction rule to the case of multi-layered networks, moving backwards from the output layer to the input layer. This was also the definitive answer to Minsky's critique, as it proved capable of solving problems (such as the XOR function) that single-layer perceptrons could not.
- Hinton focused on gradient-descent learning procedures. Each connection computes the derivative, with respect to its strength, of a global measure of error in the performance of the network, and then adjusts its strength in the direction that decreases the error. In other words, the network adjusts itself to counter the error it made. Tuning a network to perform a specific task is a matter of stepwise approximation.
- The problem with these methods is that they are cumbersome (if not plainly impossible) when applied to deeply-layered neural networks, precisely the ones needed to mimic what the brain does.
- In 1986 Paul Smolensky modified the Boltzmann Machine into what became known as the "Restricted Boltzmann machine", which lends itself to easier computation. This network is restricted to one visible layer and one hidden layer, with no connections between units of the same layer.
- By the end of the 1980s, neural networks had established themselves as a viable computing technology, and a serious alternative to expert systems as a mechanical approximation of the brain. The probabilistic approach to neural network design had won out.
- "Learning" is reduced to the classic statistical problem of finding the best model to fit the data. There are two main ways to go about this. A generative model is a full probabilistic model of the problem, a model of how the data are actually generated (for example, a table of frequencies of English word pairs can be used to generate a "likely" sentence). Discriminative algorithms, instead, classify data without providing a model of how the data are actually generated. Discriminative models are inherently supervised. Traditionally, neural networks were discriminative algorithms.
- Further progress had to wait more than a decade: in 2006 Hinton ("A Fast Learning Algorithm For Deep Belief Nets") made Deep Belief Networks the talk of the town, basically a generative algorithm for Restricted Boltzmann Machines which suddenly relaunched neural networks and led to new, sophisticated applications to unsupervised learning.
- Deep Belief Networks are layered hierarchical architectures that stack Restricted Boltzmann Machines one on top of the other, each one feeding its output as input to the one immediately higher, with the two top layers forming an associative memory. The features discovered by one RBM become the training data for the next one.
- DBNs are still limited in one respect: they are "static classifiers", i.e. they operate at a fixed dimensionality. However, speech or images don't come in a fixed dimensionality, but in a (wildly) variable one. They require "sequence recognition", i.e. dynamic classifiers, that DBNs cannot provide. One method to expand DBNs to sequential patterns is to combine deep learning with a "shallow learning architecture" like the Hidden Markov Model.
- Meanwhile, in 2006 Osamu Hasegawa introduced the Self-Organising Incremental Neural Network (SOINN), a self-replicating neural network for unsupervised learning, and in 2011 his team demonstrated a SOINN-based robot that learned functions it had not been programmed for.
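The Hopfield dynamics described above (symmetric Hebbian weights, binary units, energy minimization into attractors) can be sketched in a few lines of code. This is a toy illustration under names of my own choosing, not Hopfield's formulation verbatim:

```python
import numpy as np

# Toy Hopfield network: binary (+1/-1) units, symmetric Hebbian weights,
# and update dynamics that can only lower the energy E = -1/2 s^T W s,
# so the state settles into a locally stable attractor.

def hebbian_weights(patterns):
    # Hebb's rule: connection strength grows with correlated activity.
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)           # no self-connections
    return W / len(patterns)

def energy(W, s):
    return -0.5 * s @ W @ s

def recall(W, s, sweeps=5):
    # Deterministic asynchronous updates; no single flip raises the energy.
    s = s.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

stored = np.array([1, -1, 1, -1, 1, -1])
W = hebbian_weights(stored[None, :])
noisy = stored.copy()
noisy[0] = -noisy[0]                 # corrupt one unit
restored = recall(W, noisy)          # falls back into the stored attractor
```

Every stored pattern sits at a local minimum of the energy; a corrupted input rolls downhill into the nearest attractor, which is why such a net behaves as a content-addressable memory.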
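The stochastic decision rule and the annealing schedule that distinguish the Boltzmann machine from Hopfield's deterministic updates can be illustrated with a toy two-unit network (the names and the tiny weight matrix are my own invention):

```python
import numpy as np

# Toy Boltzmann-machine update: a unit turns on with probability
# sigmoid(input / T). The "temperature" T starts high (noisy exploration of
# the energy landscape) and is gradually lowered toward zero
# (near-deterministic settling), as in annealing in metallurgy.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def anneal(W, s, temperatures, rng):
    s = s.copy()
    for T in temperatures:
        for i in range(len(s)):
            p_on = sigmoid((W[i] @ s) / T)
            s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

# Two mutually excitatory units: the global energy minimum is both units on.
W = np.array([[0.0, 2.0], [2.0, 0.0]])
schedule = np.geomspace(5.0, 0.05, 60)     # slowly decreasing temperature
final = anneal(W, np.zeros(2), schedule, np.random.default_rng(0))
```

At high temperature the units flip almost at random, which lets the system escape poor local minima; as the temperature drops, the updates become nearly deterministic and the state freezes into a low-energy configuration.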
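The back-propagation idea, each weight adjusted along the derivative of a global error measure, can be sketched on XOR, the very problem Minsky and Papert used against single-layer perceptrons. This is toy code with my own naming, not the original implementation:

```python
import numpy as np

# Toy two-layer sigmoid network trained by back-propagation on XOR.
# Each weight moves a small step against the derivative of the squared error.

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(1)

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], float)  # last column is a bias
y = np.array([[0], [1], [1], [0]], float)

W1 = rng.normal(0, 1, (3, 4))       # input (+bias) -> 4 hidden units
W2 = rng.normal(0, 1, (5, 1))       # hidden (+bias) -> output

def forward(W1, W2):
    h = sigmoid(X @ W1)
    hb = np.hstack([h, np.ones((4, 1))])     # append hidden bias
    return h, hb, sigmoid(hb @ W2)

_, _, out0 = forward(W1, W2)
initial_error = np.mean((out0 - y) ** 2)

for _ in range(5000):
    h, hb, out = forward(W1, W2)
    d_out = (out - y) * out * (1 - out)      # error derivative at the output
    d_h = (d_out @ W2[:4].T) * h * (1 - h)   # ...propagated back one layer
    W2 -= 0.5 * hb.T @ d_out                 # gradient-descent weight updates
    W1 -= 0.5 * X.T @ d_h

_, _, out = forward(W1, W2)
final_error = np.mean((out - y) ** 2)
```

The "backward" pass is just the chain rule: the output error is multiplied by each layer's local derivative on the way back, giving every weight its own share of the blame.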
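The "restriction" that makes Smolensky's machine computationally convenient, one visible and one hidden layer with no within-layer connections, is easy to see in code (a toy sketch with names of my own choosing):

```python
import numpy as np

# Toy Restricted Boltzmann Machine: a bipartite graph. Because no two units in
# the same layer are connected, every hidden unit can be sampled in parallel
# given the visible layer, and vice versa; this is what makes computation with
# the restricted variant so much easier.

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))   # only cross-layer weights exist

def sample_hidden(v):
    p = sigmoid(v @ W)                   # each hidden unit depends only on v
    return (rng.random(n_hidden) < p).astype(float)

def sample_visible(h):
    p = sigmoid(h @ W.T)                 # each visible unit depends only on h
    return (rng.random(n_visible) < p).astype(float)

v0 = np.array([1, 0, 1, 0, 1, 0], float)
h0 = sample_hidden(v0)                   # half of one Gibbs sampling step
v1 = sample_visible(h0)                  # stochastic reconstruction of v0
```

In an unrestricted Boltzmann machine each unit's update would depend on every other unit, forcing slow one-at-a-time sampling; the bipartite restriction turns each half-step into a single parallel matrix operation.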
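The word-pair example of a generative model mentioned above can be made concrete with a toy bigram table (the corpus and all names are my own invention):

```python
import random
from collections import Counter, defaultdict

# A table of word-pair (bigram) frequencies is a full probabilistic model of
# how sentences are generated, so it can be *sampled* to produce a "likely"
# sentence. A discriminative model could only label sentences, not produce them.

corpus = "the cat sat on the mat and the cat ate the rat".split()
bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a][b] += 1                   # count how often b follows a

random.seed(0)
word, sentence = "the", ["the"]
for _ in range(4):
    candidates = list(bigrams[word])
    if not candidates:                   # no word ever followed this one
        break
    word = random.choices(candidates,
                          weights=[bigrams[word][c] for c in candidates])[0]
    sentence.append(word)
```

Sampling each next word in proportion to its observed frequency yields sentences that mirror the statistics of the corpus, which is exactly what "a model of how the data are actually generated" means.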
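The stacking scheme of a Deep Belief Network, in which each RBM's features become the next RBM's training data, reduces to a simple loop. This is a structural sketch only, with my own names, and the per-layer RBM training itself is deliberately elided:

```python
import numpy as np

# Structural sketch of a Deep Belief Network: Restricted Boltzmann Machines
# stacked so that the hidden activities of one layer are fed, as if they were
# data, to the layer above. A real DBN trains each W greedily as an RBM; here
# the weights stay random, since only the stacking scheme is illustrated.

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(0)

data = rng.integers(0, 2, (10, 8)).astype(float)  # fake binary "visible" data
layer_sizes = [8, 5, 3]

weights, x = [], data
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    W = rng.normal(0, 0.1, (n_in, n_out))
    # ...here W would be trained as an RBM on x...
    weights.append(W)
    x = sigmoid(x @ W)    # this layer's features become the next layer's input

features = x              # top-level representation of the data
```

The greedy, one-layer-at-a-time scheme is precisely what sidestepped the problem noted above: instead of back-propagating through many layers at once, each layer is trained shallowly on the features the layer below has already discovered.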