Connectionism and Neural Machines
(These are excerpts from, or extensions to, the material published in my book "The Nature of Consciousness")
Artificial Neural Networks
An artificial "neural network" is a piece of software or hardware that simulates the neural network of the brain. Several simple units are connected together, with each unit connecting to any number of other units. The "strength" of each connection can vary from zero to arbitrarily large values. Initially the connections and their strengths are set randomly. Then the network is either "trained" or forced to train itself. "Training" a network means using some kind of feedback to adjust the strength of the connections. Every time an input is presented, the network is told what the output should be and asked to adjust its connections accordingly.
For example, the input could be a picture of an apple and the required output could be the string of letters A-P-P-L-E. The first time, equipped with random connections, the network produces some random output. The requested output (A-P-P-L-E) is fed back and the network reorganizes its connections to produce such an output. Another image of an apple is presented as the input and the output is forced again to be the string A-P-P-L-E. Every time this happens the connections are modified to produce the same output even if all images of apples are slightly different. The theory predicts that at some point the network will start recognizing images of apples even if they are all slightly different from the ones it saw before.
Formally: a neural net is a nonlinear directed graph in which each element of processing (each node) receives signals from other nodes and emits a signal towards other nodes, and each connection between nodes has a weight that can vary in time.
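To make the definition concrete, here is a minimal sketch (my own illustration in Python, with invented numbers, not part of the original text) of a single node: it sums the weighted signals arriving from other nodes and emits a signal through a nonlinear "squashing" function.

import math

def node_output(inputs, weights, bias=0.0):
    # One node of the graph: weighted sum of the incoming signals,
    # passed through a nonlinearity (here a sigmoid).
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# A node receiving three signals over connections of different strengths.
print(node_output([0.2, 0.9, 0.4], [0.5, -1.2, 2.0]))

Learning, in this picture, is nothing more than changing those weights over time.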
A number of algorithms have been proposed for adjusting the strengths of the connections based on the expected output. Such an algorithm must eventually "converge" to a unique and proper configuration of the neural network. The network can continue learning forever, but it must be capable of not forgetting what it has already learned. The larger the network (both in terms of units and in terms of connections), the easier it is to reach a point of stability.
Artificial neural networks are typically used to recognize an image, a sound, a written word. But, since everything is ultimately a pattern of information, there is virtually no limit to their applications. For example, they can be used to build expert systems. An expert system built with the technology of knowledge-based systems (a "traditional" expert system) relies on a knowledge base which represents the knowledge acquired over a lifetime by a specialist. An expert system built with neural-network technology would be a neural network which has been initialized with random values and trained with a historical record of "cases". Instead of relying on an expert, one would rely on a long list of previous cases in which a certain decision was made. If the network is fed this list and "trained" to learn that this long list corresponds to a certain action, the network will eventually start recommending that certain action for new cases that somehow match that "pattern".
Imagine a credit scoring application: the bank's experts use some criteria for deciding whether a business is entitled to a loan or not. A knowledge-based system would rely on the experience of one such expert and use that knowledge to examine future applications. A neural network would rely on the historical record of loans and train itself from that record to examine future applications.
The two approaches are almost completely opposite, even if they should lead to exactly the same behavior.
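As a hedged sketch of the contrast (all figures, criteria and records are invented for illustration): the knowledge-based approach writes the expert's criterion down as an explicit rule, while the connectionist approach starts from random weights and adjusts them, perceptron-style, against a list of past cases.

import random

# Knowledge-based style: the expert's criterion is encoded as an explicit rule.
def expert_rule(income, debt):
    return income > 50 and debt / income < 0.4      # hypothetical criterion

# Connectionist style: learn a decision from historical records.
# Each record is ((income, debt), loan approved?).
history = [((80, 10), 1), ((30, 20), 0), ((60, 40), 0), ((90, 20), 1)]

w, b = [random.uniform(-1, 1), random.uniform(-1, 1)], 0.0
for _ in range(500):                                 # perceptron-style training
    for (income, debt), target in history:
        guess = 1 if income * w[0] + debt * w[1] + b > 0 else 0
        error = target - guess
        w[0] += 0.01 * error * income
        w[1] += 0.01 * error * debt
        b += 0.01 * error

# Both approaches should approve a strong applicant, for opposite reasons:
# one follows a stated rule, the other a pattern extracted from the record.
print(expert_rule(85, 10), 1 if 85 * w[0] + 10 * w[1] + b > 0 else 0)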
Parallel Distributed Computing
One can view a connectionist structure as a new form of computing, a different way of finding a solution to a problem (than searching a solution space). Traditionally, we think of problem solving as an activity in which a set of axioms (of things we know for sure) helps us figure out whether something else is true or false. We derive the "theorem" from the premises through a sequence of logical steps. There is one, well-defined stream of information that flows from the premises to the demonstration of the theorem. This is the approach that mathematicians have refined over the centuries.
On the contrary, a connectionist structure such as our brain works in a non-sequential way: many "nodes" of the network can be triggered at the same time by another node. The result of the "computation" is a product of the parallel processing of many streams of information. There are no axioms and no rules of inference. There are just nodes that exchange messages all the time and adjust their connections depending on the frequency of those messages. No logic whatsoever, no reasoning, no "intelligence" is required. Information does not flow: it gets propagated. Computing (if it can still be called "computing") occurs everywhere in the network, and it occurs all the time.
The obvious reason to be intrigued by connectionist (or "neural") computing is that our brain does it, and, if our brain does it, there must be a reason. Another reason is that this form of computing does have advantages over the logical approach. There are many tasks that would be extremely difficult to handle with Logic, but are quite naturally handled by neural computation. For example, what our brain does best: recognizing patterns (whether a face or a sound).
It has been proven that everything that knowledge-based systems do can be done as well with neural networks.
The idea of connectionism, of computing in a network rather than in a formal system, basically revolutionized the very concept of problem solving. After all, very few real-world problems can be solved in the vacuum of pure logic. From weather forecasting to finance, most situations involve countless factors that interact with each other at the same time. One can predict the future only if one knows all the possible interactions.
Computational Models of the Brain
In 1943 the American physiologist and mathematician Warren McCulloch, in cooperation with Walter Pitts, wrote a seminal paper that laid down the foundations for a computational theory of the brain. McCulloch transformed the neuron into a mathematical entity by assuming that it can only be in one of two possible states (formally equivalent to the zero and the one of computer bits). These "binary" neurons have a fixed threshold below which they never fire. They are connected to other binary neurons through connections (or "synapses") that can be either "inhibitory" or "excitatory": the former bring signals that keep the neuron from firing, the latter bring signals that push the neuron to fire. All binary neurons integrate their input signals at discrete intervals of time, rather than continuously. The model is therefore very elementary: if no inhibitory synapse is active and the sum of all excitatory synapses is greater than the threshold, then the neuron fires; otherwise it doesn't. This is a rather rough approximation of the brain, but it suffices for the purpose of mathematical simulation.
Next, McCulloch and Pitts proved an important theorem: that a network of binary neurons is fully equivalent to a Universal Turing Machine, i.e., that any finite logical proposition can be realized by such a network and that every computer program can be implemented as a network of binary neurons. Two most unlikely worlds, that of Neurophysiology and that of Mathematics, had been linked.
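A minimal sketch of the model and of the theorem's flavor (my own illustration, with thresholds chosen by hand): a binary neuron that fires only when no inhibitory input is active and enough excitatory inputs are, and elementary logical operations built out of such neurons.

def mp_neuron(excitatory, inhibitory, threshold):
    # McCulloch-Pitts binary neuron: it fires (returns 1) only if no
    # inhibitory synapse is active and the number of active excitatory
    # synapses reaches the threshold.
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# Elementary logical propositions realized by single units:
AND = lambda a, b: mp_neuron([a, b], [], threshold=2)
OR  = lambda a, b: mp_neuron([a, b], [], threshold=1)
NOT = lambda a: mp_neuron([], [a], threshold=0)   # fires unless inhibited

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))

Chaining such units into networks is what yields the equivalence with conventional computation that McCulloch and Pitts proved.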
It took a few years for the technology to catch up with the theory. Finally, at the end of the 1950s, a few neural machines were constructed. Frank Rosenblatt's "Perceptron" (1957), Oliver Selfridge's "Pandemonium" (1958), and Bernard Widrow's and Marcian Hoff's "Adaline" (1960) introduced the basic concepts for building a neural network. For simplicity, a neural network can be structured in layers of neurons, the neurons of each layer firing at the same time after the neurons of the previous layer have fired. The input pattern is fed to the input layer, whose neurons trigger neurons in the second layer, and so forth until the neurons in the output layer are finally triggered and a result is produced. Each neuron in a layer can be connected to many neurons in the previous and following layers. In practice, most implementations had only three layers: the input layer, an intermediary layer and the output layer.
After a little while, each layer has "learned" something, but at a different level of abstraction. In general, the layering of neurons plays a specific role. For example, the wider the intermediate layer, the faster but the less accurate the process of categorization, and vice versa.
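A sketch of the layered arrangement just described, with made-up random weights (four input units, three intermediate units, two output units): each layer fires only after the previous layer has fired.

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fire_layer(previous, weights):
    # Fire every neuron of a layer: each row of weights connects one
    # neuron to all the neurons of the previous layer.
    return [sigmoid(sum(x * w for x, w in zip(previous, row))) for row in weights]

random.seed(0)
w_intermediate = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
w_output = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]

pattern = [0.0, 1.0, 1.0, 0.0]                      # the input pattern
result = fire_layer(fire_layer(pattern, w_intermediate), w_output)
print(result)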
In many cases, learning is directed by feedback. "Supervised learning" is a way to send feedback to the neural network by changing synaptic strengths so as to reflect the error, or the difference between what the output is and what it should have been; whereas in "unsupervised" learning mode the network is able to learn categories by itself, without any external help.
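For the supervised case, a minimal sketch of the kind of error-driven adjustment meant here (a Widrow-Hoff-style rule of my own choosing; learning rate and data are invented): each weight is changed in proportion to the difference between what the output is and what it should have been.

def supervised_step(weights, inputs, desired, rate=0.1):
    # Change each synaptic strength in proportion to the output error.
    actual = sum(x * w for x, w in zip(inputs, weights))
    error = desired - actual
    return [w + rate * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0, 0.0]
for _ in range(50):
    weights = supervised_step(weights, [1.0, 0.5, -1.0], desired=1.0)
print(weights)     # the weights settle so that the output approaches 1.0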
Whether supervised or not, a neural network can be said to have learned a new concept when the weights of the connections converge towards a stable configuration.
Neural networks are fundamentally different from the sequential, Von Neumann computer. Information is processed in parallel, rather than sequentially. The network can modify itself (i.e., learn), based on its performance. Information is spread across the network, rather than being localized in a particular storage place. The network as a whole can still function even if a piece of the network is not functioning.
The technology of neural networks promised to lead to a type of computer capable of learning, and, in general, of more closely resembling our brain.
The brain is a neural network that exhibits one important property: all the changes that occur in the connections eventually "converge" towards some kind of stable state. For example, the connections may change every time I see a friend's face from a different perspective, but they "converge" towards the stable state in which I always recognize him as him. Some kind of stability is important for memory to exist, and for any type of recognition to be performed. Neural networks must exhibit the same property if they have to be useful for practical purposes and plausible as models of the brain. Several different mathematical models were proposed in the quest for the optimal neural network.
The discipline of neural networks quickly picked up steam. More and more complex machines were built. Until in 1969 the American mathematician Marvin Minsky, together with Seymour Papert, proved (or thought he proved) some intrinsic limitations of neural networks. All of a sudden, research on neural networks became unpopular and for more than a decade the discipline languished.
In 1982 the American physicist John Hopfield revived the field by proving the second milestone theorem of neural networks. He developed a model inspired by "spin glass" materials, which resembles a one-layer neural network in which: weights are distributed in a symmetrical fashion; the learning rule is "Hebbian" (the rule that the strength of a connection is proportional to how frequently it is used, a rule originally proposed by the Canadian psychologist Donald Hebb); neurons are binary; and each neuron is connected to every other neuron. As they learn, Hopfield's nets develop configurations that are dynamically stable (or "ultrastable"). Their dynamics is dominated by a tendency towards a very high number of locally stable states, or "attractors". Every memory is a local "minimum" for an energy function similar to potential energy. Hopfield's argument, based on Physics, proved that, despite Minsky's critique, neural networks are feasible.
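A minimal sketch of a Hopfield-style network along these lines (pattern size, stored patterns and update order are my own choices): symmetric weights set by a Hebbian rule, binary (+1/-1) neurons, and asynchronous updates that let the state fall into a stored attractor.

import random

# Two invented patterns ("memories") of six binary neurons.
patterns = [[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]]
n = len(patterns[0])

# Hebbian rule: a connection is strengthened whenever two neurons agree.
W = [[0.0] * n for _ in range(n)]
for p in patterns:
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] += p[i] * p[j] / n

def recall(state, steps=200):
    # Asynchronous updates: the state slides into a nearby attractor,
    # a local minimum of the energy function.
    state = state[:]
    for _ in range(steps):
        i = random.randrange(n)
        state[i] = 1 if sum(W[i][j] * state[j] for j in range(n)) >= 0 else -1
    return state

noisy = [1, -1, 1, -1, 1, 1]     # a corrupted version of the first memory
print(recall(noisy))             # recovers [1, -1, 1, -1, 1, -1]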
Research on neural networks picked up again. In 1982 Kunihiko Fukushima built the "Neocognitron", based on a model of the visual system.
In 1985 Geoffrey Hinton and Terrence Sejnowski developed an algorithm for the "Boltzmann machine", based on Hopfield's network and on simulated annealing. In that machine, Hopfield's deterministic update rule is replaced with a procedure inspired by annealing in metallurgy (start the system off at a very high "temperature" and then gradually drop the temperature to zero), which several mathematicians were proposing as a general-purpose optimization technique. In this model, therefore, units update their state based on a stochastic decision rule. The Boltzmann machine turned out to be even more stable than Hopfield's, as it always ends in a global minimum (the lowest energy state).
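A sketch of that stochastic decision rule with a falling temperature (my own illustration of the annealed settling of the units, not Hinton and Sejnowski's learning procedure for the weights; the weight matrix and schedule are invented): at high temperature a unit flips almost at random, at low temperature it behaves like a deterministic Hopfield unit.

import math, random

def stochastic_update(state, W, temperature):
    # Pick a unit and turn it on with the Boltzmann probability of its
    # energy gap (the sum of the weighted inputs from active units).
    i = random.randrange(len(state))
    gap = sum(W[i][j] * state[j] for j in range(len(state)) if j != i)
    x = max(-500.0, min(500.0, gap / temperature))   # guard against overflow
    state[i] = 1 if random.random() < 1.0 / (1.0 + math.exp(-x)) else 0

def anneal(state, W, start=10.0, end=0.05, steps=2000):
    # Start hot (almost random flips) and cool gradually, so the network
    # settles into a low-energy configuration.
    for k in range(steps):
        t = start * (end / start) ** (k / (steps - 1))
        stochastic_update(state, W, t)
    return state

W = [[0, 1, -1], [1, 0, 1], [-1, 1, 0]]              # a tiny symmetric weight matrix
print(anneal([1, 0, 1], W))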
The "back-propagation" algorithm devised in 1986 by the American psychologist David Rumelhart and the British computer scientist Geoffrey Hinton (a gradient-descent algorithm), considerably faster than the Boltzmann machine, quickly became the most popular learning rule.
The generalized "delta rule" was basically an adaptation of the Widrow-Hoff error-correction rule to the case of multi-layered networks, obtained by moving backwards from the output layer to the input layer. This was also the definitive answer to Minsky's critique, as multi-layered networks trained this way could solve problems (such as the "exclusive or") that single-layer perceptrons could not.
Hinton focused on gradient-descent learning procedures. Each connection computes the derivative, with respect to its strength, of a global measure of error in the performance of the network, and then adjusts its strength in the direction that decreases the error. In other words, the network adjusts itself to counter the error it made. Tuning a network to perform a specific task is a matter of stepwise approximation.
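A toy gradient-descent sketch of this procedure (my own plain-Python illustration, not Rumelhart and Hinton's original formulation): a small two-layer network trained by propagating the error derivative backwards, applied to the "exclusive or" problem mentioned above. With an unlucky random initialization it can settle in a poor local minimum, in which case it simply needs to be re-run.

import math, random

def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))

random.seed(1)
H = 3                                                                   # hidden units
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]    # [w1, w2, bias] per hidden unit
w_o = [random.uniform(-1, 1) for _ in range(H + 1)]                    # hidden-to-output weights + bias

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]            # XOR: not linearly separable
rate = 0.5

for _ in range(20000):
    for (x1, x2), t in data:
        h = [sigmoid(w[0]*x1 + w[1]*x2 + w[2]) for w in w_h]           # forward pass
        out = sigmoid(sum(w_o[k]*h[k] for k in range(H)) + w_o[H])
        d_out = (out - t) * out * (1 - out)                            # error derivative at the output
        d_h = [d_out * w_o[k] * h[k] * (1 - h[k]) for k in range(H)]   # propagated back to the hidden layer
        for k in range(H):                                             # step against the gradient
            w_o[k] -= rate * d_out * h[k]
            w_h[k][0] -= rate * d_h[k] * x1
            w_h[k][1] -= rate * d_h[k] * x2
            w_h[k][2] -= rate * d_h[k]
        w_o[H] -= rate * d_out

for (x1, x2), t in data:                                               # the net now approximates XOR
    h = [sigmoid(w[0]*x1 + w[1]*x2 + w[2]) for w in w_h]
    print(x1, x2, round(sigmoid(sum(w_o[k]*h[k] for k in range(H)) + w_o[H]), 2), t)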
By the end of the 1980s, neural networks had established themselves as a viable computing technology, and a serious alternative to expert systems as a mechanical approximation of the brain.
Computational models of neural activity soon proliferated. From the "neural equations" devised in 1961 by the Italian physicist Eduardo Caianiello ("An Outline of Thought Processes and Thinking Machines") to Stephen Grossberg's non-linear quantitative descriptions of brain processes, the number of mathematical theories on how neurons work almost exceeds the possibility of testing them. Now that the mathematics has matured, the emphasis is moving towards psychological plausibility. At first the only requirement was that a neural network be guaranteed to find a solution to every problem, but soon psychologists started requiring that it do so in a fashion similar to the way the human brain does it. Grossberg's models, for example, take into account Ivan Pavlov's experiments on conditioning.
Besides proving computationally that a neural network can learn, one has to build a plausible model of how the brain as a whole represents the world. In Teuvo Kohonen's "adaptive maps", nearby units respond similarly, thereby explaining how the brain represents the topography of a situation. His unsupervised architecture, inspired by Christoph von der Malsburg's studies on the self-organization of cells in the cerebral cortex, is capable of self-organizing into regions. Kohonen assumes that the overall synaptic resources of a cell are approximately constant (instead of changing in accordance with Donald Hebb's law) and that what changes is the relative efficacy of each synapse.
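A hedged sketch of a Kohonen-style update (a simplification of my own, not Kohonen's full algorithm; map size, neighbourhood and data are invented): the unit whose weights best match the input is found, and it and its neighbours on the map are pulled toward the input, so that nearby units come to respond to similar inputs and the map organizes itself into regions.

import random

random.seed(0)
size = 10                                            # a one-dimensional map of ten units
units = [[random.random(), random.random()] for _ in range(size)]

def train_step(x, rate=0.2, radius=2):
    # The best-matching unit is the one whose weights are closest to the input.
    best = min(range(size), key=lambda i: sum((units[i][d] - x[d]) ** 2 for d in range(2)))
    # The winner and its neighbours on the map are pulled toward the input.
    for i in range(size):
        if abs(i - best) <= radius:
            for d in range(2):
                units[i][d] += rate * (x[d] - units[i][d])

for _ in range(2000):
    # Inputs drawn from two invented clusters; the map organizes into two regions.
    if random.random() < 0.5:
        x = [random.gauss(0.2, 0.05), random.gauss(0.2, 0.05)]
    else:
        x = [random.gauss(0.8, 0.05), random.gauss(0.8, 0.05)]
    train_step(x)

print([[round(w, 2) for w in unit] for unit in units])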
The British computer scientist Igor Aleksander has attempted to build a neural state machine, "Magnus" (1996), that duplicates the most important features of a human being, from consciousness to emotions.
Neural networks belong to a more general class of processing systems, parallel distributed processors, and neurocomputing is a special case of Parallel Distributed Processing, or PDP, whereby processing is done in parallel by a number of independent processors and control is distributed over all of them. All the models for neural networks can be derived as special cases of PDP systems, from simple linear models to thermodynamic models. The axiom of this framework is that all knowledge of the system is in the connections between the processors. This approach is better suited than sequential, Von Neumann computing for pattern-matching tasks such as visual recognition and language understanding.
A concept is represented not by a symbol stored at some memory location, but by an equilibrium state defined over a dynamic network of locally interacting units. Each unit encodes one of the many features relevant to recognizing the concept, and the connections between units are excitatory or inhibitory inasmuch as the corresponding features are mutually supportive or contradictory. A given unit can contribute to the definition of many concepts.
Neurons vs Symbols
Compared with knowledge-based systems, neural networks offer not only different algorithms but also a different view of mental life. Knowledge-based systems rely on Jerry Fodor's model of cognition: knowledge is represented and then computation is performed on that knowledge, yielding some kind of action.
The British philosopher Andy Clark, instead, is an advocate of neural networks and highlights the reasons why neural networks provide a more plausible model for cognition than Fodorís "representations". Clark views neural networks (connectionism in general) as a shift of perspective in the way we view the mind, away from a "static" view of mental representations and towards a fluid view of the cognitive activity of the mind, towards the process, not just the structure.
Jerry Fodor's representational theory of mind was meant to provide an explanation of how thoughts become "causes". Fodor assumes that propositional attitudes ("I believe that", "I hope that", "I fear that", "I desire that") are computations on mental representations (e.g., a concept such as "my name is piero"), which, in turn, can be objects of computation because they are symbolic expressions. Each kind of propositional attitude (e.g., "belief") expresses a different kind of role and therefore a different kind of computation. Thus, "I believe that my name is piero" is different from "I hope that my name is piero" because the computation performed on the mental representation is different. The human brain knows how to represent and compute because it comes equipped with a "language of thought" that works just like the language of mathematical logic.
Clark does not believe such a language exists in the mind and does not believe that Fodor's vision of the mind can account for the "process" of thinking. In Fodor's model, learning is a secondary phenomenon and is largely independent of the environment. Clark, instead, advocates a model in which learning is a fundamental feature of the mind and learning is largely dependent on the environment. That is precisely the difference between knowledge-based models and connectionist models. Moreover, Fodor clearly distinguishes between the computation and the representation, whereas Clark believes that process and representation are one and the same. In a neural network they are.
Neural networks provide a more plausible model for cognition. Clark highlights three key features of connectionism: superposition, context sensitivity and representational change. Superposition is the ability to represent two things with the same structure: the same neural network can be trained (by changing the weights of its connections) to recognize multiple items. Context sensitivity follows from the fact that those weights encode multiple items and therefore the "representation" of something is automatically context-sensitive. Fodor's symbols are always the same regardless of the context in which they are located (the context is expressed via relationships between symbols), whereas neural networks embody the context of what they represent (the context is expressed internally). Representational change is not only the ability to create new representations (Fodor's models can do that by combining symbolic expressions to create new symbolic expressions) but also the acquisition of new representational capacities. The difference is that the former learns by combining pre-existing, internal expressions, whereas a connectionist model learns when trained by an external environment.
Clark also points to general considerations on biological systems. Complex biological systems have evolved subject to the constraints of "gradualistic holism": the evolution of a complex system is possible only insofar as that system is the last or latest link in a chain of structures, such that at each stage the chain involves only a small change (gradualism) and each stage yields a structure that is itself a viable whole (holism). This is precisely the way neural networks grow: at each point in time a neural network is a working network.
To Clark, the process is the key. One cannot break down or troubleshoot how a network does what it does because it depends on the "process" of learning: just looking at the result of learning is not enough to understand how the network performs the task that it has learned to perform. It is like watching a man ride a bike without having watched how he learned to ride the bike: the process of learning is what explains how he is now capable of riding the bike. If we try to analyze his action of riding the bike, we basically try to reduce the task of riding the bike to a set of symbols, which is a contradiction in terms because, again, riding a bike is not obtained by computing symbols, it is obtained by learning how to ride a bike.
Clark points out that, ideally, neural networks should also be able to undergo what the developmental psychologist Annette Karmiloff-Smith calls "redescriptions", or complete reorganizations of knowledge that open up new cognitive abilities and lead to a new developmental stage (as happens during child development).
On the other hand, since they are trained by a set of data that comes from the environment, connectionist systems depend on luck: they can only learn if the set of data includes enough statistical information (technically: associative learning is heavily dependent on the statistical distribution of input data). Our brain somehow learns even in a hostile environment that does not provide enough data about this or that concept, but neural networks fail badly to learn anything unless the set of input data is favorable to the desired training (their success depends on the continued availability of a friendly training environment). Clark's suggestion is that the human mind is a neural network that has evolved over thousands of years and therefore has absorbed huge amounts of innate knowledge. In other words, connectionism is not all there is to human cognition: evolution is another big piece of the story, because it predisposes the network.
While Fodor views concepts as the building blocks of thoughts and as represented by fixed structures and as causing action through their relationships, Clark views a concept as a set of skills that a network learns, and views the effect of those skills as the "behavior" of that network. Folk psychology creates the belief that there are such things as concepts when in reality there are only sets of learned skills. To ascribe a concept to a person is to ascribe a set of skills to that person. The set of skills defines the potential behavior of that person or network. Thus "concepts" are basically an illusion created by the language of folk psychology.
The Road from Neurons to Symbols
Computational models of neural networks have greatly helped in understanding how a structure like the brain can perform. Computational models of cognition have improved our understanding of how cognitive faculties work. But neither group has developed a theory of how neural processes lead to symbolic processes, of how electro-chemical reactions lead to reasoning and thought.
A bridge is missing between the physical, electro-chemical, neural processes and the macroscopic mind processes of reasoning, thinking, knowing, etc., in general, the whole world of symbols. A bridge is missing between the neuron and the symbol. Several philosophers have tried to fill the gap.
The "harmony" theory proposed by the American computer scientist Paul Smolensky is an effort in this direction. Smolensky worked out a theory of dynamic systems that perform cognitive tasks at a subsymbolic level. The task of a perceptual system can be viewed as the completion of the partial description of static states of an environment. Knowledge is encoded as constraints among a set of perceptual features. The constraints and features evolve gradually with experience. Schemata are collections of knowledge atoms that become active in order to maximize what he calls "harmony". The cognitive system is, de facto, an engine for activating coherent assemblies of atoms and drawing inferences that are consistent with the knowledge represented by the activated atoms. A harmony function measures the self-consistency of a possible state of the cognitive system. Such harmony function obeys a law that resembles simulated annealing (just like the Boltzmann machine): the best completion is found by lowering the temperature to zero.
The American philosopher Patricia Churchland aims at a unified theory of cognition and neurobiology, of the computational theory of the mind and the computational theory of the brain. According to her program, the symbols of Fodor's mentalese should be somehow related to neurons, and abstract laws for cognitive processes should be reduced to physical laws for neural processes.
Nonetheless, the final connection, the one between the connectionist model of the brain and the symbol-processing model of the mind, is still missing.
Aleksander Igor: IMPOSSIBLE MINDS (Imperial College Press, 1996)
Anderson James & Rosenfeld Edward: NEURO-COMPUTING (MIT Press, 1988)
Anderson James: NEURO-COMPUTING 2 (MIT Press, 1990)
Anderson James: AN INTRODUCTION TO NEURAL NETWORKS (MIT Press, 1995)
Arbib Michael: THE HANDBOOK OF BRAIN THEORY AND NEURAL NETWORKS (MIT Press, 1995)
Bechtel William & Adele Abrahamsen: CONNECTIONISM AND THE MIND (MIT Press, 1991)
Churchland Patricia: NEUROPHILOSOPHY (MIT Press, 1986)
Clark Andy: MICROCOGNITION (MIT Press, 1989)
Clark Andy: ASSOCIATIVE ENGINES (MIT Press, 1993)
Davis Steven: CONNECTIONISM (Oxford University Press, 1992)
Grossberg Stephen: NEURAL NETWORKS AND NATURAL INTELLIGENCE (MIT Press, 1988)
Hassoun Mohamad: FUNDAMENTALS OF ARTIFICIAL NEURAL NETWORKS (MIT Press, 1995)
Haykin Simon: NEURAL NETWORKS (Macmillan, 1994)
Hecht-Nielsen Robert: NEUROCOMPUTING (Addison-Wesley, 1989)
Hertz John, Krogh Anders & Palmer Richard: INTRODUCTION TO THE THEORY OF NEURAL COMPUTATION (Addison-Wesley, 1990)
Kohonen Teuvo: SELF-ORGANIZING MAPS (Springer Verlag, 1995)
Levine Daniel: INTRODUCTION TO NEURAL AND COGNITIVE MODELING (Lawrence Erlbaum, 1991)
McClelland James & Rumelhart David: PARALLEL DISTRIBUTED PROCESSING VOL. 2 (MIT Press, 1986)
Minsky Marvin & Papert Seymour: PERCEPTRONS: AN INTRODUCTION TO COMPUTATIONAL GEOMETRY (MIT Press, 1969)
Rumelhart David & McClelland James: PARALLEL DISTRIBUTED PROCESSING VOL. 1 (MIT Press, 1986)