The Nature of Consciousness

Piero Scaruffi

(Copyright © 2006 Piero Scaruffi)

Machine Intelligence

(These are excerpts from, or extensions to, the material published in my book "The Nature of Consciousness")

The Machinery of the Mind

Is the mind a machine? And, if it is, can we build one in a factory?

The fascination with the idea of building an artificial mind dates from centuries ago. Naturally, before building an artificial mind one should first figure out what kind of machine the human mind is like. The limit to this endeavor seems to be the complexity of the machines we are capable of building. Descartes compared the mind to water fountains, the Austrian psychologist Sigmund Freud to a hydraulic system, the Russian physiologist Ivan Pavlov to the telephone switchboard and the American mathematician Norbert Wiener to the steam engine. Today our favorite model is the electronic computer. Each of these represented the most advanced technology of the time. The computer does represent a quantum leap forward, because it is the first machine that can be programmed to perform different tasks (unlike, say, dishwashers or refrigerators, which can only perform one task).

There is very little similarity between an electronic computer and a brain. They are structurally very different. The network of air-conditioning ducts in a high-rise building is far more similar to a brain than the motherboard of a computer. The main reason to consider the electronic computer a better approximation of the brain is functional, not structural: the computer is a machine that can achieve a lot of what a brain can achieve. But that cannot be the only reason, as machines capable of representing and computing data could be built out of biological matter or even crystals. The real reason is still that the computer is the most complex machine we have ever built. We implicitly assume that the brain is the most complex thing in the world and that complexity is what defines its uniqueness. Not knowing how it works, we simply look for very complex apparatuses. Our approach has not changed much since the times of Descartes. We just have a more complex machine to play with.

It is likely that some day a more complex machine will come by, probably built of something else, and our posterity will look at the computer the same incredulous way that today we look at Descartes’ water fountains.

Human Logic

The history of Logic starts with the Greeks. Pythagoras' theorem stands as a paradigm that would influence all of western science: a relationship between physical quantities that is both abstract and eternal. It talks about a triangle, a purely abstract figure which can be applied to many practical cases, and it states a fact that is always true, regardless of the weather, the season, the millennium.

Euclid built the first system of Logic when he wrote his "Elements" (around 300 BC). From just five axioms (there is a straight line between any two points, a straight line can be extended indefinitely, there is a circle with any given center and radius, all right angles are equal, two parallel lines never meet), he could deduce a wealth of theorems by applying the same inference rules over and over again.

Then, of course, Aristotle wrote his "Organon" and showed that we employ more than one "syllogism" (more than one kind of reasoning). Although he listed several kinds of logical thinking, only three were widely known and eventually became the foundations of Logic. The law of non-contradiction states that an object cannot have both a property and the opposite property (I cannot be both rich and poor). In syllogistic form, "modus ponens" states that: if all B's are C's and all A's are B's, then all A's are C's. Likewise, "modus tollens" states that: if all B's are C's and no A's are C's, then no A's are B's.

After centuries of Roman indifference and of medieval neglect, Logic resumed its course. Studies on logic, from the "Dialectica" of the French philosopher Pierre Abelard (1100 AD) to the "Introductiones in Logicam" of the English philosopher William of Shyreswood (1200 AD), had actually been studies on language. Logic was truly reborn with the "Summa Totius Logicae" of another Englishman, William Ockham (1300 AD), who discussed how people reason and learn. Three centuries later Francis Bacon’s "Novum Organum" (1620) and René Descartes’ "Discours de la Methode" (1637) hailed the analytic method over the dialectic method and therefore started the age of modern Science. The German mathematician Gottfried Leibniz emphasized the fact that reasoning requires symbols in his "De Arte Combinatoria" (1666) and co-discovered calculus with Isaac Newton. In 1761 the Swiss mathematician Leonhard Euler showed how to do symbolic logic with diagrams. The British philosopher John Stuart Mill tried to apply logic outside of science in his "System of Logic" (1843). Non-numerical algebra was formalized by the British mathematician Augustus De Morgan in "The Foundations of Algebra" (1844).

Another Englishman, George Boole, was so fascinated by the progress of symbolic logic that in "The Laws Of Thought" (1854) he claimed that logic could be applied to thought in general: instead of solving mathematical problems such as equations, one would be able to derive a logical argument. Boole's ideas evolved into "propositional logic" and then "predicate logic", which fascinated philosopher-mathematicians such as Gottlob Frege in Germany, Giuseppe Peano in Italy, Charles Sanders Peirce in the United States, and Bertrand Russell in Britain. Thought became more and more formalized. Frege's "Foundations of Arithmetic" (1884) and "Sense and meaning" (1892), Peano's "Arithmetices Principia Nova Methodo Exposita" (1889), and Russell's "The Principles of Mathematics" (1903) moved philosophy towards an "axiomatization" of thought.

Formal Systems

David Hilbert, a German mathematician of the beginning of the 20th century, is credited with first posing the question of whether a mechanical procedure exists for proving mathematical theorems (fully in 1928). His goal was to reduce Mathematics to a more or less blind manipulation of symbols through a more or less blind execution of formal steps. Already implicit in Hilbert's program was the idea that such a procedure could be carried out by a machine. The discipline of formal systems was born, with the broad blueprint that a formal system should be defined by a set of axioms (facts that are known to be true) and a set of inference rules (rules on how to determine the truth or falsity of a new fact given the axioms). By applying the rules to the axioms, one could derive all the facts that are true.

A formal system employs the language of propositions (statements that can only be true or false and can be combined by logical operators such as "not", "and" and "or") and predicates (statements with a variable that can be quantified existentially or universally, i.e. can be only true or false relative to "at least" one value of the variable or "for every" value of the variable). For example, the fact that Piero Scaruffi is a 51-year-old writer could be expressed as: "writer (Piero) AND age (Piero, 51)". The fact that teachers are poor can be expressed with the expression: "FOR EVERY x, teacher(x) -> poor (x)"; that translates as: every individual that satisfies the predicate "teacher" also satisfies the predicate "poor". The fact that some teachers are obnoxious can be expressed as: "FOR AT LEAST ONE x, teacher(x) AND obnoxious (x)"; that is, at least one individual satisfies both predicates.
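The propositions and quantified predicates above can be tried out on a small, finite universe. The following is only a sketch: the three individuals, their attributes and the helper names are hypothetical, invented for illustration. "FOR EVERY" maps to Python's all() and "FOR AT LEAST ONE" to any().

```python
# A toy universe of individuals. Names and attributes are hypothetical.
universe = [
    {"name": "Piero", "age": 51, "roles": {"writer"}, "traits": set()},
    {"name": "Anna", "age": 40, "roles": {"teacher"}, "traits": {"poor"}},
    {"name": "Bruno", "age": 35, "roles": {"teacher"}, "traits": {"poor", "obnoxious"}},
]

# Predicates: statements about an individual that are either true or false.
def writer(x):
    return "writer" in x["roles"]

def teacher(x):
    return "teacher" in x["roles"]

def poor(x):
    return "poor" in x["traits"]

def obnoxious(x):
    return "obnoxious" in x["traits"]

def age(x, n):
    return x["age"] == n

def implies(p, q):
    # Material implication: "p -> q" is false only when p is true and q is false.
    return (not p) or q

piero = universe[0]

# writer(Piero) AND age(Piero, 51)
piero_fact = writer(piero) and age(piero, 51)

# FOR EVERY x, teacher(x) -> poor(x)
all_teachers_poor = all(implies(teacher(x), poor(x)) for x in universe)

# FOR AT LEAST ONE x: some individual is both a teacher and obnoxious.
some_teacher_obnoxious = any(teacher(x) and obnoxious(x) for x in universe)

print(piero_fact, all_teachers_poor, some_teacher_obnoxious)
```

Note that the universal statement holds trivially for individuals who are not teachers, which is why it is written with an implication.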

The language of Logic is not very expressive but it lends itself to logical reasoning, i.e. deduction.

This generation of mathematicians basically pushed logical calculus to the forefront of the tools employed to investigate the world. The apparatus of formal systems became the apparatus that one must use to have any scientific or philosophical discussion. Implicit in their program was the belief that the laws of logic "were" the laws of thought.

Incompleteness

Unfortunately, a number of logical paradoxes refused to disappear, no matter how sophisticated Logic became. All of them, ultimately, are about self-reference.

The oldest was the liar's paradox: the sentence "I am lying" is true if false and false if true. Bertrand Russell came up with the brilliant paradox of the class of classes that do not belong to themselves: such a class belongs to itself if it does not belong to itself and vice versa. This paradox is also known as the paradox of the barber who shaves all those who do not shave themselves. A variation on these paradoxes is often used to prove the impossibility of an omnipotent God: if God can do anything, can God build a rock that is so heavy that even God cannot lift it?

Hilbert was already aware of these paradoxes, but several proposals had been made to overcome them. And several more would be proposed later (from Bertrand Russell’s "Theory of Types" of 1908 to John Barwise's "Situation Theory" of 1986).

Nevertheless, Hilbert and others felt that Logic was capable of proving everything. Hilbert's goal was to find the procedure that would solve all possible problems. By applying that procedure, even non-mathematicians would prove difficult mathematical theorems. Hilbert was aware that, by applying inference rules of Logic to the facts that are known to be true, one could list all the other facts that follow to be true. The power of Logic seemed to be infinite. It made sense to imagine that Logic was capable of proving anything.

The dream of a purely mechanical procedure for solving mathematical problems was shattered by yet another paradox, the one known as Goedel's theorem. In 1931 the (Czech-born) Austrian mathematician Kurt Goedel proved that any consistent formal system (containing the theory of numbers, i.e. Arithmetic) contains a proposition that cannot be proven true or false within that system (i.e., an "undecidable" proposition).

Intuitively, Goedel's reasoning was that the statement "I cannot be proven" is true if and only if it cannot be proven; therefore in every system there is always at least one statement that cannot be proven, the one that says "I cannot be proven".

Neither the proposition nor its negation can be proven within the system. We can't know whether it is true or false. Predicate Logic, for example, is undecidable. Therefore any formal system built on Predicate Logic happens to be built on shaky foundations. And that includes pretty much all of classical Logic. The conclusion to be drawn from Goedel’s theorem is catastrophic: the very concept of truth cannot be defined within a logical system. It is not possible to list all the propositions that are true.

Hilbert had reduced Logic to a mechanical procedure to generate all the propositions that can be proven to be true in a theory. The dual program, of reducing Logic to a mechanical procedure to prove theorems (to prove if a proposition is true), is impossible because of Goedel’s theorem (since it is not always possible to prove that a proposition is true). This came to be known as the "decision problem".

Note that Hilbert was looking for an "algorithm" (a procedure), not a formula. For centuries most of science and Mathematics had focused on formulas. The mighty apparatus of Physics was built on formulas. All natural sciences were dealing with formulas. Hilbert, indirectly, started the trend away from formulas and towards algorithms, a trend that would become one of the silent leitmotivs of the 20th century. A formula allows one to compute a result directly from some factors by applying mathematical operations in the sequence prescribed by the priority rules of operators. An algorithm prescribes a step-by-step procedure for achieving the result. A formula is one line with an equal sign. An algorithm is made of a finite number of steps, and the steps are ordered. Each step can be a mathematical operation or a comparison or a change in the sequence of steps. Each step can be conceived of as an "instruction" to an ideal machine capable of carrying out those elementary steps.
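The difference between a formula and an algorithm can be illustrated with a small example. The problem (adding up the integers from 1 to n) and the function names are chosen for illustration and do not come from Hilbert. The first function applies a one-line formula. The second reaches the same result through ordered elementary steps: an operation, a comparison, and a jump back to an earlier step.

```python
def sum_by_formula(n):
    # A formula: one line, operations applied in the order fixed by operator priority.
    return n * (n + 1) // 2

def sum_by_algorithm(n):
    # An algorithm: a finite, ordered sequence of elementary steps.
    total = 0            # step 1: initialize the result
    i = 1                # step 2: initialize the counter
    while i <= n:        # step 3: comparison; decides whether to continue or stop
        total += i       # step 4: a mathematical operation
        i += 1           # step 5: another operation, then back to step 3
    return total         # step 6: stop and report the result

print(sum_by_formula(100), sum_by_algorithm(100))
```

Each line of the second function could be read as an "instruction" to an ideal machine, which is the sense in which an algorithm already presupposes something like a computer.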

Truth and Meaning

One more notion was necessary to complete the picture: meaning. What did all this mean in the end?

Aristotle had realized the importance of "truth" for logical reasoning and had offered his definition: a proposition is true, if and only if it corresponds with the facts. This is the "correspondence theory of truth".

Frege had founded Logic on truth: the laws of logic are the laws of truth. Truth is Frege's unit of meaning. In fact, it was Frege who introduced "true" and "false", the so-called "truth values". Frege regarded logical propositions as expressing the application of a concept to an object, as in "author(piero)", which states that Piero is an author. Indirectly, he partitioned the universe into concepts and objects, and equated concepts with mathematical functions and objects with mathematical terms. The proposition "Piero is an author" has a concept "author" that is applied to a term "Piero". All of this made sense because, ultimately, a proposition was either true or false, and that could be used to think logically.

According to Aristotle, if this proposition is true, then its meaning is that the person referred to as Piero is an author; and vice versa.

Hilbert had taken this course of action to the extreme consequences. Hilbert had emancipated Logic from reality, by dealing purely with abstractions.

In 1935 the Polish mathematician Alfred Tarski grounded Logic back into reality. He gave "meaning" to the correspondence theory of truth.

Logic is ultimately about truth: how to prove if something is true or false. But what is "truth"? Tarski was looking for a definition of "truth" that would satisfy two requirements, one practical and one formal: he wanted truth to be grounded in the facts, and he wanted truth to be reliable for reasoning. The second requirement was easily expressed: true statements must not lead to contradictions. The first requirement was more complicated. How does one express the fact that "Snow is white" is true if and only if snow is white? Tarski realized that "snow is white" is two different things in that sentence. They are used at different levels. A proposition p such as "snow is white" means what it states. But it can also be mentioned in another sentence, which is exactly the case when we say that "p is true". The fact that "p" is true and the sentence "p is true" are actually two different things. The latter is a "meta-sentence", expressed in a meta-language. In the meta-language one can talk about elements of the language. The liar's paradox, for example, is solved because "I am lying" is a sentence at one level and the fact that I am telling the truth when I am lying is a sentence at a different level; the contradiction is avoided by considering them at two different levels (language and meta-language).

Tarski realized that truth within a theory can be defined only relative to another theory, the meta-theory. In the meta-theory one can define (one can list) all the statements that are true in the theory.

Tarski introduced the concepts of "interpretation" and "model" of a theory. A theory is a set of formulas. An interpretation of a theory is a function that assigns a meaning (a reference in the real world) to each of its formulas. Every interpretation that satisfies all formulas of the theory is a model for that theory. For example, the formulas of Physics are interpreted as laws of nature. The universe of physical objects becomes a model for Physics. Ultimately, Tarski’s trick was to build "models" of the world which yield "interpretations" of sentences in that world. The important fact is that all semantic concepts (i.e., meaning) are defined in terms of truth, and truth is defined in terms of satisfaction, and satisfaction is defined in terms of physical concepts (i.e., reality). The meaning of a proposition turns out to be the set of situations in which it is true.
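A drastically simplified sketch may help fix the vocabulary. It assumes a propositional toy world with only two atomic facts, which is far poorer than the theories Tarski had in mind. The atom names, the two-formula theory and the helper functions are all hypothetical. A "situation" assigns a truth value to every atom, a situation that satisfies every formula of the theory is a model of it, and the meaning of a formula is the set of situations in which it is true.

```python
from itertools import product

# Atomic facts of a toy world (hypothetical names).
atoms = ("snow_is_white", "grass_is_green")

# Every possible situation: one truth value for each atom.
situations = [dict(zip(atoms, values))
              for values in product([True, False], repeat=len(atoms))]

# A theory is a set of formulas; each formula is evaluated against a situation.
theory = {
    "snow_is_white": lambda s: s["snow_is_white"],
    "snow_is_white -> grass_is_green":
        lambda s: (not s["snow_is_white"]) or s["grass_is_green"],
}

def is_model(situation, formulas):
    # A situation is a model of the theory if it satisfies all of its formulas.
    return all(formula(situation) for formula in formulas.values())

def meaning(formula):
    # The meaning of a formula: the situations in which it is true.
    return [s for s in situations if formula(s)]

models = [s for s in situations if is_model(s, theory)]
print(models)
print(meaning(theory["snow_is_white"]))
```

The Python code plays the role of the meta-language here: it talks about the formulas of the toy theory and decides, from outside, which of them are true in which situation.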

What Tarski realized is that truth can only be relative to something. A concept of truth for a theory (i.e., all the propositions that are true in that theory) can be defined only in another theory, its "meta-theory", a theory of that theory. All paradoxes, including Goedel’s, can then be overcome, if not solved. Life goes on.

Tarski grounded meaning in truth and in reference, a stance that would set the stage for a debate for the rest of the century.

The Turing Machine

The two great visionaries of computation, the British mathematician Alan Turing and the Hungarian mathematician John Von Neumann, had a number of influential ideas. Among the many, in 1936 Turing formalized how a machine can perform logical calculus, and, a few years later, Von Neumann explored the possibility that a machine could be programmed to make a copy of itself. In other words, Turing laid the foundations for the discipline of building an "intelligent" machine, and Von Neumann laid the foundations for the discipline of building a self-reproducing machine.

Turing defined computation as the formal manipulation of symbols through the application of formal rules (Hilbert's view of Logic), and devised a machine that would be capable of performing any type of computation (Hilbert's dream).

Hilbert's ideas can be expressed in terms of mathematical "functions". A predicate can always be made to correspond to a function. For example, "age(Person,Number)" corresponds to the function "Number = age(Person)"; and vice versa. Both mean that the age of Person is Number. The advantage of using functions instead of predicates is that it is easier to manipulate functions than predicates. For example, the American mathematician Alonzo Church showed how two functions can be compared. A function can be defined in an "extensional" way (the pairs of input and output values) or in an "intensional" way (the computational procedure it performs). Comparing two extensional definitions can take forever (there can be infinitely many pairs of input and output). In order to compare two intensional definitions, Church invented the "Lambda abstraction", which provides rules to transform any function into a "canonical" form. Once they are in canonical form, two functions can be easily compared.

In the language of functions, Hilbert's goal was to find a mechanical procedure to build all computable (or "recursive") functions. So a recursive function is defined by an algorithm, which in turn can be implemented by a computer program. Recursive functions correspond to programs of a computer. Not surprisingly, it turns out that a predicate is decidable (can be proven true or false) if and only if the corresponding function is recursive, i.e. computable.

Turing realized this and, when he set himself to find a mechanical procedure to perform logical proofs, he basically set himself to invent the computer (at least conceptually). His thought experiment, the "Turing Machine", is the algorithm that Hilbert was looking for, and it turns out that it is also the general algorithm that fuels electronic computers.

A Turing Machine is capable of performing all the operations that are needed to perform logical calculus: read current symbols, process them, write new symbols, examine new symbols. Depending on the symbol that it is reading and on the state in which it is, the Turing machine decides whether it should move on, turn backwards, write a symbol, change state or stop. Turing's machine is an automatic formal system: a system to automatically manipulate symbols from an alphabet according to a finite set of rules.

Church argued that everything that is computable in nature can be computed with a Turing machine.

But there can be infinitely many Turing machines, depending on the rules to generate new symbols. So Turing described how to build a machine that would simulate all possible Turing machines.

The "Universal Turing Machine" is a Turing Machine capable of simulating all possible Turing Machines. It contains a sequence of symbols that describes the specific Turing machine that must be simulated. For each computational procedure, the universal machine is capable of simulating a machine that performs that procedure. The universal machine is therefore capable of computing any computable function. In other words, since the Universal Turing Machine is a machine capable of simulating any other Turing machine, it is capable of carrying out any computation for which a mechanical procedure exists. A computer is nothing but a Turing machine with a finite memory.
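The idea can be sketched in a few lines of code. What follows is a minimal simulator, not Turing's own construction: the rule format, the state names "start", "carry" and "halt", the blank symbol "_" and the step limit are all assumptions made for illustration. A machine is described by a table that maps (state, symbol read) to (symbol to write, head movement, next state). Because the table is passed in as data, the same simulator can run any such machine, which is, loosely speaking, the role of the universal machine.

```python
from collections import defaultdict

BLANK = "_"

def run(rules, tape, state="start", halt="halt", max_steps=10_000):
    """Run the machine described by `rules` on `tape` and return the final tape."""
    cells = defaultdict(lambda: BLANK, enumerate(tape))  # unbounded tape
    head, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the machine may never halt")
        # Read the current symbol, then write, move and change state.
        new_symbol, move, state = rules[(state, cells[head])]
        cells[head] = new_symbol
        head += move
        steps += 1
    if not cells:
        return ""
    return "".join(cells[i] for i in range(min(cells), max(cells) + 1)).strip(BLANK)

# Machine 1: invert every bit, moving right until a blank is found.
INVERT = {
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", BLANK): (BLANK, 0, "halt"),
}

# Machine 2: add one to a binary number (walk to the right end, then carry leftwards).
INCREMENT = {
    ("start", "0"): ("0", +1, "start"),
    ("start", "1"): ("1", +1, "start"),
    ("start", BLANK): (BLANK, -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),
    ("carry", "0"): ("1", 0, "halt"),
    ("carry", BLANK): ("1", 0, "halt"),
}

print(run(INVERT, "1011"))     # the same simulator...
print(run(INCREMENT, "1011"))  # ...running a different machine
```

The step limit is a practical concession: in general there is no way to know in advance whether a given table will ever reach its halting state.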

When Von Neumann neatly divided the data from the instructions (instructions are executed one at a time by the "processor" and they operate on data kept in a "memory"), he simply interpreted for engineers the concepts of the Universal Turing Machine: a computer (the hardware) can solve any computable problem if it is fed the appropriate program (the software). That architecture was to become the architecture of the computer, and today’s "sequential" computers (the most common varieties) are still referred to as "Von Neumann architectures". They are "sequential" in the sense that they are controlled by software programs and execute one instruction after the other from those programs.

Ultimately, Turing reduced Hilbert’s program to manipulation of symbols: logic is nothing more than symbol processing. Indirectly, he turned the computer into the culminating artifact of Hilbert’s formal program.

Turing showed that rational machines are feasible. Furthermore, he showed that one can build a rational machine that can perform "any" rational task.

As for Hilbert’s decision problem, both Church and Turing had basically offered a definition of "algorithm" (one based on Lambda Calculus and the other one based on the Turing machine), both definitions being equivalent and both showing that Hilbert’s question had to be answered in the negative: there is no universal algorithm to solve every mathematical problem or, equivalently, no universal algorithm for deciding whether or not a Turing machine will stop.

Later, in 1985, the physicist David Deutsch generalized Turing's ideas and defined a "quantum" machine in which Turing states can be linear combinations of states. The behavior of a quantum machine is a linear combination of the behavior of several Turing machines. A quantum machine can only compute recursive functions, just like Turing's machine, but it turns out to be much faster in solving problems that exhibit some level of parallelism. In a sense, a quantum computer is capable of decomposing a problem and delegating the sub-problems to copies of itself in other universes.

Cybernetics

Cybernetics is the science of control and communication.

Cybernetics was born out of the passion and excitement generated by the spread of complex mechanical and electrical machines, whose functioning was largely based on control processes. Cybernetics found similarities between some of those mechanical processes and some biological processes.

The concepts introduced by cybernetics built a bridge between machines and nature, between "artificial" systems and natural systems. For example, most machines employ one type of "feedback" or another. Feedback, by sending back the output as input, helps control the proper functioning of the machine.

The self-regulatory character of the human nervous system had been emphasized since the 1920s by the Russian physiologist Nicholas Bernstein. He concluded that, given the complexity of the human motor system, movement could not possibly be "commanded" by a central processor. Instead, he thought that movement was achieved by continually analyzing sensory inputs and adjusting motor output accordingly. When I extend my arm to grab something, I have not computed exactly the trajectory and speed of my movement. I am continually re-computing it as my arm approaches the object. There is no computer in the brain that calculates the exact trajectory for the arm and the hand and the fingers in order to reach the glass. There is a continuous dialogue between the senses and the arm, the hand and the fingers, so that the trajectory is adjusted as motion proceeds.

Feedback

"Negative" feedback occurs when the output of the engine is fed back into the engine for the purpose of "controlling" it. For example, every engine has a valve that helps stabilize its power: the valve opens or closes depending on whether the engine is working too little or too much. The resulting power is always the same because the valve "balances" the work of the engine. The valve does so by opening or closing in a manner that depends on the work of the engine: in other words, the output of the engine is used to determine how much of the output of the engine has to be curtailed. This is negative feedback because the valve operates "against" the engine: it reverses the trend of the engine. The valve is canceling the fluctuations in the work of the engine.

"Positive" feedback occurs when those fluctuations are amplified, not canceled. The output of the engine is fed back into the engine for the purpose of reinforcing it. Instead of a stable output, we get runaway acceleration or complete rest, because positive feedback increases a perturbation instead of curbing it. Needless to say, positive feedback is not often used by engineers, who are more interested in building stable machines, rather than machines that rapidly self-destroy. But positive feedback is common in nature, where it determines the size of a population (until negative feedback prevails in the form of limited resources) and aggressive behavior (until negative feedback prevails in the form of a stronger opponent).

In human societies positive feedback is not rare. For example, positive feedback is often responsible for bestsellers: a record will sell more once it enters the best-selling charts and it will keep getting more popular for the simple reason that it is popular.
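The two kinds of feedback can be contrasted with a one-line update rule. This is a deliberately crude sketch: the linear rule, the set point of 100 and the gains of -0.5 and +0.5 are assumptions chosen for illustration, not a model of any real engine. At every step the deviation of the output from the set point is fed back into the system; a negative gain works against the deviation, a positive gain reinforces it.

```python
def simulate(gain, setpoint=100.0, start=110.0, steps=20):
    """Feed the deviation from the set point back into the system at every step.

    gain < 0: negative feedback, the deviation is progressively canceled.
    gain > 0: positive feedback, the deviation is progressively amplified.
    """
    x = start
    history = [x]
    for _ in range(steps):
        deviation = x - setpoint   # the output is measured...
        x = x + gain * deviation   # ...and fed back as input
        history.append(x)
    return history

negative = simulate(gain=-0.5)
positive = simulate(gain=+0.5)

print(round(negative[-1], 3))  # settles back onto the set point
print(round(positive[-1], 1))  # runs away from the set point
```

With the negative gain the initial perturbation of 10 units is halved at every step; with the positive gain it grows by half at every step, which is the "runaway" behavior described above.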

Homeostasis

The American mathematician Norbert Wiener (who founded Cybernetics in 1947) first recognized the importance of feedback (a term that he helped popularize) for any meaningful behavior in the environment: a system that has to act in the environment must be able to continuously compare its performed action with the intended action and then infer the next action from their difference. This is what all living organisms do all the time in order to survive.

Feedback is the action of feeding a system its past performance. Given past performance, the system can adjust future performance. All biological systems (animals, plants, ecosystems) exhibit feedback. Feedback is the basis of life. As Bernstein had asserted, we could not even coordinate our limbs if we were not capable of using feedback.

Feedback is crucial for "homeostasis", the phenomenon (first described in the 1930s by the American biologist Walter Cannon) by which an organism tends to compensate for variations in the environment in order to maintain its internal stability; i.e., by which an organism adapts to the environment. Homeostasis consists in maintaining a constant internal state in reaction to changes in the environment and through action on the environment. For example, body temperature is controlled by perspiring and shivering (one lowers the temperature and the other one increases it). Homeostasis is crucial for survival.

Given the number of factors that must be taken into account for any type of action in the real world, it is not surprising that the brain evolved to use feedback, rather than accurate computation, to guide motion.

It is not a coincidence that feedback turns out to be just as crucial for the performance of machines in their environment. From James Watt’s steam engine on, machines have been designed so as to be able to control themselves.

A control system is a system that uses feedback to achieve some kind of steady state. A thermostat is a typical control system: it senses the temperature of the environment and directs the heater to switch on or off; this causes a change in the temperature, which in turn is sensed by the thermostat; and so forth. This loop of action and feedback, of sensing and controlling, realizes a control system. A control system is therefore capable of achieving a "goal", is capable of "purposeful" behavior (as opposed to the chaotic behavior that would result if feedback were not used).
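The thermostat loop can be sketched as a short simulation. All the numbers below (target temperature, outside temperature, heater power, heat leak, switching band) are made-up parameters, and the room physics is reduced to a single line; the point is only the loop of sensing, switching and sensing again.

```python
def simulate_thermostat(target=20.0, outside=5.0, start=12.0, steps=200,
                        heater_power=1.0, leak=0.05, band=0.5):
    """Simulate an on/off thermostat; returns the temperature after each step."""
    temp = start
    heater_on = False
    trace = []
    for _ in range(steps):
        # Sense: compare the measured temperature with the target.
        if temp < target - band:
            heater_on = True
        elif temp > target + band:
            heater_on = False
        # Act: the heater adds heat while the room leaks heat to the outside.
        heating = heater_power if heater_on else 0.0
        temp += heating - leak * (temp - outside)
        trace.append(temp)   # the new temperature is what gets sensed next
    return trace

trace = simulate_thermostat()
print(round(min(trace[100:]), 2), round(max(trace[100:]), 2))
```

Once the room has warmed up, the temperature keeps oscillating in a narrow range around the target: the system never computes the "right" amount of heat in advance, it simply keeps correcting the error it senses.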

Living organisms are control systems. Most machines are also control systems.

Another folk concept that Wiener formalized is "noise". Wiener emphasized that communication in nature is never perfect: every message carries some involuntary "noise", and in order to understand the communication the original message must be restored. This led to a statistical theory of the amount of information.

Wiener understood the essential unity of communication, control and statistical mechanics, which is the same whether the system is an artificial system or a biological system. This unified discipline became "Cybernetics". A cybernetic system is a system that achieves an internal homeostatic control through an exchange of information between its parts.

The British neurologist Ross Ashby also placed emphasis on feedback. Both machines and living beings tend to change in order to compensate for variations in the environment, so that the combined system is stable. For living beings this translates into "adaptation" to the environment. The "functioning" of both living beings and machines depends on feedback processes to the extent that feedback allows the system to self-organize. Ashby emphasized the power of self-organizing systems, systems made of a very high number of simple units which can evolve autonomously and adapt to the environment by virtue of their structure.

Ashby believed that in every isolated system that is subject to constant forces "organisms" arise that are capable of adapting to their environment. As his principle of self-organization ambitiously states: "in any isolated system, life and intelligence inevitably develop".

Control systems

The American electrical engineer William Powers extended these ideas to a hierarchical organization of control systems. First of all, he realized that a control system controls what it senses: it controls its input (the perception), not its output (the behavior). A thermostat controls the temperature, not the gas consumed by the heater. Organisms change their behavior, but they do it in order to control a perception. Behavior is the control of perception.

Next, he envisioned a system which is made of a pyramid of control systems, each one sending its output to some "lower-level" control systems. The lowest level in the hierarchy is made of control systems that use sensors to sense the environment and "effectors" to act on the environment, and some "reference level" to determine what they have to maintain constant. For example, a thermostat would sense the environment's temperature, act on the heater and maintain constant the measured temperature. At a higher level, a control system senses and sets the reference level of lower-level control systems. An engine could direct a thermostat to maintain a certain temperature. The reference level of the lower level is determined by the control systems of the higher level.
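A two-level version of such a pyramid can be sketched as follows. The example is hypothetical and is not taken from Powers: a higher-level system that wants to perceive a certain position sets the reference level of a lower-level system that controls velocity, and only the lower-level system acts on the environment. The proportional rule, the gains and the time step are assumptions chosen for illustration.

```python
class ControlSystem:
    """A minimal proportional controller: it acts on the gap between
    its reference level and what it currently perceives."""

    def __init__(self, reference, gain):
        self.reference = reference
        self.gain = gain

    def output(self, perception):
        return self.gain * (self.reference - perception)

def simulate(target_position=10.0, steps=200, dt=0.1):
    position, velocity = 0.0, 0.0
    upper = ControlSystem(reference=target_position, gain=1.0)  # perceives position
    lower = ControlSystem(reference=0.0, gain=4.0)              # perceives velocity
    for _ in range(steps):
        # The higher level does not touch the environment:
        # its output is the reference level of the lower level.
        lower.reference = upper.output(position)
        # The lowest level is the only one with an "effector".
        acceleration = lower.output(velocity)
        velocity += acceleration * dt
        position += velocity * dt
    return position, velocity

position, velocity = simulate()
print(round(position, 3), round(velocity, 3))
```

Asking why the lower level holds a given velocity leads up the hierarchy: it does so because the higher level currently requests it, and the higher level requests it because of the distance still to be covered.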

Living organisms are made of such hierarchies of control systems. "Instinctive" behavior is the set of control systems (organized in a hierarchy) that the organism inherits at birth. They determine internally what parameters have to be maintained constant, and at which magnitude. Behavior is a backward chain of behaviors: walking up the hierarchy one finds out why the system is doing what it is doing (e.g., it is keeping the temperature at such a level because the engine is running at such a speed because… and so forth). The hierarchy is a hierarchy of goals (goals that have to be achieved in order to achieve other goals in order to achieve other goals in order to…).

This hierarchy inevitably extends outside the system and into the environment. A machine is part of a bigger machine which is part of a factory which is part of an economy which is part of a society which is part… The goal of a component of the machine is explained by a chain of higher-level goals that extend into society. In the case of living organisms, the chain of goals extends to their ecosystem, and ultimately to the entire system of life.

Algorithms and Automata

Cybernetics implied a paradigm shift from the world of continuous laws to the world of algorithms. Physical sciences had been founded on equations that were continuous, but Cybernetics could not describe any feedback-based process with one continuous equation. The most natural way to describe such a process was to break it down into the sequence of its constituent steps, one of which refers ("feeds back") to a previous one. Every mechanical process could then be interpreted as a sequence of instructions that the machine must carry out. Indirectly, the complex clockwork of a watch is carrying out the sequence of instructions to compute the time. The watch is, in a sense, an automaton that performs an algorithm to compute the time.

The effect of an algorithm is to turn time’s continuum into a sequence of discrete quanta, and, correspondingly, to turn an analog instrument into a digital instrument. A watch, for example, is the digital equivalent of a sundial: the sundial marks the time in a continuous way, the watch advances by seconds.

The digital world (of discrete quantities) differs from the analog world (of continuous quantities) in a fundamental way when it comes to precision. An analog instrument can be precise, and in principle there is no limit to its precision. A digital instrument can only be approximate, its limit being the smallest magnitude it can measure (seconds for a watch, millimeters for a ruler, degrees for a thermometer, etc.). For the purpose of "recognizing" a measurement, though, a digital reading is often better: while two analog values can be so close that they can be confused, two digital values are unambiguously either identical or different. In the context of continuous values, it is difficult to decide whether a value of 1.434 and a value of 1.435 should be considered as the same value with a little noise or two different values; whereas in the context of binary values (the binary universe being a special case of digital universe), a value is unambiguously either zero or one. This feature was known long before compact discs replaced vinyl records (the Morse code was an early application of the concept). An analog instrument will probably never measure a one as a one or a zero as a zero (it will yield measurements that are very close to one or very close to zero), whereas a digital instrument cannot measure anything other than a zero or a one because its scale does not have any other value (e.g., a digital watch cannot measure 0.9 seconds because its scale is in seconds). This limitation often translates into an advantage.
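The advantage of a binary scale can be illustrated with a tiny sketch. The readings and the threshold of 0.5 are made-up values chosen for the example; the function name to_bit is likewise hypothetical.

```python
def to_bit(reading, threshold=0.5):
    """Collapse a noisy analog reading onto the binary scale {0, 1}."""
    return 1 if reading >= threshold else 0


analog_readings = [0.93, 0.08, 1.04, -0.02]  # made-up noisy measurements
bits = [to_bit(r) for r in analog_readings]
print(bits)
```

None of the analog readings is exactly a zero or a one, but after thresholding every value is unambiguously one or the other, and the small amount of noise has disappeared.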

What is implicit in a cybernetic scenario is that the world is driven by algorithms, rather than by continuous physical laws. Similar conclusions were reached in Linguistics (a "generative grammar" is run by an algorithm) and in Cognitive Science (a production system is run by an algorithm).

An algorithm is a deterministic process in the form of a sequence of logical steps. A computer program simply implements an algorithm. Reducing the laws of nature to algorithms is like reducing nature to a world of automata.

Information Theory

The American electrical engineer Claude Shannon and the American mathematician Warren Weaver worked to free the physicists' definition of entropy from its thermodynamic context and apply it to Information Theory. They defined entropy (which is often taken as a measure of disorder) as the statistical state of knowledge about a question: the entropy of a question is related to the probability assigned to all the possible answers to that question. Shannon's entropy measures the uncertainty in a statistical ensemble of messages. (Informational entropy is defined as the negative of the sum of the logarithms of the probabilities of the various uncertain outcomes, each logarithm being weighted by the probability of its outcome.)

Information is a reduction in uncertainty, i.e. of entropy (the quantity of information produced by a process equals the amount by which entropy has been reduced).
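As a small worked example of this definition, the sketch below computes the entropy of a question before and after it is answered. The choice of four equally likely answers and of logarithms in base 2 (so that the result is in bits) are assumptions made for the illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: the sum of p * log2(1/p), i.e. minus the sum of p * log2(p)."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


before = entropy([0.25, 0.25, 0.25, 0.25])  # four equally likely answers
after = entropy([1.0])                      # the answer is now known
information_gained = before - after
print(before, after, information_gained)
```

With four equally likely answers the uncertainty is two bits; once the answer is known the uncertainty is zero, so the answer carried two bits of information.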

The second law of Thermodynamics, one of the fundamental laws of the universe, responsible for our dying among other things, states that an isolated system always tends to maximize its entropy (i.e., things decay). Since entropy is a measure of the random distribution of atoms, maximizing it entails that the distribution has to become as homogeneous as possible. The more homogeneous, the less informative a distribution of probabilities is. Therefore, entropy, a measure of disorder, is also a measure of the lack of information.

The French physicist Leon Brillouin explored further the relationship between information and entropy, with his "negentropy principle of information". He defined information as the amount of uncertainty which exists before a choice is made. Information turns out to be the difference between the entropy of the observed state of the system and its maximum possible entropy. The more information was assimilated in the thermodynamic jargon, the more scientists were able to formulate predictions and deterministic laws about it. For example, Brillouin proved that the minimum entropy cost for obtaining one bit of information is 10 to the -23 joules per degree K.

In summary, a theory of information turns out to be the dual of a theory of entropy: if information is ultimately a measure of order, entropy is ultimately a measure of disorder, and, indirectly, a measure of the lack of information.

Whether unifying machine processes and natural processes, or unifying quantities of Information Theory and quantities of Thermodynamics, the underlying theme was that of finding common features between artificial systems and natural systems. This theme picked up speed with the invention of the computer.

Algorithmic Information Theory

"Algorithmic Information Theory", as formulated in the 1960s by the Russian mathematician Andrei Kolmogorov, is a scientific study of the concept of complexity. Complexity is basically defined as quantity of information, which means that Algorithmic Information Theory is the discipline that deals with the quantity of information in systems.

The complexity of a system is defined as the length of the shortest possible description of it; or, equivalently, the least number of bits of information necessary to describe the system. It turns out that this means the length of "the shortest algorithm that can simulate it"; or, equivalently, the size of the shortest program that computes it. For example, "pi" has infinitely many digits but a low complexity, because it can be described very briefly as the ratio between a circumference and its diameter. The emphasis is therefore placed on sequences of symbols that cannot be summarized in any shorter way. Algorithmic Information Theory looks for the shortest possible message that encodes everything there is to know about a system. Objects that contain regularities have a description that is shorter than themselves.

Algorithmic Information Theory represents an alternative to Probability Theory when it comes to studying randomness. Probability Theory cannot define randomness. Probability Theory says nothing about the meaning of a probability: a probability is simply a measure of frequency. On the other hand, randomness can be easily defined by Kolmogorov: a random system is one that cannot be compressed. A random sequence is one that cannot be compressed any further. The Argentinean mathematician Gregory Chaitin proved that randomness is pervasive. His "Diophantine" equation contains 17,000 variables and a parameter which can take the value of any integer number. By studying it, Chaitin achieved a result as shocking as Goedel's theorem: there is no way to tell whether, for a specific value of the parameter, the equation has a finite or infinite number of solutions. That means that the solutions to some mathematical problems are totally random.
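The idea that regular objects have short descriptions while random ones do not can be illustrated, very roughly, with an ordinary compression program. The sketch below uses Python's zlib module as a stand-in; this is an assumption of the example, not part of the theory: the true shortest description cannot be computed in general, and a compressor only gives an upper bound on it. Note also that the "random" bytes here come from a seeded pseudo-random generator, so strictly speaking they do have a short description (the generator plus the seed) that zlib is simply unable to find.

```python
import random
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed data (an upper-bound proxy only)."""
    return len(zlib.compress(data, 9))


regular = b"01" * 5000                 # a highly regular 10,000-byte sequence
rng = random.Random(0)                 # fixed seed, so the run is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(10000))  # pseudo-random bytes

print(len(regular), compressed_size(regular))
print(len(noisy), compressed_size(noisy))
```

The regular sequence shrinks to a tiny fraction of its length, whereas the noisy one hardly shrinks at all.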

Incidentally, every system has a finite complexity because of the Bekenstein bound. In Quantum Theory the Bekenstein bound (named after the American physicist Jacob Bekenstein) is a direct consequence of Heisenberg’s uncertainty principle: there are upper limits on the number of distinct quantum states and on the rate at which changes of state can occur. In other words, the principle of uncertainty indirectly sets an upper limit on the information density of a system, and that upper limit is expressed by the Bekenstein bound.

Physicists seem to be fascinated with the idea of quantifying the complexity of the brain and even the complexity of a human being. The American mathematician Frank Tipler estimated the storage capacity of the human brain at 10 to the 15th power and the maximum amount of information stored in a human being at 10 to the 45th power (a number with 45 zeros). Freeman Dyson computed the entropy of a human being at 10 to the 23rd power.

Artificial Intelligence

The term "Artificial Intelligence" was coined around 1955 by the American mathematician John McCarthy, but it has never been clarified what it was truly supposed to mean. The reason is simple: there is no consensus on what makes a machine (or, for that matter, a human being) "intelligent".

If opinions vary on whether Artificial Intelligence is feasible or not, opinions are even more varied on how Artificial Intelligence should be achieved.

At the beginning Artificial Intelligence was often equated with the quest for the "general problem solver", the program capable of solving all mathematical problems. Because the computer is a symbolic processor, and proving theorems is about processing symbols, it was natural to assume that a computer can prove all theorems. However, scientists soon realized that problem solving is not everything, and in everyday life we can solve problems that are essential to our survival (such as deciding when to cross a street) without ever using the Mathematics we studied in school.

Thus "intelligence" is not commonly defined by the number of theorems one can prove in a second (otherwise machines would already be far more intelligent than the most intelligent humans) but by the ability to move around in the real world and carry on all the tasks that humans carry out more or less effortlessly during the day.

A more realistic view is that intelligence is the result of reasoning about knowledge. Intelligent behavior originates from a base of knowledge and from the ability to carry out inferences on that knowledge base. Intelligence is essentially knowledge processing. Since a computer is ultimately a symbol processor, the issue is then how to express knowledge in a symbolic form.

The difference between knowledge and information is crucial. Information can be found in books, knowledge comes from experience. Common sense, for example, is a form of knowledge but not a form of information. Anybody can access the information stored in a medical encyclopedia, but only physicians have real knowledge about medicine. The focus of Artificial Intelligence is not in building encyclopedias, in storing huge amounts of information: it is in "cloning" humans who are experts (i.e., have acquired specialized knowledge) in a field or domain. The difference between information and knowledge is, for example, the difference between asking "who is the president of the United States?" and asking "who will be the next president of the United States?" The former question requires only "information" about who is the current president, the latter question requires "knowledge" about the domain of politics.

According to John McCarthy, knowledge representation must satisfy three fundamental requirements: "ontological" (must allow one to state the relevant facts), "epistemological" (allow one to express the relevant knowledge) and "heuristic" (allow one to perform the relevant inference). Artificial Intelligence can then be defined as the discipline that studies what can be represented in a formal manner (epistemology) and computed in an efficient manner (heuristics). The language of Logic satisfies those requirements: it allows us to express everything we know and it allows us to make computations on what is expressed by it. Each set of knowledge is in fact a mathematical theory.

The underlying assumption of the knowledge-based approach is that symbolic processing per se may lead to human-like intelligence.

Knowledge Representation

One of the crucial steps to build intelligent machines is therefore knowledge representation: first and foremost, one must encode in a machine the knowledge about the world possessed by humans. Every science needs to build a mathematical model of its world before it can perform any inference and draw any conclusions. Physics, for example, represents natural laws with formulas. Then formulas can be combined to yield prescriptions about the effects of actions. The world of Artificial Intelligence is the world of knowledge: what must be represented formally is knowledge.

Knowledge has traditionally been formalized in three forms: facts, stimulus-response pairs (or cause-effect, or premise-action, or antecedent-consequent pairs), and relations between concepts. Facts are easily represented in first-order Predicate Logic in the form of logical expressions: "Piero is a writer" can be represented as "writer (Piero)", meaning that Piero satisfies the predicate "writer" (or that Piero belongs to the set of individuals that satisfy the predicate "writer"). For example, if we know that all writers are creative, then we can apply a simple step of deduction and derive that Piero is also creative.

"Production" rules are usually employed to express the causal connection between one fact and another fact (if something is true, then something else must be true too). For example, if somebody is a human being, then she is also a mammal. Whenever the antecedent is true, the consequent is also true. This too can be translated into Predicate Logic, because the "implication" is mathematically equivalent to a logical expression (in Logic, p IMPLIES q is equivalent to NOT p OR q). More rules can therefore be combined according to Predicate Calculus.
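The equivalence between the implication and a plain logical expression can be checked exhaustively, since there are only four combinations of truth values. In the sketch below the function name implies is hypothetical; it encodes the usual truth table of the conditional (false only when the antecedent is true and the consequent is false) and compares it with NOT p OR q.

```python
from itertools import product


def implies(p, q):
    """Truth table of the conditional: false only when p is true and q is false."""
    return False if (p and not q) else True


for p, q in product([False, True], repeat=2):
    assert implies(p, q) == ((not p) or q)
    print(p, q, implies(p, q))
```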

Finally, relations between concepts (i.e., complex concepts) can be represented with systems such as "semantic networks" and "frames". A semantic network represents concepts as nodes, "links" a concept with other concepts, and specifies of what type each link is. For example, the concept of a human being is linked to the concept of a mammal by a link of type "BELONGS TO". A concept may have many links of many types to other concepts. Ideally, all human knowledge could be represented by a gigantic semantic network.

A "frame" can be used to represent the inner structure of a concept: its attributes, their default values, the actions associated with the attributes, and, again, the links to other concepts. A car’s attributes include that its function is to move, that it has four wheels, that it costs so much, etc. Both semantic networks and frames can also be reduced to expressions of first-order Predicate Logic.

Anything that can be reduced to Predicate Logic satisfies McCarthy’s requirements.

Expert Systems

An "expert system" is simply a software system that has a knowledge base and some inference methods that can be applied to that knowledge base.

The knowledge base describes the rules that apply to the domain of expertise. The "inference engine" is capable of inferring from those rules the appropriate action in the face of a specific situation. The combination of a knowledge base and an inference engine should therefore yield a machine that behaves just like a human expert (i.e., that makes the same decisions in the same circumstances) within the domain of expertise represented in the knowledge base.

Since all methods commonly employed to represent knowledge reduce to some variant of Predicate Logic, Logic can provide the inference techniques required to draw conclusions from the knowledge base. For example, some representation systems (the "production systems") simply encode knowledge in production rules, and production rules basically assert a new fact within a knowledge base whenever some other facts have been asserted. In the presence of a new situation (i.e., of a set of new facts), a number of production rules will "fire" and assert another set of new facts, which in turn will trigger more production rules, and so forth recursively ("forward chaining"). Vice versa, one can prove the truth of a statement by looking up which production rules would assert it and what has to be true in order for them to fire, and so forth recursively ("backward chaining"). This way of reasoning belongs to deduction, the most studied and reliable form of logical reasoning.
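The two directions of chaining can be sketched in a few lines. This is a deliberately minimal illustration and not the design of any particular expert system: facts are plain strings with no variables, each rule is a pair (list of antecedents, consequent), the rule set is assumed to contain no cycles, and the three example rules are invented.

```python
def forward_chain(facts, rules):
    """Fire rules until no new fact can be asserted; return all known facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and all(a in known for a in antecedents):
                known.add(consequent)  # the rule "fires" and asserts a new fact
                changed = True
    return known


def backward_chain(goal, facts, rules):
    """Prove a goal by finding a rule that asserts it and proving its antecedents."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(a, facts, rules) for a in antecedents)
        for antecedents, consequent in rules
        if consequent == goal
    )


# Invented example rules: (antecedents, consequent)
rules = [
    (["human"], "mammal"),
    (["mammal"], "warm_blooded"),
    (["mammal", "writer"], "creative_mammal"),
]

derived = forward_chain({"human"}, rules)
print(sorted(derived))
print(backward_chain("warm_blooded", {"human"}, rules))
print(backward_chain("creative_mammal", {"human"}, rules))
```

Forward chaining starts from the facts and discovers everything that follows from them; backward chaining starts from a statement and works back to the facts that would justify it.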

Using a paradigm originally introduced by the American economist Herbert Simon and the American mathematician Allen Newell, the process performed by an expert system can also be viewed as a "search" in a space of all possible solutions. Each logical step corresponds to a step in the search through that abstract space for the solution to the current problem. The search can be "blind" or "heuristic": the former recursively applies a set of algorithms (the same ones regardless of the type of problem at hand, such as "modus ponens" or "reductio ad absurdum"), hoping that eventually it will stumble upon the solution; the latter employs "clues" about the problem at hand (or "domain heuristics") in order to find short-cuts. The algorithms employed during a heuristic search can be either "weak" methods, such as "hill climbing" and "means-ends analysis", which are relatively independent of the domain, or methods which are entirely domain-specific.

Pioneering expert systems include: Newell's and Simon's "General Problem Solver" (1957), Ed Feigenbaum’s "Dendral" (1965) for analyzing chemical compounds, Bruce Buchanan’s "Mycin" (1972) for diagnosing diseases, John McDermott’s "Xcon" (1980) for configuring computers. Since the 1980s a growing number of them entered the workforce. But they were far from exhibiting any "intelligence", other than what one expects from machines.

Programs That Learn: Induction

Intelligence, though, is often regarded as "learning" knowledge rather than simply using it. One of the most cogent failures of the field of expert systems lies in the inability of such systems to learn on their own the knowledge they need in order to operate. The performance of a human being increases with experience, and the rate of increase is often considered a measure of the person’s intelligence. In an expert system, performance changes (and does not necessarily improve) only when a new knowledge base is installed. While the intelligence of an expert system is purely deductive, human intelligence is also inductive: new knowledge is continuously inferred.

In order to be capable of "learning", an expert system should continuously change its knowledge base to reflect the outcome of its actions.

Learning can be done according to two opposite paradigms: an inductive paradigm and an analytic (or deductive) paradigm.

The former constructs the symbolic description of a concept from a set of positive and negative instances (instances that belong and instances that do not belong to that concept). Usually, the symbolic description is in the form of a "discrimination" rule: if a new instance satisfies such rule, then it does belong to the concept. This view goes back to the American psychologist Jerome Bruner’s theory of concepts: a concept is defined by a set of features which are individually necessary and jointly sufficient for an instance to belong to that concept. The corresponding algorithm for learning and refining a concept, since Patrick Winston’s influential work, looks like this: for every new positive instance, build a generalization of the discrimination rule that the new instance will also satisfy; for every new negative instance, build a specialization of the rule that the new instance will not satisfy. In other words, Winston views learning as a heuristic search in a space of symbolic descriptions, driven by an incremental process of specialization and generalization.
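The generalize/specialize cycle can be illustrated with a toy learner. This is only loosely inspired by the procedure described above and is not Winston's program: instances are sets of feature names, the rule is a pair of sets (features that are required and features that are forbidden), and the class name, method names and the "arch" examples are all invented for the sketch. Specialization here is crude: when a negative instance is wrongly accepted, the features it has that no positive instance has ever shown become forbidden.

```python
class ConceptLearner:
    """Toy incremental learner of a discrimination rule over feature sets."""

    def __init__(self):
        self.required = None           # features shared by all positives so far
        self.forbidden = set()         # features that rule an instance out
        self.seen_in_positives = set()

    def matches(self, instance):
        if self.required is None:      # nothing learned yet
            return False
        return self.required <= instance and not (self.forbidden & instance)

    def learn(self, instance, positive):
        if positive:
            # Generalize: keep only what every positive instance has in common.
            if self.required is None:
                self.required = set(instance)
            else:
                self.required &= instance
            self.seen_in_positives |= instance
            self.forbidden -= instance
        elif self.matches(instance):
            # Specialize: exclude what distinguishes this wrongly accepted negative.
            self.forbidden |= instance - self.seen_in_positives


learner = ConceptLearner()
learner.learn({"two_posts", "lintel", "red"}, positive=True)
learner.learn({"two_posts", "lintel", "blue"}, positive=True)
learner.learn({"two_posts", "lintel", "posts_touching"}, positive=False)

print(sorted(learner.required), sorted(learner.forbidden))
print(learner.matches({"two_posts", "lintel", "green"}))
print(learner.matches({"two_posts", "lintel", "posts_touching"}))
```

The second positive instance generalizes the rule (color no longer matters), and the negative instance specializes it (touching posts are now excluded).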

Inductive systems include Ryszard Michalski's "conceptual clustering" and Tom Mitchell's "version space". Michalski’s method is "data-driven" like Winston’s: the symbolic description is built bottom-up from the set of instances.

Mitchell’s method is instead "model-driven", which means that symbolic descriptions are predefined and instances select the most appropriate one. In Mitchell’s case, all concepts and their abstractions are represented in a space which is partially ordered by the relation of generality. An incremental process of refinement narrows down the space to one description. New instances "shrink" down the space by generalizing the set of minimal elements and specializing the set of maximal elements. As new instances keep shrinking down the space, the concept gets defined more and more accurately until the two sets of minimal and maximal elements are the same set. Mitchell’s "version space" seems to be psychologically plausible because concepts are indeed learned and refined over a period of time, their vagueness slowly turning into crispness.

Programs That Learn: Deduction

The analytic paradigm, instead, utilizes past problem solving experience to formulate the search strategy in the space of potential solutions. Deductive learning systems include Paul Rosenbloom's "chunking", Jerry DeJong's "explanation-based learning", Jaime Carbonell's "derivational analogy", John Holland's "classifiers".

Rosenbloom’s programs are aimed at simulating the law of practice: the time required to perform an action decreases as a power of the number of times the action is performed (the "power law of practice"). His "chunking" technique progressively reduces the amount of processing needed to determine what action must be taken in the face of a situation. Ultimately, it tends to reduce every situation-action pair to a stimulus-response pair that does not require any "thinking" at all.

An explanation-based learning system (inspired by Richard Fikes' work) is given a high-level description of the target concept, a single positive instance of the concept, a description of what a concept definition is and domain knowledge. The system generates a proof that the positive instance satisfies the target concept and then generalizes the proof.

Learning by analogy was originally investigated by Patrick Winston, who focused on learning a concept analogous to another concept (which resulted in a transfer of features from a frame to another frame). Carbonell applied the method to sequences of operators rather than to features. Derivational analogy solves a problem by tweaking a plan (represented as a hierarchical goal structure) used to solve a previous problem.

Theory Formation

Learning a concept is actually not a big deal. Many concepts must be learned to perform even the simplest daily procedures, and once many concepts are learned they must be combined in a "theory" of the domain if they are to make any sense at all. Given a theory of the domain, an individual or a system can then plan meaningful actions in that domain. Theory formation turns out to be quite tricky. A group of concepts can be combined in infinitely many ways, and most are not very useful. Physics is a good example of a theory: concepts abound, from mass to electricity, but they are held together by just a few laws.

Douglas Lenat believed that theories can be built only by using "rules of thumb" on what a theory is and what it usually looks like. In other words, some concepts are more interesting than others, and some relations between concepts are more interesting than others. Lenat’s heuristics play the role of a scientist’s intuition.

Lenat’s approach to building theories was model-driven. Pat Langley’s approach, instead, was data-driven: given experimental data, build a hierarchy of hypotheses and eventually a full-fledged theory that explains them. The only rule of thumb is that regularity matters and everything else does not: any theory is a theory of the regularities that occur in a domain.

Either way, one needs heuristics (intuition, rules of thumb, common sense) in order to learn a new theory. Both Lenat and Langley got intrigued by the origins of heuristics and started studying how heuristics themselves can be learned. In other words: how does one progress from being a novice, who is moving blindly around the environment and is capable only of applying rigid rules, to being an expert, who relies on intuition and rules of thumb? For Lenat this meant that one had to progress from using weak methods to using domain-specific methods through a process of generate and test (generate a strategy, test it, tweak it, and so forth). Tom Mitchell’s approach was similar, but aimed at generating the version space.

All of these are attempts at building machines that can learn. All of them are extremely limited in how and what they can learn.

Notwithstanding these attempts at building knowledge-based programs that can learn, learning has remained a liability, not an asset, of the field, especially when compared with the achievements of neural networks.

Genetic Algorithms

In the 1970s the American computer scientist John Holland had the intuition that the best way to solve a problem is to mimic what biological organisms do to solve their problem of survival: to evolve (through natural selection) and to reproduce (through genetic recombination). Genetic algorithms apply recursively a series of biologically-inspired operators to a population of potential solutions of a given problem. Each application of operators generates new populations of solutions, which should better and better approximate the best solution. What evolves is not the single individual but the population as a whole.

Genetic algorithms are actually a further refinement of search methods within problem spaces. Genetic algorithms improve the search by incorporating the criterion of "competition".

Recalling Newell and Simon's definition of problem solving as "searching in a problem space", David Goldberg defines genetic algorithms as "search algorithms based on the mechanics of natural selection and natural genetics". Most optimization methods work from a single point in the decision space and employ a transition method to determine the next point. Genetic algorithms, instead, work from an entire "population" of points simultaneously, trying many directions in parallel and employing a combination of several genetically-inspired methods to determine the next population of points.

One can employ simple algorithms such as "reproduction" (that copies chromosomes according to a fitness function), "crossover" (that switches segments of two chromosomes) and "mutation", as well as more complex algorithms such as "dominance" (a genotype-to-phenotype mapping), "diploidy" (pairs of chromosomes), "abeyance" (shielded against over-selection), "inversion" (the primary natural mechanism for recoding a problem, by switching two points of a chromosome); and so forth.

Holland's "Classifier" (which learns new rules to optimize its performance) was the first practical application of genetic algorithms. A classifier system is a machine-learning system that learns syntactically simple rules (or "classifiers") to guide its performance in the environment. A classifier system consists of three main components: a production system, a credit system (such as the "bucket brigade") and a genetic algorithm to generate new rules. Its emphasis on competition and cooperation, on feedback and reinforcement, rather than on pre-programmed rules, set it apart from knowledge-based models of Artificial Intelligence.

A measure function (the "fitness function") computes how "fit" an individual is. The selection process starts from a random population of individuals. For each individual of the population the fitness function provides a numeric value for how far the solution is from the ideal solution. The probability of selection for that individual is made proportional to its "fitness". On the basis of such fitness values a subset of the population is selected. This subset is allowed to reproduce itself through biologically-inspired operators of crossover, mutation and inversion.

Each individual (each point in the space of solutions) is represented as a string of symbols. Each genetic operator performs an operation on the sequence or content of the symbols.
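The cycle of selection, crossover and mutation on strings of symbols can be sketched as follows. This is a minimal toy, not Holland's or Goldberg's code: the problem (maximize the number of 1s in a binary string), the parameter values, the fixed random seed and all function names are assumptions of the example. Here fitness is defined so that a higher value means a better solution, and the best individual of each generation is copied unchanged into the next one ("elitism"), which is an addition not mentioned in the text.

```python
import random

rng = random.Random(1)                      # fixed seed for a repeatable run
LENGTH, POP_SIZE, GENERATIONS = 20, 30, 40  # assumed toy parameters


def fitness(chromosome):
    """Toy objective: count the 1s in the string (higher is better)."""
    return chromosome.count("1")


def select(population):
    """Pick one parent with probability proportional to its fitness."""
    weights = [fitness(c) + 1 for c in population]  # +1 avoids all-zero weights
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b):
    """Join the head of one chromosome to the tail of another at a random cut."""
    cut = rng.randint(1, LENGTH - 1)
    return a[:cut] + b[cut:]


def mutate(chromosome, rate=0.01):
    """Flip each symbol with a small probability."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


population = ["".join(rng.choice("01") for _ in range(LENGTH))
              for _ in range(POP_SIZE)]
best_per_generation = [max(map(fitness, population))]
for _ in range(GENERATIONS):
    elite = max(population, key=fitness)    # elitism: an added assumption
    children = [mutate(crossover(select(population), select(population)))
                for _ in range(POP_SIZE - 1)]
    population = [elite] + children
    best_per_generation.append(max(map(fitness, population)))

print(best_per_generation[0], best_per_generation[-1])
```

Nothing in the loop manipulates a single solution toward the goal: it is the population as a whole that drifts toward better strings, one generation at a time.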

When a message from the environment matches the antecedent of a rule, the message specified in the consequent of the rule is produced. Some of the messages produced by the rules cycle back into the classifier system, others generate actions on the environment. A message is a string of characters from a specified alphabet. The rules are not written in the Predicate Logic of expert systems, but in a language that lacks descriptive power and is limited to simple conjunctive expressions.

Credit assignment is the process whereby the system evaluates the effectiveness of its rules. The "bucket brigade" algorithm assigns a strength (a measure of its past usefulness) to each rule. Each rule then makes a bid (proportional to its strength and to its relevance to the current situation) and only the highest-bidding rules are allowed to pass their messages on. The strengths of the rules are modified according to an economic analogy: every time a rule bids, its strength is reduced by the value of the bid while the strength of its "suppliers" (the rules that sent the messages matched by this bidder) is increased. The bidder’s strength will in turn increase if its consumers (the rules that receive its message) become bidders. This leads to a chain of suppliers/consumers whose success ultimately depends on the success of the rules that act directly on the environment.
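The economic analogy can be made concrete with a very small sketch of the payments along a chain of three rules. It is a simplification built on assumptions: the rule names, the starting strengths, the bid fraction of 10% and the reward of 5 are invented, the relevance term of the bid is left out, and the competition among bidders is not modeled.

```python
def pay_bid(strengths, bidder, suppliers, bid_fraction=0.1):
    """One bucket-brigade payment: the bidder pays its bid to its suppliers."""
    bid = bid_fraction * strengths[bidder]
    strengths[bidder] -= bid
    for supplier in suppliers:
        strengths[supplier] += bid / len(suppliers)  # bid shared equally
    return bid


strengths = {"A": 10.0, "B": 10.0, "C": 10.0}  # made-up rules, equal start

# Chain: A posts a message matched by B, B posts a message matched by C,
# and C acts on the environment, which pays an assumed reward of 5.
pay_bid(strengths, "B", ["A"])
pay_bid(strengths, "C", ["B"])
strengths["C"] += 5.0

print(strengths)
```

After one pass the reward has reached only the rule that acted on the environment; if the same chain keeps firing, part of that reward is handed back, one link at a time, to the rules that set the stage for it.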

Then the system replaces the least useful (weak) rules with newly generated rules that are based on the system's accumulated experience, i.e. by combining selected "building blocks" ("strong" rules) according to some genetic algorithms.

Holland went on to focus on "complex adaptive systems". Such systems are governed by principles of anticipation and feedback. Based on a model of the world, an adaptive system anticipates what is going to happen. Models are improved based on feedback from the environment.

Complex adaptive systems are ubiquitous in nature. They include brains, ecosystems and even economies. They share a number of features: each of these systems is a network of agents acting in parallel and interacting; behavior of the system arises from cooperation and competition among its agents; each of these systems has many levels of organization, with agents at each level serving as building blocks for agents at a higher level; such systems are capable of rearranging their structure based on their experience; they are capable of anticipating the future by means of innate models of the world; new opportunities for new types of agents are continuously being created within the system.

All complex adaptive systems share four properties (aggregation, non-linearity, flows, diversity) and three mechanisms (categorization by tagging, anticipation through internal models, decomposition in building blocks).

Each adaptive agent can be represented by a framework consisting of a performance system (to describe the system's skills), a credit-assignment algorithm (to reward the fittest rules) and a rule-discovery algorithm (to generate plausible hypotheses).

Emergent Computation

Emergent computation is to sequential computation what nonlinear systems are to linear systems: it deals with systems whose parts interact in a nontrivial way. Both Alan Turing and John Von Neumann, the two mathematicians who inspired the creation of the computer, were precursors in emergent computation: Turing formulated a theory of self-catalytic systems and Von Neumann studied self-replicating automata.

In the 1950s Turing introduced the "Reaction-diffusion Theory" of pattern formation, based on the bifurcation properties of the solutions of differential equations.

Turing devised a model to generate stable patterns:

  • X catalyzes itself: X diffuses slowly
  • X catalyzes Y: Y diffuses quickly
  • Y inhibits X
  • Y may or may not catalyze or inhibit itself

Some reactions might be able to create ordered spatial patterns from disordered ones. The function of genes is purely catalytic: they catalyze the production of new morphogens, which will catalyze more morphogens until eventually form emerges.
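The four conditions of Turing's model can be written compactly as a pair of reaction-diffusion equations. This is only a sketch: the reaction terms f and g and the diffusion coefficients D_X and D_Y are symbols introduced here for illustration and do not appear in the text above.

```latex
\begin{aligned}
\frac{\partial X}{\partial t} &= f(X, Y) + D_X \nabla^2 X, \\
\frac{\partial Y}{\partial t} &= g(X, Y) + D_Y \nabla^2 Y, \qquad D_Y \gg D_X, \\
\frac{\partial f}{\partial X} &> 0 \quad (\text{X catalyzes itself}), \qquad
\frac{\partial g}{\partial X} > 0 \quad (\text{X catalyzes Y}), \qquad
\frac{\partial f}{\partial Y} < 0 \quad (\text{Y inhibits X}).
\end{aligned}
```

The sign of the remaining derivative, that of g with respect to Y, is left unconstrained, which corresponds to the last condition in the list (Y may or may not catalyze or inhibit itself).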

Von Neumann saw life as a particular class of automata (of programmable machines). Life's main property is the ability to reproduce. In the 1940s Von Neumann had already proven that a machine could be programmed to make a copy of itself.

Von Neumann's automaton was conceived to absorb matter from the environment and process it to build another automaton, including a description of itself. Von Neumann realized (years before the genetic code was discovered) that the machine needed a description of itself in order to reproduce. The description itself would be copied to make a new machine, so that the new machine too could copy itself.

In Von Neumann's simulated world, a large checkerboard was a simplified version of the real world, in which both space and time were discrete. Time, in particular, was made to advance in discrete steps, which meant that change could occur only at each discrete step, and simultaneously for everything that had to change.

Von Neumann's studies of the 1940s led to an entirely new field of Mathematics, called "Cellular Automata". Technically speaking, cellular automata are discrete dynamical systems whose behavior is completely specified in terms of a local relation. In practice, cellular automata are the computer scientist's equivalent of the physicist's concept of field. Space is represented by a uniform grid and time advances in discrete steps. Each cell of space contains bits of information. Laws of nature express what operation must be performed on each cell's bits of information, based on its neighbors' bits of information. Laws of nature are local and uniform. The amazing thing is that such simple "organisms" can give rise to very complex structures, and those structures recur periodically, which means that they achieve some kind of stability.
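A minimal sketch may make the idea of a local, uniform law concrete. It uses the two-state rule of Conway's "Game of Life", which is a much simpler automaton than Von Neumann's own and is chosen here only as an illustration; the function name "step" and the variable names are hypothetical.

```python
from collections import Counter


def step(live_cells):
    """Advance the grid by one discrete time step.

    live_cells is a set of (row, col) pairs; every other cell is empty.
    """
    # For every cell, count how many of its eight neighbors are alive.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live_cells
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # The same local rule is applied to all cells simultaneously:
    # a cell is alive next step if it has 3 live neighbors,
    # or if it has 2 live neighbors and is already alive.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)
    }


# A row of three live cells (the so-called "blinker").
blinker = {(1, 0), (1, 1), (1, 2)}
after_one = step(blinker)    # the row becomes a column
after_two = step(after_one)  # the column becomes the original row again
```

The row of three cells turns into a column and then back into a row: a structure that recurs periodically, produced by nothing more than a rule that looks at each cell's immediate neighbors.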

Von Neumann understood the dual genetics of self-reproducing automata: namely, that the genetic code must act as instructions on how to build an organism and as data to be passed on to the offspring. This was basically the idea behind what would later be discovered about DNA: DNA encodes the instructions for making all the enzymes and proteins that a cell needs to function, and DNA makes a copy of itself every time the cell divides in two. Von Neumann indirectly understood other properties of life: the ability to increase its complexity (an organism can generate organisms that are more complex than itself) and the ability to self-organize.

When a machine (e.g., an assembly line) builds another machine (e.g., an appliance), there occurs a degradation of complexity, whereas the offspring of living organisms are at least as complex as their parents and their complexity increases in evolutionary times. A self-reproducing machine would be a machine that produces another machine of equal or higher complexity.

By representing an organism as a group of contiguous multi-state cells (either empty or containing a component) in a 2-dimensional matrix, Von Neumann proved that a Turing-type machine that can reproduce itself could be simulated by using a 29-state cell component.

Turing proved that there exists a "universal computing machine". Von Neumann proved that there exists a universal computing machine which, given a description of an automaton, will construct a copy of it, and, by extension, that there exists a universal computing machine which, given a description of a universal computing machine, will construct a copy of it, and, by extension, that there exists a universal computing machine which, given a description of itself, will construct a copy of itself.
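The dual use of a description, as instructions and as data, can be illustrated in software with what programmers call a "quine", a program that outputs its own text. This is only an analogy at the software level, not Von Neumann's cellular construction; the names DESCRIPTION, construct and run_program are hypothetical.

```python
import contextlib
import io

# The "description": a template of the program that it describes.
DESCRIPTION = 'DESCRIPTION = %r\nprint(DESCRIPTION %% DESCRIPTION)'


def construct(description):
    """Build a program from a description, using the description twice."""
    # As data: %r copies the description verbatim into the offspring.
    # As instructions: the template dictates what the offspring looks like.
    return description % description


def run_program(source):
    """Execute program text and return whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


offspring = construct(DESCRIPTION)   # a program carrying its own description
grandchild = run_program(offspring)  # the offspring prints a copy of itself
```

The offspring contains the description it was built from, so when it runs it can build the next copy in the same way, and that copy can do the same.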

Artificial Life

Another approach to building intelligent programs is based on "Artificial Life" (a term coined by Chris Langton in 1987). "Intelligence" (or, better, "cognition") cannot do without life. Intelligence is a product of life, and it is an evolutionary product of the evolution of life. On the other hand, there is more and more evidence to support the mirror view: that life is very much about cognition, that all life is "cognitive" in nature.

The first computer viruses were produced at Bell Labs in 1962 (the term was coined by David Gerrold in his novel "When HARLIE Was One"). When computer viruses became famous, they simply popularized the discipline that was attempting to build self-replicating automata at software level, a school of thought started by Von Neumann decades earlier. Self-replication (the ability to produce offspring from self-contained instructions) is the prerequisite to evolution.

It turns out that self-replicating and evolving systems can also replace expert systems.

Artificial Intelligence solves a problem by reasoning about the knowledge of the problem's domain. Artificial Life ("Alife") lets possible solutions "evolve" in that domain until they fit the problem. Sometimes there is no perfect solution, just a "best fit". Solutions evolve in populations according to a set of "genetic" algorithms à la Holland that mimic biological evolution. Each generation of solutions, as obtained by applying those algorithms to the previous generation, is better "adapted" to the problem at hand.
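The way a population of candidate solutions "evolves" toward a best fit can be sketched with a very small genetic algorithm. Everything specific here is an assumption made for illustration: the toy problem (maximize the number of 1s in a bit string), the parameter values, and the particular operators (truncation selection, one-point crossover, bit-flip mutation, and keeping the best individual unchanged).

```python
import random


def fitness(bits):
    # Toy problem (an assumption): the more 1s, the better the fit.
    return sum(bits)


def evolve(pop_size=20, length=16, generations=30, mutation_rate=0.05, seed=0):
    """Return the best fitness found in each generation."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    best_per_generation = [max(map(fitness, population))]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # selection: the fitter half reproduces
        next_population = [ranked[0][:]]    # keep the best solution unchanged
        while len(next_population) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_population.append(child)
        population = next_population
        best_per_generation.append(max(map(fitness, population)))
    return best_per_generation


history = evolve()
```

Because the best individual is always carried over, the best fitness in "history" never decreases from one generation to the next. No individual in the population "knows" how to solve the problem: the improvement comes from selection acting on variation.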

Software environments like the "Tierra" program, developed in 1992 by an American ecologist, Thomas Ray, simulate a world and an evolving population of organisms. Tierra is populated with digital organisms that compete for space in the computer memory and for time in the computer processor. Whatever space and time they manage to get, they use it to reproduce themselves. As with most simulations of this type, a digital organism’s phenotype is also its genotype (the genome is also the body, or vice versa).

Ray draws a distinction between two types of Alife: weak (simulation of life) and strong (synthesis/instantiation of life). The difference is that one is man-made while the other has evolved to be living from inanimate "matter". Tierra, for example, starts out with instances of a simple replicating code and is left to evolve into a living system capable of metabolizing, reproducing and evolving while it interacts with its environment. Ray focuses on "the second major event in the history of life, the origin of diversity."

As Langton points out, the key concept in Alife is "emergent behavior".

These virtual worlds are more than simple simulations of algorithms. They may well be philosophical investigations into the very nature of the universe. For example, the Italian physicist Tommaso Toffoli speculated that the universe could be viewed as a computer. Frank Tipler points out that, at the very least, there is no way to tell a computer simulation of the real world from the real world, as long as one is inside the simulation. A simulated observer would perceive the simulated world exactly the same way that the real observer perceives the real world. Any test to reveal whether her world is the real world would succeed, by definition. Therefore, there is also no way for me to tell whether I am a simulated observer inside a simulated universe, or a real observer inside a real universe. Therefore, the distinction between reality and simulation becomes fictitious.

Most evolutionary engineering is software-based. Hardware-based simulations of natural evolution are based on the idea of a software bit string that is used to configure programmable logic devices as a genetic algorithm chromosome, so that the configuration of the circuit will evolve at electronic speed. The final goal is to build machines that evolve independently, or, more properly, "evolvable hardware" (a discipline that was officially born in 1995). The Swiss computer scientist Daniel Mange builds electronic circuits that can grow/evolve rather than be designed. Mange's "embryological electronics" employs field programmable gate arrays that exhibit the ability to reproduce the circuit of any programmable function and to self-repair. The "Firefly Machine", for example, is based on a variation of Von Neumann’s cellular programming techniques: parallel cellular machines evolve to solve a problem. The "Embryonics" project deals with ontogeny, or growth: just like any multicellular organism grows over its lifetime, so a multicellular automaton should exhibit embryonic development driven by the same processes of cellular division and differentiation.

Another center for biologically-inspired systems is the Evolvable Systems Lab in Japan, headed by Tetsuya Higuchi.

Basically, Artificial Life replaced the "problem solver" of Artificial Intelligence with an evolving population of problem solvers. The "intelligence" required to solve a problem is not in an individual anymore, it is in an entire population and its successive generations; it is not due to the knowledge of a solver, but to the evolutionary algorithms of nature that operate on the genetic code of a population.

In Artificial Intelligence, it is not the solver who is smart enough to solve the problem, but the knowledge she has. In Artificial Life, it is evolution that eventually builds the solver who is smart enough to solve the problem using the knowledge that is available.

The Turing Test

In 1950 Turing proposed a test to determine whether a machine is intelligent or not: a computer can be said to be intelligent if its answers are indistinguishable from the answers of a human being. The test can be performed on the machine alone or on the machine and a human. In the former case, the "observer" of the test must be led to believe, by the machine’s cunning answers, that the tested thing is a human. In the latter case, the observer must be incapable of telling which are the answers of the human and which are the answers of the machine.

Turing’s article ("Computing machinery and intelligence") started the quest for the "intelligent" computer that led to Artificial Intelligence.

Regardless of what Artificial Intelligence has achieved so far, a debate has been raging about whether an intelligent machine is possible or not at all. After the invention of the computer, a number of thinkers from various disciplines (Herbert Simon, Allen Newell, Noam Chomsky, Hilary Putnam, Jerry Fodor) adopted a paradigm modeled after the relationship between the hardware and the software of a computer. They basically reduced "thinking" to the execution of an algorithm in the brain.

John Searle is the foremost opponent of Artificial Intelligence. He argues that computers are purely syntactical and therefore cannot be said to be thinking. In his thought experiment of the "Chinese room", a man who does not know how to speak Chinese, but is provided with formal rules on how to build perfectly sensible Chinese answers, would pass the analogue of the Turing test for understanding Chinese, even if he will never know what those questions and those answers are about. That opened the floodgates to arguments that computation per se will never lead to intelligence. Searle's Chinese room argument can be summarized as follows: computer programs are syntactical; minds have a semantics; syntax is not by itself sufficient for semantics. Whatever a computer is computing, the computer does not "know" that it is computing it: only a mind can look at it and tell what it is.

Paraphrasing Fred Dretske, a computer does not know what it is doing, therefore "that" is not what it is doing. For example, a computer does not compute 1+1, it simply manipulates symbols that eventually will yield the result of 1+1.

Countless replies have been provided. Some have observed that the man may not "know" Chinese, but the room (i.e., the man plus the rules to speak Chinese) does qualify as a fluent Chinese speaker. Some have found flaws in the premises (the argument sets out to prove what has already been stated in the premises, that the man does not understand Chinese).

And, ultimately, it all depends on the definition of the word "understand". In a sense, Searle has simply slowed down and broken down the process of understanding, but what we do when we understand something is precisely what the man does in the room. So Searle’s objection is simply about the size of the information and the speed of information processing, and we would all assume that the man understands Chinese if he performed his task in a few milliseconds with the help of miniaturized microfilms invisible to us. Searle's objection sounds more like: if you can tell what the mechanism is that produces "understanding", then that cannot be true "understanding".

Searle does concede that a brain is a machine and that, in principle, we could build a totally equivalent machine that would then have consciousness. He does not agree that a computer is such a machine. Computation as defined by Turing is not sufficient to grant the presence of thinking.

The simulation of a mind is not itself a mind.

Experience vs Knowledge

Inspired by Edmund Husserl’s phenomenology, another American philosopher and critic of Artificial Intelligence, Hubert Dreyfus, thinks that comprehension can never do without the context in which it occurs. The information in the environment is fundamental for a being's intelligence.

Dreyfus criticized the four fundamental assumptions of Artificial Intelligence: biological (that the brain must operate as a symbolic processor), psychological (that the mind must obey a heuristic program), epistemological (that there must be a theory of practical activity) and ontological (that the data necessary for intelligent behavior must be discrete, explicit and determinate). In his opinion all of them are just not plausible. Furthermore, Dreyfus emphasizes the role of the body in intelligent behavior which knowledge-based systems neglect. Human experience is intelligible only insofar as it gets organized in terms of a situation (as a function of human needs).

Dreyfus presents a model of acquisition of performance by humans structured in five stages. First, we are born novices: we simply follow the rules (an instructor, a manual). The moves of novices are not secure and not fluid, although they can be technically correct. Sometimes applying a rule is plain silly, but the novice will still do so because he doesn't know better. Then we become advanced beginners. At this stage we are capable of modifying rules based on the situation. Our behavior is still driven by rules but it doesn't look as awkward. Competent humans, the next stage, follow rules but in a very fluid manner, and their rules are much more plastic: the competent human knows that she can modify the rules, and she will feel guilty if something goes wrong, even if she followed the proper rules. Proficient performers do not even follow rules anymore: they act by reflex. The fact that they have encountered similar situations many times matters more than the original rules. Experts, the final stage, do not even remember the rules. Sometimes if they have to articulate them they can't even figure them out. They just act based on their expertise and their intuition. They are often not even aware of what they are doing. An expert driver does not realize that she is shifting gears and at which point she is shifting gears. She just shifts gear when it's appropriate to.

Dreyfus points out that a failure usually results in degradation: you step back to a lower stage to understand what went wrong. An expert does not even remember the rules, but, if she can't start the car, she gradually walks down the ladder from expert to merely competent all the way down to novice and will finally pick up the driver's manual.

Only novices behave like expert systems. Human experts behave in a radically different way. An expert has synthesized experience in an unconscious behavior that reacts instantaneously to a complex situation. What the expert knows cannot be decomposed into rules or any other type of discrete knowledge representation; therefore it cannot be emulated by an expert system.

The foundation of Dreyfus' argument is that minds do not use a theory about the everyday world; and the reason is that there is no set of "context-free" primitives of understanding. Human knowledge is skilled "know-how", as opposed to the logical representations that expert systems have to rely upon, or "know-that".

Also drawing from Martin Heidegger's phenomenology, the American computer scientist Terry Winograd is skeptical that intelligence can be due to processes of the type of production systems, i.e. to the systematic manipulation of representations. Intelligent systems act; they don't think. People are "thrown" into the real world and cannot afford to deal with all the possible alternatives of a situation. They think only when action does not yield the desired result. Only then do they pause to picture the situation in its complexity and decompose it into its constituents, and try to infer action from knowledge. But, again, this behavior is more typical of the novice than of the expert.

Another way to see the same argument is to consider what makes an expert so much more efficient at solving a problem: the first few seconds. A chess champion wins the game against a novice because of the first few moves, not because of the huge knowledge that the champion has and could use against the novice. That huge knowledge is, in turn, certainly important to determine the first moves (that will deliver a crippling blow to the novice).

In 1986 the American computer scientist Rodney Brooks offered an alternative way of achieving Artificial Intelligence, which significantly revised the foundations of the symbol-processing program: he argues that intelligence cannot be separated from the body. Intelligence is not only a process of the brain, it is embodied in the physical world. Every part of the body is performing an action that contributes to the overall "functioning" of the organism in the environment. There is no need for a central representation of the world, so long as all component tasks help each other operate in the world. Cognition is grounded in the physical interactions with the world. Intelligence "is" about moving in a physical world and cannot exist without a physical world.

Goedel’s Limit

The British physicist Roger Penrose (following the British philosopher John Lucas) resorted to Goedel's theorem to undermine the very foundations of Artificial Intelligence. Goedel’s theorem states that every consistent formal system that is powerful enough to express Arithmetic contains a statement that can be neither proven nor disproven within the system. Indirectly, Goedel's theorem states the preeminence of the human mind over the machine: some mathematical operations are not computable, nonetheless the human mind can treat them (at least to prove that they are not computable). Humans can realize what Goedel’s theorem states, whereas a machine, limited to mathematical reasoning, would never realize what it states. We can intuitively comprehend a truth that the computer can only try (and, in this case, fail) to prove. Therefore a computer will never be equal to a mind. And, in general, no mathematical system can fully express the way my mind thinks.

Again, countless replies have been provided.

First of all (Hilary Putnam), a computer can observe the failure of "another" computer’s formal system, just like a human mind can observe it. A computer can easily prove the proposition "if the theory is consistent, then the proposition that there is at least one undecidable proposition is true". Which is exactly all the human mind is capable of doing.
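In symbols, the theorem and the conditional statement that this first reply relies on can be sketched as follows. The notation is an assumption introduced here: T stands for a consistent, effectively axiomatized theory that includes Arithmetic (Peano arithmetic, for instance), G_T for its undecidable sentence and Con(T) for the arithmetical sentence expressing that T is consistent.

```latex
\begin{aligned}
&T \nvdash G_T \quad \text{and} \quad T \nvdash \neg G_T
  && \text{(the undecidable statement)} \\
&T \vdash \mathrm{Con}(T) \rightarrow G_T
  && \text{(what can be proven mechanically)}
\end{aligned}
```

The second line is a theorem of the system itself, so a machine that enumerates the theorems of T will eventually produce it. What neither the machine nor, on this reading, the human can do is prove Con(T) from within T.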

Second, even if Goedel’s theorem sets a limit, it is not a limit of the machine, it is a limit of the human mind: the human mind will never be capable of building a machine that can think. This does not prove that machines cannot think.

Third, Penrose’s demonstration can be used to prove that a machine cannot prove the validity of a mathematical demonstration, a fact that is contradicted by our experience.

Fourth (Aaron Sloman), Goedel’s theorem is false in some nonstandard mathematical systems. One of Goedel’s conditions is that the mathematical system must be consistent (i.e., not contain a contradiction), but that can only be if the undecidable statement is added to the system, assumed to be either true or false. Nonstandard models assume that it is false. Goedel’s theorem, because of the way Goedel carried it out (by employing infinite sets of formulas), leaves the illusion of proving a truth which in reality is never proved, cannot be proven and must be arbitrarily decided.

Fifth, Rudy Rucker believes that conscious machines could be built, following an observation of Goedel himself, that we cannot build a machine that has our mathematical intuition but such a machine can exist and can be discovered by humans. If such a machine exists, humans cannot understand its functioning. Such a machine cannot be built by humans, but could be built by Darwinian evolutionary steps starting from a man-made machine. If a machine can be built that exhibits a behavior completely similar to that of humans, then a machine can be built that is as conscious as humans. What Goedel's theorem asserts is that "the human mind is not capable of formulating all of its mathematical intuitions" (quoting Goedel himself).

Sixth, the British physicist Stephen Hawking notes that the behavior of earthworms can probably be simulated adequately with a computer, because they do not worry about Goedel sentences. Darwinian evolution can generate human intelligence from earthworm intelligence through a process (natural selection) for which Goedel's theorem is also irrelevant. Therefore, Goedel's theorem does not forbid the birth of an intelligent computer.

What Goedel’s theorem proves, if anything, is an intrinsic limit to "any" form of intelligence, including the machine’s but also Penrose’s...

The Turing Test Revisited

The Turing Test has been credited with starting a whole new branch of science. That is surprising, given that the test itself is not formulated in a scientific manner at all.

First of all, it is not clear whether Turing was concerned with intelligence, mind or consciousness. Is his test supposed to reveal whether a machine is intelligent or cognitive or conscious? The three are quite different. Nowhere does Turing bother to distinguish among them, though. Intelligence comes in degrees. Animals are intelligent, to some degree. It is debatable whether they are capable of thinking (conscious). A mentally-retarded person may not be intelligent, but she is probably conscious. Turing does not discriminate and therefore does not tell us what his test is supposed to measure.

Second, when one proposes a test to the scientific community, one must be specific about the setting (e.g., what instruments will be used). Turing’s test uses a human being to decide whether a machine is as good as another human being. Thus both the instrument and one of the quantities to be measured are humans: good scientific policy would have required him to be specific about both. He does not provide a definition or a prescription for what the observer must be. Can a mentally retarded person perform the test? Can somebody under the influence of drugs perform it? Or does it have to be the most intelligent human? (The result of the test will obviously vary wildly depending on which one Turing chooses). As for the human to be tested against the machine, Turing doesn't specify which type of human he wants to test: a priest, an attorney, an Australian aborigine, an avid reader of porno magazines, a librarian, a physician, an economist...?

The observer has to determine whether the answers to her questions come from a human or a machine. If the unknown "answerer" is the machine, and she is led to think that it is a human, then the machine qualifies as "intelligent" (or cognitive or conscious, we are not sure). But Turing does not tell us what conclusions we have to draw if the unknown answerer is a human and the observer is led to think from her answers that she is a machine. In other words, if a machine fails the test, then Turing concludes that it is not intelligent: but what does Turing conclude if a human fails the test? That humans are not intelligent?

Therefore, one of the reasons why Turing’s paper has led to so much controversy is that Turing was not clear enough about what he was saying.

Can a machine be a cognitive system? If one circumscribes cognition to the processes of remembering, learning and reasoning, most scientists would agree that, yes, it is possible. Does that make it "intelligent"? In the sense in which most people use that word, probably yes. Does it also make it aware of being what it is? Not necessarily so. These are different questions that probably have different answers. But, if each were a well-formulated question, most people would agree on the answer.

Even if it were restated in a scientific manner, the Turing test per se would probably not amount to much: it is not right or wrong, but simply meaningless. Even if a machine could answer all questions, what would that prove? If we found a "thing" that can answer all our questions but does not eat, move, feel emotions and so forth, we would just consider it as a very sophisticated machine, not a human being. That is what the Turing test measures: how good the machine is at answering questions, nothing more.

The thesis of Turing’s test can be restated as: "Can a machine be built that will fool a human being into believing it is another human being?" Nowhere in his writings did Turing prove the equivalence of this question with the question "Can a machine think?" If we answer "yes" to the first question, we don’t necessarily answer "yes" to the second.

As the American computer scientist Stuart Russell remarked, Turing's definition is at the same time too weak and too strong. Too weak because it does not include "intelligent" behavior such as "dodging bullets" and too strong because it does include unintelligent beings such as Searle's Chinese-room translator. Most children, who cannot answer a lot of questions that an adult could answer, would not pass the test, but that does not make them machines. Turing's is a partial extensional definition, that fails to capture the intensional definition of intelligence.

Would we consider as our peer an object or being that is not alive? To be conscious without being alive is nonsense. Before we ask whether machines can think, we should therefore ask whether they can be alive.

Biological systems undergo growth. Machines cannot undergo physical growth. In machines only the "mind" can grow over time. In biological systems the "mind" grows with the rest of the body. In machines the "mind" may never decay. In biological systems the mind decays with the rest of the body. The mind is closely tied to the body. For most people a mind without a body (that grows with it) is just not a mind.

Machine Charisma

Nonetheless, it is important to realize why we are interested in knowing whether a computer can think, but not whether a refrigerator can think. They are both complex machines, and it is not obvious which one is more indispensable. In the event of a catastrophic earthquake, most people will be more concerned about the food in their refrigerator than about the files in their computer. Computers only solve some problems, not all problems: they don’t wash dishes, don’t run on roads, and don’t make ice. Ordinary life (and certainly survival) is more than mere Math.

One reason for our fixation with computers is purely socio-historical. In the age when Artificial Intelligence was born, computers were huge, and the sheer size was commanding attention. No other machine was that big (and nobody could predict that they would become so small so quickly). Because they were so terribly primitive (and this borders on the paradoxical), they were also very difficult to operate, and it was commonly held that only super-intelligent humans (and many of them together) were able to control them. The (flawed) syllogism was that if they required so much intelligence to operate, then computers must be very intelligent.

Furthermore, for anybody who was not fluent in electronics it was difficult to conceive of computation without thought: animals can do many things, but not multiplication, and certainly not faster than humans. Math was a privilege of one higher primate: us. It was psychologically easier to assume that computers could think than to tear down a thousand-year-old habit of associating Mathematics with the ability to think. Finally, a lot of the excitement arose simply from confusing intelligence, cognition and consciousness, a mistake that in the age before the boom of neurophysiology was very common, even among the likes of Turing.

The one thing that computers taught us is precisely that our traditional views of intelligence were flawed and we needed better ones. Before computers were invented, the scientific community had never been forced to define and distinguish intelligence, cognition and consciousness. No animal was threatening human superiority in any of them. Computers forced us to do that.

Even if intelligence is just computing many numbers very quickly (which means that a palmtop computer is intelligent), that does not automatically entail cognition or consciousness. And vice versa: a mentally retarded person may not be able to perform a multiplication but still be capable of feelings and introspection.

Computers have convinced us that "intelligence" is simply a misconception, and the word is not scientific.

The Turing Test was a misconception born out of a misconception.

What is intelligence?

What is "intelligence"? The Turing test was based on the assumption by Newell and Simon that intelligence is about solving problems. Yet, solving problems is nothing special: finding problems is far more difficult. Given enough time and resources, most people would solve any problem. But very few people could come up with the problem in the first place.

For example, very few people wonder why it gets colder as you climb up a mountain. After all, you are moving closer to the sun, which is the main source of heat. It should obviously get warmer as we get closer to the sun. Very few people ever wonder.

Once they are told, they will eventually find the solution to the paradox. All that one has to do is to think about it, consult a book or two, call up a friend. As the philosophers of Artificial Intelligence correctly said, the solution is out there and it is just a matter of "searching" for it.

Once somebody formulates a problem, most people can solve it. We use professional "problem solvers" called "scientists" because we are normally busy doing other things.

The real intelligence is in formulating the problem, in realizing that something we take for granted is not explained by our knowledge.

Artificial Intelligence initially missed the point: somebody who can answer all the questions is not very intelligent, she just has nothing better to do. She resembles a machine more than a human.

The Turing test is not about how human a machine is, but how mechanical a human is. The Turing test tests a human, not a machine.

We can certainly build a machine that will answer all questions. But that has little to do with our "intelligence". That only has to do with "symbolic processing". The real intelligence test would be: can the machine "ask" the questions? Can we build a machine that will ask all the questions that an intelligent human would ask in a given situation? Can a machine "wonder"? Can a machine be the one asking questions and rating the answers in Turing’s test?

Further Reading

Arbib Michael: METAPHORICAL BRAIN (Wiley, 1972)

Arbib Michael: METAPHORICAL BRAIN 2 (Wiley, 1989)

Arbib Michael: BRAINS MACHINES AND MATHEMATICS (Springer Verlag, 1987)

Ashby Ross: DESIGN FOR A BRAIN (John Wiley, 1952)

Barr Avron & Feigenbaum Ed: HANDBOOK OF ARTIFICIAL INTELLIGENCE (William Kaufmann, 1982)

Bernstein Nicholas: GENERAL BIOMECHANICS (1926)

Boden Margaret: PHILOSOPHY OF ARTIFICIAL INTELLIGENCE (Oxford, 1990)

Brillouin Leon: SCIENCE AND INFORMATION THEORY (Academic Press, 1962)

Brooks Rodney & Steels Luc: THE ARTIFICIAL LIFE ROUTE TO ARTIFICIAL INTELLIGENCE (Lawrence Erlbaum, 1995)

Cannon Walter: THE WISDOM OF THE BODY (Norton, 1932)

Carbonell Jaime: MACHINE LEARNING (MIT Press, 1989)

Charniak Eugene: ARTIFICIAL INTELLIGENCE PROGRAMMING (Lawrence Erlbaum, 1987)

Cohen Fred: IT'S ALIVE (Wiley, 1994)

Dreyfus Hubert: WHAT COMPUTERS CAN'T DO (Harper & Row, 1979)

Dreyfus Hubert & Dreyfus Stuart: MIND OVER MACHINE (Free Press, 1985)

Feigenbaum Edward: COMPUTERS AND THOUGHT (MIT Press, 1995)

Frost Richard: INTRODUCTION TO KNOWLEDGE BASED SYSTEMS (Macmillan, 1986)

Genesereth Michael & Nilsson Nils: LOGICAL FOUNDATIONS OF ARTIFICIAL INTELLIGENCE (Morgan Kaufmann, 1987)

Graubard Stephen: THE ARTIFICIAL INTELLIGENCE DEBATE (MIT Press, 1988)

Hofstadter Douglas: GOEDEL ESCHER BACH (Vintage, 1980)

Holland John: ADAPTATION IN NATURAL AND ARTIFICIAL SYSTEMS (Univ of Michigan Press, 1975)

Holland John: HIDDEN ORDER (Addison Wesley, 1995)

Koza John: GENETIC PROGRAMMING (MIT Press, 1992)

Langton Christopher: ARTIFICIAL LIFE (MIT Press, 1995)

Levy Steven: ARTIFICIAL LIFE (Pantheon, 1992)

Li Ming & Vitanyi Paul: AN INTRODUCTION TO KOLMOGOROV COMPLEXITY (Springer-Verlag, 1993)

Luger George: COMPUTATION AND INTELLIGENCE (MIT Press, 1995)

Maes Patti: DESIGNING AUTONOMOUS AGENTS (MIT Press, 1990)

Michalski Ryszard, Carbonell Jaime & Mitchell Tom: MACHINE LEARNING I (Morgan Kaufmann, 1983)

Michalski Ryszard, Carbonell Jaime & Mitchell Tom: MACHINE LEARNING II (Morgan Kaufmann, 1986)

Newell Allen & Simon Herbert: HUMAN PROBLEM SOLVING (Prentice-Hall, 1972)

Nilsson Nils: THE MATHEMATICAL FOUNDATIONS OF LEARNING MACHINES (Morgan Kaufmann, 1990)

Nilsson Nils: PRINCIPLES OF ARTIFICIAL INTELLIGENCE (Tioga, 1980)

Pearl Judea: HEURISTICS (Addison Wesley, 1984)

Powers William: BEHAVIOR: THE CONTROL OF PERCEPTION (Aldine, 1973)

Ray Thomas: ZEN AND THE ART OF CREATING LIFE (1994)

Rucker Rudy: INFINITY AND THE MIND (Birkhauser, 1982)

Russell Stuart Jonathan: THE USE OF KNOWLEDGE IN ANALOGY AND INDUCTION (Pitman, 1989)

Russell Stuart Jonathan & Norvig Peter: ARTIFICIAL INTELLIGENCE (Prentice Hall, 1995)

Searle John: THE REDISCOVERY OF THE MIND (MIT Press, 1992)

Shannon Claude & Weaver Warren: THE MATHEMATICAL THEORY OF COMMUNICATION (Univ of Illinois Press, 1949)

Shapiro Stuart Charles: ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE (John Wiley, 1992)

Simon Herbert Alexander: MODELS OF THOUGHT (Yale University Press, 1979)

Simon Herbert Alexander: THE SCIENCES OF THE ARTIFICIAL (MIT Press, 1969)

Tarski Alfred: LOGIC, SEMANTICS, METAMATHEMATICS (Clarendon, 1956)

Turing Alan: PURE MATHEMATICS (Elsevier Science, 1992)

Turing Alan: MECHANICAL INTELLIGENCE (Elsevier Science, 1992)

Turing Alan: MORPHOGENESIS (North-Holland, 1992)

Von Neumann John: THE COMPUTER AND THE BRAIN (Yale Univ Press, 1958)

Von Neumann John: THEORY OF SELF-REPRODUCING AUTOMATA (Univ of Illinois Press, 1966)

Wiener Norbert: CYBERNETICS (John Wiley, 1948)

Wolfram Stephen: A NEW KIND OF SCIENCE (Wolfram, 2002)

Winograd Terry & Flores Fernando: UNDERSTANDING COMPUTERS AND COGNITION (Ablex, 1986)