(These are excerpts from my book "Intelligence is not Artificial")
A Brief History of Artificial Intelligence/ Part 1
One can start way back in the past with the ancient Greek and Chinese automata of two thousand years ago or with the first electromechanical machines of a century ago, but to me a history of machine intelligence begins in earnest with the "universal machine", originally conceived in 1936 by the British mathematician Alan Turing. He never built it himself, but Turing realized that one could create the perfect mathematician by simulating the way logical problems are solved: by manipulating symbols. The first computers were not Universal Turing Machines (UTMs), but most computers built since the ENIAC (1946), including all the laptops and smartphones that are available today, are UTMs. Because the universal machine was founded on predicate logic, which only admits two values ("true" and "false"), the computer at the heart of any "intelligent" machine relies on binary logic (ones and zeroes).
Cybernetics (which can be dated back to the 1943 paper "Behavior, Purpose and Teleology", co-written by MIT mathematician Norbert Wiener, physiologist Arturo Rosenblueth and engineer Julian Bigelow) did much to show the relationship between machines and living organisms. One can argue that machines are a form of life or, vice versa, that living organisms are forms of machinery.
However, "intelligence" is commonly considered one or many steps above the merely "alive": humans are generally considered intelligent (by fellow humans), whereas worms are not.
The "Turing Test", introduced by the same Alan Turing in his paper "Computing Machinery and Intelligence" (1950), has often been presented as the kind of validation that a machine has to pass in order to be considered "intelligent": if a human observer, asking all sorts of questions, cannot tell whether the agent providing the answers is human or mechanical, then the machine has become intelligent (or, better, as intelligent as the human being).
The first influential conference for designers of intelligent machines took place in 1955 in Los Angeles: the Western Joint Computer Conference. At this conference Newell and Simon presented the "Logic Theory Machine", Newell also presented his "Chess Machine", Oliver Selfridge gave a talk on "Pattern Recognition and Modern Computers", and Wesley Clark and Belmont Farley described the first artificial neural network ("Generalization of Pattern Recognition in a Self-organizing System").
Then in 1956 John McCarthy organized the conference at Dartmouth College that gave Artificial Intelligence its name.
The practitioners of Artificial Intelligence quickly split into two camps.
One, pioneered by Herbert Simon and his student Allen Newell at Carnegie Mellon University with their "Logic Theorist" (1956), basically understood intelligence as the pinnacle of mathematical logic, and focused on symbolic processing.
Logic Theorist (written by Clifford Shaw) proved 38 of the first 52 theorems in chapter 2 of Alfred North Whitehead and Bertrand Russell's "Principia Mathematica". A proof that differed from the authors' original was submitted to the Journal of Symbolic Logic, the first case of a paper co-authored by a computer program. In 1958 the trio also presented a program to play chess, NSS (the initials of the three).
In 1959 Arthur Samuel at IBM in New York wrote not only the first computer program that could play checkers but also the first self-learning program. That program implemented the alpha-beta search algorithm, which would remain the dominant one in A.I. for 20 years.
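To make the idea of alpha-beta search concrete, here is a minimal Python sketch. It is not a reconstruction of Samuel's checkers program: the function name `alphabeta`, its parameters and the toy game tree are assumptions chosen purely for illustration. The point is only that the search can skip a branch as soon as it knows the opponent would never let the game go that way.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Return the minimax value of `node`, skipping branches that cannot matter.

    A node is either a number (a leaf score) or a list of child nodes.
    """
    if not isinstance(node, list):       # leaf: static evaluation of the position
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:            # the minimizer already has a better option elsewhere
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:                # the maximizer already has a better option elsewhere
            break
    return best

# Hypothetical two-ply tree: the maximizer picks a branch, the minimizer picks a leaf.
tree = [[3, 5], [2, 9], [0, 1]]
print(alphabeta(tree))
```

In this toy tree the leaves 9 and 1 are never examined: once the second and third branches are known to be worth at most 2 and 0, they cannot beat the 3 guaranteed by the first branch.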
The first breakthrough in this branch of A.I. was probably John McCarthy's article "Programs with Common Sense" (1959): McCarthy understood that someday machines would easily be better than humans at many repetitive and computational tasks, but "common sense" is what really makes someone "intelligent" and common sense comes from knowledge of the world. That article spawned the discipline of "knowledge representation": how a machine can learn about the world and use that knowledge to make inferences. This approach was in some way "justified" by the idea, introduced at MIT by Noam Chomsky in "Syntactic Structures" (1957), that language competence is due to grammatical rules that specify which sentences are correct in a language. The grammatical rules express "knowledge" of how a language works, and, once you have that knowledge (and a vocabulary), you can produce any sentence in that language, including sentences you have never heard or read before.
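The claim that a few rules plus a vocabulary can generate sentences nobody has shown you can be illustrated with a small Python sketch. The toy grammar below is hypothetical (it is not taken from "Syntactic Structures"), and the names `GRAMMAR` and `expand` are assumptions made for this example.

```python
from itertools import product

# Hypothetical phrase-structure rules: each symbol maps to its possible expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["chases"]],
}

def expand(symbol):
    """Yield every word sequence that can be derived from `symbol`."""
    if symbol not in GRAMMAR:            # a word of the vocabulary
        yield [symbol]
        return
    for production in GRAMMAR[symbol]:
        # Combine every expansion of each part of the production.
        for parts in product(*(list(expand(s)) for s in production)):
            yield [word for part in parts for word in part]

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))
print(sentences[0])
```

Five rules and four content words are enough to produce eight distinct sentences; adding one noun to the vocabulary would yield new sentences without touching the rules.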
The rapid development of computer programming helped this field take off, as computers were getting better and better at processing symbols: knowledge was represented in symbolic structures and "reasoning" was reduced to a matter of processing symbolic expressions. This line of research led to "knowledge-based systems" (or "expert systems"), such as Ed Feigenbaum's Dendral (1965) at Stanford, which consisted of an "inference engine" (the repertory of legitimate reasoning techniques recognized by the mathematicians of the world) and a "knowledge base" (the "common sense" knowledge). This technology relied on acquiring knowledge from domain experts in order to create "clones" of such experts (machines that performed as well as the human experts). The limitation of expert systems was that they were "intelligent" only in one specific domain.
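The separation between an inference engine and a knowledge base can be sketched in a few lines of Python. This is a generic forward-chaining loop, not Dendral's actual design; the function `forward_chain` and the medical-sounding rules are hypothetical and serve only to show that the reasoning code stays the same while the knowledge changes from domain to domain.

```python
def forward_chain(facts, rules):
    """Inference engine: apply if-then rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)    # all premises hold, so assert the conclusion
                changed = True
    return known

# Knowledge base: hypothetical rules that a domain expert might supply.
rules = [
    (("has_fever", "has_cough"), "suspect_flu"),
    (("suspect_flu",), "recommend_rest"),
]

derived = forward_chain({"has_fever", "has_cough"}, rules)
print(sorted(derived))
```

Swapping in a different list of rules would turn the same engine into an "expert" of a different domain, which also shows why such a system knows nothing outside the rules it was given.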
Meanwhile, the other branch of Artificial Intelligence was pursuing a rather different approach: simulating what the brain does at the physical level of neurons and synapses. The logical school of John McCarthy and Marvin Minsky believed in using mathematical logic to simulate how the human mind works; the school of “neural networks” (or “connectionism”) believed in simulating the structure of the brain to simulate how the brain works.
Since neuroscience was still in its infancy in the 1950s (medical machines to study living brains would not become available until the 1970s), computer scientists only knew that the brain consists of a huge number of interconnected neurons, and neuroscientists were becoming ever more convinced that "intelligence" was due to the connections, not to the individual neurons. A brain can be viewed as a network of interconnected nodes, and our mental life as due to the way messages travel through those connections from the neurons of the sensory system up to the neurons that process those sensory data and eventually down to the neurons that generate action. The neural connections can vary in strength from zero to infinity. Change the strength of some neural connections and you change the outcome. In other words, the strength of the connections can be tweaked to cause different outputs for the same inputs. The problem for those designing "neural networks" consists in fine-tuning the connections so that the network as a whole comes up with the correct interpretation of the input; e.g. with the word "apple" when the image of an apple is presented. This is called "training the network". For example, showing many apples to the system and forcing the answer "APPLE" should result in the network adjusting those connections to recognize apples. This is called "supervised learning". Since the key is to adjust the strength of the connections, the alternative term for this branch of A.I. is "connectionism".
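The following Python sketch shows what "adjusting the strength of the connections" can look like in the simplest case: a single threshold unit trained with the classic perceptron rule. Everything specific here is an assumption made for illustration: the two input features (roundness and redness), the four training examples, the learning rate and the function names `train_perceptron` and `predict`.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds the threshold, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, epochs=20, lr=0.1):
    """Learn connection strengths from (inputs, label) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)   # 0 when already correct
            # Strengthen or weaken each connection in proportion to its input.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Hypothetical features: (roundness, redness); label 1 means "APPLE".
samples = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.2, 0.1), 0), ((0.1, 0.3), 0)]
weights, bias = train_perceptron(samples)
print([predict(weights, bias, inputs) for inputs, _ in samples])
```

Nobody tells the program which rule separates apples from non-apples: the labels only force the right answer, and the weights drift until the answers come out right on their own.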
One of the most influential books in the early years of neuroscience was "The Organization of Behavior" (1949), written by the psychologist Donald Hebb at McGill University in Montreal (Canada). Hebb described how the brain learns by changing the strength of the connections between its neurons. In 1954 Wesley Clark and Belmont Farley at MIT simulated Hebbian learning on a computer, i.e. created the first artificial neural network (a two-layer network). In 1956 Hebb collaborated with IBM's research laboratory in Poughkeepsie to produce another computer model, programmed by Nathaniel Rochester's team (which included a young John Holland).
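Hebb stated his principle in words; a common later formalization is that the strength of a connection grows in proportion to the joint activity of the two neurons it links (change in weight = learning rate × input activity × output activity). The Python sketch below uses that formalization. It is not the Clark and Farley simulation, and the function `hebbian_update`, the learning rate and the activity patterns are assumptions made for illustration.

```python
def hebbian_update(weights, pre, post, lr=0.5):
    """Return new weights after one Hebbian step.

    weights[j][i] is the connection from input neuron i to output neuron j.
    A connection grows only when both of its neurons are active together.
    """
    return [
        [w + lr * x * y for w, x in zip(row, pre)]
        for row, y in zip(weights, post)
    ]

weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
pre = [1, 0, 1]     # hypothetical activity of three input neurons
post = [1, 0]       # hypothetical activity of two output neurons

for _ in range(2):  # present the same pattern twice
    weights = hebbian_update(weights, pre, post)
print(weights)
```

Only the connections between co-active neurons are reinforced; the others stay at zero. Note that, unlike the supervised case, no error signal is involved.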
Frank Rosenblatt's Perceptron (1957) at Cornell University and Oliver Selfridge's Pandemonium (1958) at MIT defined the standard for "neural networks": not knowledge representation and logical inference, but pattern propagation and automatic learning.
Rosenblatt extended to multi-layer networks the Hebbian training procedure that Farley and Clark had simulated, and coined the expression "back-propagating error correction".
Compared with expert systems, neural networks are dynamic systems (their configuration changes as they are used) and predisposed to learning by themselves (they can adjust their configuration). "Unsupervised" networks, in particular, can discover categories by themselves; e.g., they can discover that several images refer to the same kind of object, a cat.
There are two ways to solve a crime. One way is to hire the smartest detective in the world, who will use experience and logic to find out who did it. On the other hand, if we had enough surveillance cameras placed around the area, we would scan their tapes and look for suspicious actions. Both ways may lead to the same conclusion, but one uses a logic-driven approach (symbolic processing) and the other one uses a data-driven approach (ultimately, the visual system, which is a connectionist system).
In 1969 Marvin Minsky and Seymour Papert of MIT published a devastating critique of neural networks (titled "Perceptrons") that virtually killed the discipline. At the same time expert systems were beginning to make inroads at least in academia, notably Bruce Buchanan's Mycin (1972) at Stanford for medical diagnosis and John McDermott's Xcon (1980) at Carnegie Mellon University for product configuration, and, by the 1980s, also in the industrial and financial worlds at large, thanks especially to many innovations in knowledge representation (Ross Quillian's semantic networks at Carnegie Mellon University, Minsky's frames at MIT, Roger Schank's scripts at Yale University, Barbara Hayes-Roth's blackboards at Stanford University, etc.). Intellicorp, the first major start-up for Artificial Intelligence, was founded in Silicon Valley in 1980.
There was progress in knowledge-based architectures to overcome the slow speed of computers. In 1980 Judea Pearl introduced the Scout algorithm, the first algorithm to outperform alpha-beta, and in 1983 Alexander Reinefeld further improved the search algorithm with his NegaScout algorithm.
One factor that certainly helped the symbolic-processing approach and condemned the connectionist approach was that the latter uses complex algorithms, i.e. it requires computational power that at the time was rare and expensive.
(Personal biography: I entered the field in 1985 and went on to lead the Silicon Valley-based Artificial Intelligence Center of the largest European computer manufacturer, Olivetti, and I later worked at Intellicorp for a few years.)