(These are excerpts from my book "Intelligence is not Artificial")
Brute-force A.I. doesn't Speak
Most disheartening has been the scant progress in Natural Language Processing (NLP) since 1970.
Following are examples of exchanges between a person and Terry Winograd's program SHRDLU at MIT in 1970:
Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX.
Computer: BY "IT", i ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE i AM HOLDING.
Person: DOES THE SHORTEST THING THE TALLEST PYRAMID'S SUPPORT SUPPORTS SUPPORT ANYTHING GREEN?
Computer: YES, THE GREEN PYRAMID.
Person: WHAT COLOR IS IT?
Computer: BY "IT", i ASSUME YOU MEAN THE SHORTEST THING THE TALLEST PYRAMID'S SUPPORT SUPPORTS. RED
Person: DOES A STEEPLE
Computer interrupting: SORRY, i DON'T KNOW THE WORD "STEEPLE".
Person: A "STEEPLE" IS A STACK WHICH CONTAINS TWO GREEN CUBES AND A PYRAMID.
Computer: i UNDERSTAND.
Person: ARE THERE ANY STEEPLES NOW?
It beats some of today's "personal digital assistants".
We virtually abandoned the idea of having machines understand and speak our language and resorted to the opposite: make humans speak like machines. That is what you do when you talk on the phone with a machine that asks you for numbers; and that is what you do when you talk to your smartphone's "assistant" according to the rules of that assistant. Nothing illustrates the gap between machine intelligence and human intelligence better than comparing how much of a language a toddler learns in two years and how little of any language all machines ever invented have learned in more than 60 years.
IBM's Watson, which debuted with much fanfare in 2011 on a quiz show competing against human experts, was actually not capable of understanding the spoken questions: the questions were delivered to Watson as text files, not as spoken questions (a trick which, of course, distorted the whole game).
The most popular search engines are still keyword-based. Progress in search engines has been mainly in indexing and ranking webpages, not in understanding what the user is looking for nor in understanding what the webpage says. Try for example "Hey i had a discussion with a friend about whether Qaddafi wanted to get rid of the US dollar and he was killed because of that" and see what you get (as i write these words, Google returns first of all my own website with the exact words of that sentence and then a series of pages that discuss the assassination of the US ambassador in Libya). Communicating with a search engine is a far (far) cry from communicating with human beings.
Products that were originally marketed as able to understand natural language, such as SIRI for Apple's iPhone, have bitterly disappointed their users. These products understand only the most elementary of sounds, and only sometimes, just like their ancestors of decades ago. Promising that a device will be able to translate speech on the fly (like Samsung did with its Galaxy S4 in 2013) is a good way to embarrass yourself and to lose credibility among your customers.
The status of natural language processing is well represented by antispam software that is totally incapable of understanding whether an email is spam or not based on its content while we can tell in a split second.
During the 1960s, following (and mostly reacting against) Noam Chomsky's "Syntactic Structures" (1957), which heralded a veritable linguistic revolution, a lot of work in A.I. was directed towards "understanding" natural-language sentences, notably Charles Fillmore's case grammar at Ohio State University (1967), Roger Schank's conceptual dependency theory at Stanford (1969, later at Yale), William Woods' augmented transition networks at Harvard (1970), Yorick Wilks' preference semantics at Stanford (1973), and semantic grammars, an evolution of ATNs by Dick Burton at BBN for one of the first "intelligent tutoring systems", Sophie (started in 1973 at UC Irvine by John Seely Brown and Burton). Unfortunately, the results were crude.
Schank and Wilks were emblematic of the revolt against Chomsky's logical approach, which did not work well in computational systems. Schank and Wilks turned to meaning-based approaches to natural language processing.
Terry Winograd's SHRDLU and Woods' LUNAR (1973), both based on Woods' theories, were limited to very narrow domains and short sentences.
Roger Schank moved to Yale in 1974 and attacked the Chomskyan model that language comprehension is all about grammar and logical thinking. Schank instead viewed language as intertwined with cognition, as Otto Selz and other cognitive psychologists had argued 50 years earlier. Minsky's "frame" and Schank's "script" (all variations on Selz's "schema") assumed a unity of perception, recognition, reasoning, understanding and memory: memory has the passive function of remembering and the active function of predicting; the comprehension of the world and its categorization proceed together; knowledge is stories.
Schank's "conceptual dependency" theory, whose tenet is that two sentences whose meaning is equivalent must have the same representation, aimed to replace Noam Chomsky's focus on syntax with a focus on concepts.
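The tenet that equivalent sentences must share one representation can be made concrete with a toy sketch. The code below is an illustration under heavy assumptions, not Schank's system: it handles only two hard-coded sentence patterns, and the `Conceptualization` class and `parse` function are hypothetical names invented for this example. The one borrowed element is ATRANS, the conceptual-dependency primitive for a transfer of possession.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conceptualization:
    """Hypothetical, much simplified meaning structure."""
    primitive: str   # e.g. "ATRANS" = abstract transfer of possession
    obj: str         # the thing being transferred
    source: str      # who has it before the transfer
    recipient: str   # who has it after the transfer


def parse(sentence: str) -> Conceptualization:
    """Map two hard-coded surface patterns to the same structure."""
    words = sentence.lower().rstrip(".").split()
    # Pattern 1: "<giver> gave <recipient> a <object>"
    if len(words) == 5 and words[1] == "gave" and words[3] == "a":
        giver, recipient, obj = words[0], words[2], words[4]
    # Pattern 2: "<recipient> received a <object> from <giver>"
    elif (len(words) == 6 and words[1] == "received"
          and words[2] == "a" and words[4] == "from"):
        recipient, obj, giver = words[0], words[3], words[5]
    else:
        raise ValueError("pattern not hardwired into this toy parser")
    return Conceptualization("ATRANS", obj, giver, recipient)


a = parse("John gave Mary a book")
b = parse("Mary received a book from John")
print(a == b)  # different syntax, same conceptual representation
```

Any sentence outside the two patterns raises an error, which is a fair miniature of the narrow-domain problem discussed in this section.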
We humans use all sorts of complicated sentences, some of them very long, some of them nested into each other.
Little was done in discourse analysis before Eugene Charniak's thesis at MIT ("Towards a Model of Children's Story Comprehension", 1972), Indian-born Aravind Joshi's "Tree Adjunct Grammars" (1975) at the University of Pennsylvania, and Jerry Hobbs' work at SRI International ("Computational Approach to Discourse Analysis", 1976).
Then a handful of important theses established the field. One originated from SRI, Barbara Grosz's thesis at UC Berkeley ("The Representation and Use of Focus in a System for Understanding Dialogs", 1977). And two came from Bolt Beranek and Newman, where William Woods had pioneered natural-language processing: Bonnie Webber's thesis at Harvard ("Inference in an Approach to Discourse Anaphora", 1978) and Candace Sidner's thesis at MIT ("Towards a Computational Theory of Definite Anaphora Comprehension in English Discourse", 1979).
In 1974 Marvin Minsky at MIT introduced the "frame" for representing a stereotyped situation ("A Framework for Representing Knowledge", 1974) and in 1975 for the same purpose Roger Schank, who had already designed MARGIE (1973, which, believe it or not, stands for "Memory, Analysis, Response Generation, and Inference on English"), in collaboration with Stanford student Chris Riesbeck, and psychologist and social scientist Robert Abelson at Yale introduced the script ("Scripts, Plans, and Knowledge", 1975). Schank's students built a number of systems that used scripts to understand stories: Richard Cullingford's Script Applier Mechanism (SAM) of 1975; Robert Wilensky's PAM (Plan Applier Mechanism) of 1976; Wendy Lehnert's question-answering system QUALM of 1977; Janet Kolodner's CYRUS (Computerized Yale Retrieval and Updating System) of 1978, that learned events in the life of two politicians; Michael Lebowitz's IPP (Integrated Partial Parser) of 1978, that in order to read newspaper stories about international terrorism introduced an extension of the script, the MOP (Memory Organization Packet); Jaime Carbonell's Politics of 1978, that simulated political beliefs; Gerald DeJong's FRUMP (Fast Reading Understanding and Memory Program) of 1979, an evolution of SAM for producing summaries of newspaper stories; BORIS (Better Organized Reading and Inference System) of 1980, developed by Lehnert and her student Michael Dyer, a story-understanding and question-answering system that combined the MOP and a new extension, the Thematic Affect Unit (TAU). Starting in 1978 these systems were grouped under the general heading of "case-based reasoning". Meanwhile, Steven Rosenberg at MIT built a model to understand stories based on Minsky's frames.
In particular, Jaime Carbonell's PhD dissertation at Yale University ("Subjective Understanding", 1979) can be viewed as a precursor of the field that would be called "sentiment analysis".
It is important to realize that, despite the hype and the papers published in reputable (?) A.I. magazines, none of these systems ever worked. They "worked" only in a very narrow domain and they "understood" pretty much only what was hardwired into them by the software engineer. That's why they were never used twice. They were certainly steps forward in theoretical research, but very humble and very short steps. In 2017 Schank published on his blog an angry article titled "The fraudulent claims made by IBM about Watson and A.I." that started out with the sentence "They are not doing cognitive computing no matter how many times they say they are" but perhaps that's precisely what Schank was doing two generations earlier.
These computer scientists, as well as philosophers such as Hans Kamp in the Netherlands (founder of Discourse Representation Theory in 1981), attempted a more holistic approach to understanding "discourse", not just individual sentences; and this resulted in domain-independent systems such as the Core Language Engine, developed in 1988 by Hiyan Alshawi's team at SRI in Britain.
Meanwhile, Melvin Maron's pioneering work on statistical analysis of text at UC Berkeley ("On Relevance, Probabilistic Indexing, and Information Retrieval", 1960) was being resurrected by Gerard Salton at Cornell University (the project leader of SMART, System for the Mechanical Analysis and Retrieval of Text, since 1965). This technique, true to the motto "You shall know a word by the company it keeps" (1957) by the British linguist John Rupert Firth, represented a text as a "bag" of words, disregarding the order of the words and even the grammatical relationships. Surprisingly, this method was working better than the complex grammar-based approaches. It quickly came to be known as the "bag-of-words model" for language analysis. Technically speaking, it was text classification using naive Bayes classifiers. In 1998 Thorsten Joachims at the University of Dortmund replaced the naive Bayes classifier with the method of statistical learning called "Support Vector Machines", invented by Vladimir Vapnik at Bell Labs in 1995, and other improvements followed. The bag-of-words model became the dominant paradigm for natural language processing but its statistical approach still failed to grasp the meaning of a sentence.
Nor did it have any idea of why a sentence was where it was and what it did there. Barbara Grosz at SRI International built an influential framework to study the sequence of sentences, i.e. the whole discourse, the "Centering" system ("Providing a Unified Account of Definite Noun Phrases in Discourse", 1983), later refined when she moved to Harvard ("A Framework for Modelling the Local Coherence of Discourse", 1986, but unpublished until 1995).
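To see how little the bag-of-words model described above actually looks at, here is a minimal sketch of a naive Bayes text classifier in plain Python. It is an illustration only: the class name `NaiveBayes`, the helper `bag_of_words`, the add-one smoothing and the four training sentences are all assumptions made up for this example, not taken from any of the systems mentioned.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Reduce a text to word counts; word order is thrown away."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one smoothing (assumed)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of texts
        self.vocab = set()

    def train(self, text, label):
        bag = bag_of_words(text)
        self.word_counts[label].update(bag)
        self.doc_counts[label] += 1
        self.vocab.update(bag)

    def classify(self, text):
        bag = bag_of_words(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log P(label) + sum over words of count * log P(word | label)
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word, n in bag.items():
                p = ((self.word_counts[label][word] + 1)
                     / (total_words + len(self.vocab)))
                score += n * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes()
# Invented training data, for illustration only.
clf.train("win money now", "spam")
clf.train("cheap money offer", "spam")
clf.train("meeting at noon", "ham")
clf.train("lunch meeting tomorrow", "ham")

print(clf.classify("money offer now"))
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))
```

The last line makes the limitation visible: "dog bites man" and "man bites dog" produce the same bag, so no classifier built on top of it can tell them apart.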
Perhaps the first major progress in machine translation since Systran was demonstrated in 1973 by Yorick Wilks at Stanford. His system was based on something similar to conceptual dependency, "preference semantics" ("An Artificial Intelligence Approach to Machine Translation", 1973).
The method that did improve the quality of automatic translation is the statistical one, pioneered in the 1980s by Fred Jelinek's team at IBM and first implemented there by Peter Brown's team (the Candide system of 1992). When there are plenty of examples of (human-made) translations, the computer can perform a simple statistical analysis and pick the most likely translation. Note that the computer isn't even trying to understand the sentence: it has no clue whether the sentence is about cheese or parliamentary elections. It has "learned" that those few words in that combination are usually translated in such and such a way by humans. The statistical approach works wonders when there are thousands of (human-made) translations of a sentence, for example between Italian and English. It works awfully when there are fewer, like in the case of Chinese to English.
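The idea of picking the most likely translation from examples can be sketched in a few lines. This is a deliberately crude stand-in, not the method used at IBM: real statistical systems align words and estimate probabilities over sentence fragments, whereas this toy simply counts how often humans translated an identical phrase in each way. The function names and the tiny Italian-English corpus are hypothetical.

```python
from collections import Counter, defaultdict


def build_table(parallel_corpus):
    """Count how often each source phrase was translated each way."""
    table = defaultdict(Counter)
    for source, target in parallel_corpus:
        table[source][target] += 1
    return table


def translate(table, phrase):
    """Return the most frequent human translation, or None if unseen."""
    candidates = table.get(phrase)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Invented examples of human-made translations.
corpus = [
    ("il formaggio", "the cheese"),
    ("il formaggio", "the cheese"),
    ("il formaggio", "cheese"),
    ("le elezioni", "the elections"),
]
table = build_table(corpus)

print(translate(table, "il formaggio"))
print(translate(table, "il parlamento"))
```

Nothing in the table records what cheese or an election is, and a phrase with no human examples gets no translation at all, which is the scarcity problem described above.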