(These are excerpts from my book "Intelligence is not Artificial")
The Influence of Computer Vision
The history of computer vision began in 1963 with the PhD dissertation of Lawrence Roberts, a student of Claude Shannon at MIT ("Machine Perception of Three-Dimensional Solids," 1963). This was the same Roberts who in 1966 would launch the Arpanet project, the precursor of the Internet.
Face recognition was pioneered by Woody Bledsoe at his Palo Alto consulting firm Panoramic Research, which mainly worked on military projects ("The Model Method in Facial Recognition", 1964). In 1966 he moved to SRI, where he trained Peter Hart, who later worked on Shakey the Robot (and who in 1973 coauthored the textbook "Pattern Classification and Scene Analysis" with Richard Duda).
In 1968 Adolfo Guzman at MIT built programs to detect the constituent objects of a scene ("Computer Recognition of Three-Dimensional Objects in a Visual Scene", 1968). Max Clowes at the University of Sussex ("On Seeing Things", 1971) and David Huffman at UC Santa Cruz ("Impossible Objects as Nonsense Sentences", 1971) independently discovered methods to interpret pictures of polyhedra (solids such as cubes and pyramids), and Alan Mackworth at the University of Sussex developed a program to interpret line drawings as polyhedral scenes ("Interpreting Pictures of Polyhedral Scenes", 1973). Computer vision was mostly about recognizing objects within a picture, and initially the prevailing method was to compare the regions of the picture with templates of typical objects. Martin Fischler and Robert Elschlager at Lockheed's Palo Alto Research Laboratory expanded this method with "stretchable templates" ("The Representation and Matching of Pictorial Structures", 1973). Takeo Kanade graduated from Kyoto University in 1973 with the world's first automated face recognition system ("Picture Processing System by Computer Complex and Recognition of Human Faces", 1973).
By coincidence or not, both Huffman and Kanade were pioneers of mathematical origami, which consists of folding a two-dimensional piece of paper into a three-dimensional object. For example, Takeo Kanade, now at Carnegie Mellon University, published the article "A Theory of Origami World" (1980).
In 1977 General Motors demonstrated Sight-I, a machine-vision system for inspecting integrated circuits, developed in Lothar Rossol's laboratory by Michael Baird, who had written one of the early theses on machine vision at the Georgia Institute of Technology ("A Paradigm for Semantic Picture Recognition," 1973). In September 1978 Rossol organized at General Motors a symposium on computer vision, possibly the first of its kind in the world. In 1979 his team demonstrated Consight, "a vision-controlled robot system for transferring parts from belt conveyors", based on a simple scheme of light sensors, soon installed at a plant near Niagara Falls. In 1979 SRI's chief A.I. scientist Charles Rosen founded Machine Intelligence Company in Atherton to commercialize SRI's machine-vision technology, which could recognize industrial parts as they moved along a conveyor belt. In 1981 Automatix, the new venture of Victor Scheinman (of Stanford Arm and PUMA fame), co-founded with Philippe Villers of CAD pioneer Computervision, introduced Robovision for arc welding, the first commercial robot equipped with a vision system, a technology borrowed from SRI's Shakey group.
In 1982 the American Robot Corporation was founded in Pittsburgh and later renamed American Cimflex before merging with A.I. pioneer Teknowledge of Palo Alto. In 1982 General Motors and Japan's Fanuc established the joint venture GMF Robotics.
In 1984 Adept, founded by Brian Carlisle and Bruce Shimano, i.e. the Vicarm engineers who had worked with Scheinman on PUMA, introduced a rival robotic arm also equipped with machine-vision, the first one to incorporate the direct-drive technology invented by Takeo Kanade at Carnegie Mellon University in 1981.
Then David Marr's epochal posthumous book "Vision" (1982) was published. It summarized his research at MIT, which held that the mind understands a scene through a three-stage process: a primal two-dimensional sketch that contains the basic components of the scene; a 2.5-dimensional sketch of the scene that also contains depth, an idea that the British-born Marr originally developed with the Italian-born Tomaso Poggio ("A Theory of Human Stereo Vision", 1977) and that was partially anticipated by Parvati Dev's neural model at the University of Massachusetts ("Perception of Depth Surfaces in Random-dot Stereograms", 1975); and the final three-dimensional model. Computer vision became a popular subject. One limitation was that all the classic algorithms dealt only with straight lines.
Jitendra Malik at Stanford ("Interpreting Line Drawings of Curved Objects", 1985) was one of the young scientists who studied how to deal with curved objects. In 1987 Lawrence Sirovich and Michael Kirby, mathematicians at Brown University, used principal component analysis (i.e., linear algebra) to transform images of faces into mathematical vectors called "eigenfaces" ("Low-dimensional Procedure for the Characterization of Human Faces", 1987). This method constituted the basis of the "sliding window" approach of Matthew Turk and Alex Pentland at MIT ("Eigenfaces for Face Detection/Recognition", 1991). In 1987 another cognitive scientist, Irving Biederman at the State University of New York at Buffalo, published an influential article to explain how we recognize objects, arguing that objects can be broken down into basic geometric solids called "geons" ("Recognition-by-Components: A Theory of Human Image Understanding", 1987).
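The eigenface idea mentioned above can be sketched in a few lines of linear algebra: treat each face image as a long vector, subtract the average face, and extract the principal components of the collection. This is a minimal illustration, not the authors' original implementation; the image sizes, the number of components, and the random stand-in data are all assumptions made here for brevity.

```python
import numpy as np

# Sketch of the eigenface decomposition (after Sirovich & Kirby, 1987).
# Real face images would be loaded here; random data is a stand-in.
rng = np.random.default_rng(0)
n_faces, h, w = 20, 16, 16             # 20 tiny 16x16 "images" (assumed sizes)
faces = rng.random((n_faces, h * w))   # each row: one flattened image

# Center the data around the mean face.
mean_face = faces.mean(axis=0)
centered = faces - mean_face

# PCA via SVD: the rows of Vt are the principal directions ("eigenfaces").
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 5
eigenfaces = Vt[:k]                    # keep the top-k components

# Any face is now described by k coefficients instead of h*w pixel values.
coeffs = centered @ eigenfaces.T       # shape (n_faces, k)
reconstructed = mean_face + coeffs @ eigenfaces

print(coeffs.shape)                    # (20, 5)
```

The point of the method is the compression in the last step: a face is summarized by its projection onto a handful of eigenfaces, which is what made later recognition schemes such as Turk and Pentland's computationally practical.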
After graduating, Malik moved to UC Berkeley, where he founded an important school of computer vision. One of his students was Pietro Perona, who in turn moved to the California Institute of Technology (Caltech) and built another important computer-vision group that over the years refined Perona's "constellation models" for object detection: Thomas Leung ("Finding Faces in Cluttered Scenes Using Labeled Random Graph Matching", 1995), Michael Burl ("Localization via Shape Statistics", 1995), Markus Weber ("Unsupervised Learning of Models for Recognition", 2000), Fei-Fei Li ("A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories", 2003), Rob Fergus ("Object Class Recognition by Unsupervised Scale-Invariant Learning", 2003), etc. It was here that in 2003 Fei-Fei Li built the image dataset Caltech 101, which later evolved into ImageNet.