(These are excerpts from my book "Intelligence is not Artificial")
In November 1958, at the Symposium on Mechanization of Thought Processes in England, the always prescient John McCarthy delivered a lecture titled "Programs with Common Sense", which became one of the most influential papers in A.I. McCarthy understood that a machine with no common sense is what we normally call "an idiot": it can do one thing very well, but it cannot be trusted to do it alone, and it certainly cannot be trusted to do anything else.
What we say is not what we mean. If I ask you to cook dinner using whatever high-protein food you can find in a kitchen cabinet, that does not mean that you should cook the spider crawling on its walls, nor the chick that your children have adopted as a pet, nor (gasp) the toddler who is hiding in it for fun.
How do we decide the best moment to take a picture at an event? A machine can take thousands of pictures, one per second or even faster, but we take only two or three, because we know which moments are meaningful.
Surveillance cameras and cameras on drones can store millions of hours of video. They can recognize the make and model of a car, and even read its license plate, but they cannot realize that a child is drowning in a swimming pool or that a thief is breaking into a car.
In April 2016 in England a group of children spontaneously formed a human arrow on the ground to direct a police helicopter towards the fleeing suspects of a crime. Nobody taught the children to do that. What the children understood (in a few seconds) is a long list of "common sense" knowledge: there has been a crime and we need to capture the criminals; the criminals are running away to avoid capture; the helicopter in the sky is the police looking for the criminals; the police force is the entity in charge of catching criminals; it is good to help the police if you have seen the criminals flee; it is bad if the criminals escape; the helicopter cannot hear you but can see you if you all group together; the arrow is a universal symbol to mark a direction; helicopters fly faster than humans can run; etc. That's what intelligence does when it has common sense.
Around the same time in 2016, Wei Zexi, a 21-year-old student at Xidian University in China's Shaanxi province who was undergoing treatment for a rare form of cancer, found an advertisement on Baidu (China's leading search engine) promoting a treatment offered by the Beijing Armed Police Corps No 2 Hospital. The "doctor" turned out to be bogus and the treatment killed the boy. The Chinese media demonized Baidu (and, hopefully, the military hospital!), but this was not a case of Baidu being evil: it was a case of yet another algorithm with no common sense, just like the Google algorithm that in 2015 labeled two African-Americans as gorillas, and the Microsoft algorithm that in 2016 posted racist and sexist messages on Twitter. This is what intelligence does when it has no common sense.
To make things worse, I found the news of Wei Zexi's death on a website that itself displayed some silly ads. Two of these ads were almost pornographic (titled "30 Celebs Who Don't Wear Underwear" and "Most Embarrassing Cheerleader Moments"). These ads were posted next to the article describing the tragic death of Wei Zexi: the "intelligent" software that assigns ads to webpages has no common sense, i.e. it cannot understand that it is really disgusting to post such sex-related ads on a page devoted to someone's death. (No, the ads were not customized for me: I was using an Internet-café terminal.)
When computers became powerful enough, some A.I. scientists embarked on ambitious attempts to replicate the "common sense" that we humans seem to master so easily as we grow up. The most famous project was Doug Lenat's Cyc (1984), which is still going on. In 1999 Marvin Minsky's students at MIT, Push Singh and Catherine Havasi, launched Open Mind Common Sense, which has been collecting "common sense" provided by thousands of volunteers. DBpedia, started at the Free University of Berlin in 2007, extracts knowledge from Wikipedia articles. The goal of these systems is to create a vast catalog of the knowledge that ordinary people have about plants, animals, places, history, celebrities, objects, ideas, etc. For each of these we intuitively know what to do: you are supposed to be scared of a tiger but not of a cat, despite the similarities; umbrellas make sense when it rains or at the beach; clothes are for wearing; food is for eating; etc. More recently, the very companies that are investing in deep learning have realized that you can't do without common sense. Hence Microsoft started Satori in 2010 and Google unveiled its Knowledge Graph in 2012. At that point Knowledge Graph already contained knowledge about 570 million objects via more than 18 billion relationships between them (Google did not disclose when the project had started). These projects marked a rediscovery of the old program of "knowledge representation" (based on mathematical logic), which has been downplayed too much since the boom in deep learning. Knowledge Graph is a "semantic network", a kind of knowledge representation that was very popular in the 1970s. Google's natural-language processing team, led by Fernando Pereira, is integrating Google's famous deep-learning technology (the "AlphaGo" kind of technology) with linguistic knowledge that is the result of eight years of work by professional linguists.
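To make the idea of a "semantic network" concrete, here is a minimal Python sketch under assumptions of our own: concepts are nodes, facts are labelled edges, and a concept inherits facts from its more general "is_a" parents unless it states its own. The class `SemanticNetwork` and the relation names (`is_a`, `has`, `dangerous`, `used_when`) are hypothetical illustrations; this is not how Cyc, DBpedia or Knowledge Graph are actually implemented.

```python
from collections import defaultdict


class SemanticNetwork:
    """A toy semantic network: concept nodes joined by labelled edges (hypothetical design)."""

    def __init__(self):
        # subject -> relation -> set of objects
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, subject, relation, obj):
        self.edges[subject][relation].add(obj)

    def lookup(self, subject, relation):
        """Return the objects linked to `subject` by `relation`, inheriting along "is_a" links.

        The search climbs the is_a hierarchy breadth-first, so a fact stated on the
        most specific concept overrides a fact inherited from a more general one.
        """
        frontier, seen = [subject], set()
        while frontier:
            next_frontier = []
            for node in frontier:
                if node in seen:
                    continue
                seen.add(node)
                if relation in self.edges[node]:
                    return set(self.edges[node][relation])
                next_frontier.extend(self.edges[node].get("is_a", ()))
            frontier = next_frontier
        return set()


# Illustrative facts, not taken from any real knowledge base.
net = SemanticNetwork()
net.add("tiger", "is_a", "feline")
net.add("cat", "is_a", "feline")
net.add("feline", "is_a", "animal")
net.add("feline", "has", "whiskers")
net.add("tiger", "dangerous", "yes")
net.add("cat", "dangerous", "no")
net.add("umbrella", "used_when", "raining")
net.add("umbrella", "used_when", "at the beach")

print(net.lookup("cat", "has"))        # inherited from "feline"
print(net.lookup("cat", "dangerous"))  # stated directly on "cat"
```

Note that nothing in the structure itself tells the network that tigers are dangerous and cats are not: tigers and cats share the same parent, and the difference exists only because someone wrote it down. That is why projects like Cyc and Open Mind Common Sense have to collect such facts by hand or from volunteers.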
It is incorrect to say that deep learning is a technique for learning to do what we do. If I do something that has never been done before, deep learning cannot learn how to do it: it needs thousands if not millions of samples in order to learn it. If something is being done for the first time, then by definition deep learning cannot learn it: there is only one case. Deep learning is a technique for learning something that humans DID (past tense).
Now let's imagine a scenario in which neural networks have learned everything that humans ever did. What happens next? The short answer is: nothing. These neural networks are incapable of doing anything that they were not trained to do, so this is the end of progress.
Training a neural network to do something that has never been done before is possible (for example, by introducing some random variation into what it has learned), but then the neural network has to understand that the result of the novel action is interesting, which requires an immense knowledge of the real world. If I perform a number of random actions, most of them will be useless, wastes of time and energy, but maybe one or two will turn out to be useful. We often stumble into interesting actions by accident and realize that we can use those accidental actions for doing something very important. I was looking for a way to water my garden without having to physically walk there, and one day I realized that an old broken hose had so many holes in it that it would work really well to water the fruit trees. Minutes ago, I accidentally pressed the wrong key on my Android tablet and discovered a feature that I didn't know existed. It is actually a useful feature.
In order to understand which novel action is useful, one needs a list of all the things that can possibly be useful to a human being. It is trivial for us to understand what can be useful to human life. It is not trivial for a machine, and certainly not trivial at all for a neural network trained to learn from us.
See for example Alexander Tuzhilin's paper "Usefulness, Novelty, and Integration of Interestingness Measures" (Columbia University, 2002) and Iaakov Exman's paper "Interestingness a Unifying Paradigm Bipolar Function Composition" (Israel, 2009).
The importance of common sense in daily activities is intuitive. We get angry whenever someone does something without "thinking". It is not enough to recognize that a car is a car and a tree is a tree. It is also important to understand that cars move and trees don't, that cars get into accidents and some trees bear edible fruits, etc. Deep learning is great for recognizing that a car is a car and a tree is a tree, but it struggles to go beyond recognition. That alone is a big limitation.
A second problem with deep-learning systems is that you need a very large dataset to train them. We humans learn a new game just from listening to a friend's description and from watching friends play it a couple of times. Deep learning requires thousands if not millions of cases before it can play decently.
Big data are used to train the neural networks of deep learning systems, but "big data" is not what we use to train humans. We do exactly the opposite. Children's behavior is "trained" by two parents and maybe a nanny, not by videos found on the Internet. Their education is "trained" by carefully selected teachers who had to get a degree in education, not by the masses. We train workers using the rare experts in the craft, not a random set of workers. We train scientists using a handful of great scientists, not a random set of students.
I am typing these words in 2016 while Egypt and other countries are searching the Mediterranean Sea for an airplane that went missing. In 2014 a Malaysia Airlines airplane en route from Kuala Lumpur to Beijing mysteriously disappeared over the Indian Ocean. Deep-learning neural networks can be trained to play go/weichi because there are thousands of well-documented games played by human masters, but the same networks cannot be trained to scour the ocean for debris of a missing airplane: we don't have thousands of pictures of debris of missing airplanes. Such debris can have arbitrary shapes, float in arbitrary ways, be partially underwater, etc. Humans can easily identify pieces of an airplane even if they have seen only 10 or 20 airplanes in their life and have never seen the debris of an air crash; neural networks can do it only if we show them thousands of examples.
A third problem of machines with no common sense is their inability to recognize an "obvious" mistake. Several studies have shown that, in some circumstances, deep-learning neural networks are better than humans at recognizing objects; but when the neural network makes a mistake, you can tell that it has no common sense: it is usually a mistake that makes us laugh, i.e. a mistake that no idiot would make. You train a neural network using a large set of cat photos. Deep learning is a technique that provides a way to structure the neural network in an optimal way. Once the neural network has learned to recognize a cat, it is supposed to recognize any cat photo that it hasn't seen before. But deep neural networks are not perfect: there is always at least one case (a "blind spot") in which the neural network fails and mistakes the cat for something else. That "blind spot" says a lot about the importance of common sense. In 2013 a joint study by Google, New York University and the University of Montreal showed that tiny perturbations (invisible to humans) can completely alter the way a neural network classifies an image. The paper, written by Christian Szegedy and others, was ironically titled "Intriguing Properties of Neural Networks". Intriguing indeed, because no human would make those mistakes; in fact, no human would notice anything wrong with the "perturbed" images. This is not just a theoretical discussion: if a self-driving car that uses a deep neural network mistakes a pedestrian crossing the street for a whirlwind, the consequences could be serious.
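The effect can be reproduced in miniature. The sketch below is a toy built on stated assumptions, not the method of Szegedy et al. (who searched for perturbations with an optimizer): it uses the simpler "fast gradient sign" idea later described by Goodfellow, Shlens and Szegedy, applied to a hypothetical linear "cat detector" on 1,000 random pixels. The functions `score` and `fast_gradient_sign`, the random weights and the bias are all illustrative inventions. Each pixel changes by at most 0.01, yet because every tiny change pushes the score in the same direction, the changes add up across the pixels and flip the verdict.

```python
import random


def score(w, b, x):
    """Linear classifier score: positive means "cat", negative means "not cat"."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def fast_gradient_sign(w, x, eps):
    """Move every pixel by at most `eps` in the direction that lowers the score.

    For a linear score the gradient with respect to x is simply w, so each pixel
    moves by -eps * sign(w_i); values are clipped to the valid range [0, 1].
    """
    return [min(1.0, max(0.0, xi - eps * sign(wi))) for wi, xi in zip(w, x)]


rng = random.Random(0)
n_pixels = 1000
w = [rng.uniform(-1, 1) for _ in range(n_pixels)]  # hypothetical learned weights
x = [rng.uniform(0, 1) for _ in range(n_pixels)]   # hypothetical "cat" image
b = 1.0 - score(w, 0.0, x)  # hypothetical bias chosen so the clean image scores +1

x_adv = fast_gradient_sign(w, x, eps=0.01)
print("clean score:         ", round(score(w, b, x), 3))
print("perturbed score:     ", round(score(w, b, x_adv), 3))
print("largest pixel change:", max(abs(a - c) for a, c in zip(x, x_adv)))
```

The score drops by roughly 0.01 times the sum of the absolute weights, which grows with the number of pixels: the more inputs a model has, the easier it is to fool with changes too small for a human to see.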
Deep learning depends in an essential way on human expertise. It needs a huge dataset of human-prepared cases in order to "beat" humans at their own game (chess, go/weichi, etc.). A world in which humans don't exist (or don't collaborate) would be a difficult place for deep learning. A world in which the expertise is generated by other deep-learning machines would be even tougher. For example, Google's translation software simply learns from all the translations that it can find. If many English-to-Italian human translators over the centuries have translated "table" as "tavolo", it learns to translate "table" into "tavolo". But what if someone injected into the Web thousands of erroneous translations of "table"? Scientists at Google are beginning to grapple with the fact that the dataset of correct translations, which is relentlessly updated from what Google's "crawlers" find on the Web, may degrade rapidly as humans start posting approximate translations made with Google's own translation software. If you publish a mistake made by the robot as if it were human knowledge, you fool all the other robots that are trying to learn from human expertise. Today's robots, equipped with deep learning, learn from our experts, not from each other. We learn from experts and by ourselves, i.e. by "trial and error" or through lengthy, excruciating research. Robots learn from experts: human experts, the best human experts. Google's translation software is not the best expert in translation. If it starts learning from itself (from its own mediocre translations), it will never improve.
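A minimal sketch of the vulnerability, assuming a deliberately naive learner that translates a word by majority vote over the word pairs it has found on the Web. Real translation systems are far more sophisticated, but they too learn from the statistics of their data. The function `learn_translation` and the corpus counts are invented for illustration, and "sedia" (Italian for "chair") plays the role of the injected wrong translation.

```python
from collections import Counter


def learn_translation(corpus, word):
    """Hypothetical learner: return the most frequent translation of `word` in the corpus."""
    counts = Counter(target for source, target in corpus if source == word)
    return counts.most_common(1)[0][0] if counts else None


# Human translations "found on the Web" (illustrative counts, not real data).
corpus = [("table", "tavolo")] * 900 + [("table", "tabella")] * 100
print(learn_translation(corpus, "table"))    # tavolo

# Someone injects thousands of erroneous pairs into the Web...
poisoned = corpus + [("table", "sedia")] * 2000
print(learn_translation(poisoned, "table"))  # sedia
```

The learner has no way to tell which pairs came from competent human translators and which were injected or machine-generated; it only sees counts.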
Supervised learning is "learning by imitation", which is only as good as the person you are imitating. That's why systems of the AlphaGo generation introduce additional tricks. Reinforcement learning, which was the topic of Minsky's PhD thesis in 1954, is a way for the machine to learn more than any of the human experts have learned, because it can play thousands of games against itself while human experts can only play a few each week. Another useful addition to deep learning (also used by AlphaGo) is tree search, which Minsky's mentor Claude Shannon proposed for chess-playing programs in 1950.
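Tree search is easiest to see on a game small enough to search exhaustively. The sketch below assumes a simple take-away game chosen for illustration (players alternately remove 1, 2 or 3 sticks, and whoever takes the last stick wins) and applies plain minimax, in the spirit of Shannon's 1950 chess paper. AlphaGo uses a Monte Carlo variant that samples the tree and relies on neural networks to evaluate positions, so this shows only the basic idea; the functions `value` and `best_move` are hypothetical names.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # a player may remove 1, 2 or 3 sticks


@lru_cache(maxsize=None)
def value(sticks):
    """Minimax value for the player about to move: +1 = forced win, -1 = forced loss.

    Whoever takes the last stick wins, so facing an empty pile means you have lost.
    """
    if sticks == 0:
        return -1
    # Negamax form of minimax: my best outcome is the worst one I can leave my opponent.
    return max(-value(sticks - m) for m in MOVES if m <= sticks)


def best_move(sticks):
    """Search the game tree and return the move with the highest minimax value."""
    return max((m for m in MOVES if m <= sticks), key=lambda m: -value(sticks - m))


for n in range(1, 9):
    print(n, "win" if value(n) == 1 else "loss", "best move:", best_move(n))
```

The search discovers on its own that piles that are a multiple of 4 are lost for the player to move. Exhaustive search works here only because the tree is tiny; in go the number of positions is astronomically larger, which is why AlphaGo combines search with learned evaluations.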
Similar considerations apply to robots. World knowledge is vital to perform ordinary actions. Robot dexterity has greatly improved thanks to a multitude of sensors, motors and processors. But grabbing an object is not only about directing the movement of the hand, but also about controlling its force. Grabbing a paper cup is not the same as grabbing a book: the paper cup might collapse if your hand squeezes it too much. And grabbing a paper cup full of water is different from grabbing an empty paper cup: you don't want to spill the water. Moving about an environment requires knowledge about furniture, doors, windows, elevators, etc. The Stanford robot that in 2013 was trained to buy a cup of coffee at the cafeteria upstairs had to learn that a) you don't break the door when you pull down the handle; b) you don't spill coffee on yourself because it would cause a short circuit; c) you don't break the button that calls the elevator; etc.; and, as mentioned, that the image in the elevator's mirror is you, so you don't need to wait for yourself to come out of the elevator.
We interact with objects all the time, meaning that we know what we can do with any given object.
Your body has a history. The machine needs to know that history in order to navigate the labyrinth of your world and the even more confusing labyrinth of your intentions.
Finally, there are ethical principles. The definition of what constitutes "success" in the real world is not obvious. For example: getting to an appointment on time is "good", but not if this implies running over a few pedestrians; a self-driving car should avoid crashing into walls, unless it is the only way to avoid hitting a child.
Most robots have been designed for and deployed in structured environments, such as factories, in which the goal to be achieved does not interfere with ordinary life. But a city street or a home contains much more than simply the tools to achieve a goal.
"Computers are useless: they can only give you answers" (Pablo Picasso, 1964).