(These are excerpts from my book "Intelligence is not Artificial")
An Easy Science
When a physicist makes a claim, an entire community of physicists is out there to check that claim. The paper gets published only if it survives peer review, and usually many months after it was written. A discovery is usually accepted only if the experiment can be repeated elsewhere. For example, when OPERA announced particles traveling faster than light, the whole world conspired to disprove them, and eventually it succeeded. It took months of results before CERN accepted that probably (not certainly) the Higgs boson exists.
Artificial Intelligence practitioners, instead, have a much easier life. Whenever they announce a new achievement, it is largely taken at face value by the media and by the A.I. community at large.
If a computer scientist announces that her or his program has learned what a cat looks like by watching videos, the whole world posts enthusiastic headlines even if nobody has actually seen this system in action, and nobody has been able to measure and doublecheck its performance: what else did it recognize? did it recognize the human beings in those videos? did it recognize furniture? what else was in those videos? For the record, when that happened in 2012, the media consistently reported "videos" when in fact the neural network had been trained with "images" taken from videos, i.e. still images, and we were not told who picked the images and according to which criteria out of the tens of thousands of frames that constitute the average YouTube video. It does make a difference which images are fed to the neural network out of the billions of images available on YouTube. A one-minute video contains about 2,000 frames. This neural network was fed 10 million images, which is the equivalent of about 80 hours of video, a pittance compared with the millions of hours of videos available on YouTube.
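The arithmetic behind that estimate can be checked in a few lines. This is only a back-of-the-envelope sketch: the two constants come from the figures quoted above, while the function and variable names are illustrative and not taken from any published description of the experiment.

```python
# Back-of-the-envelope check of the figures quoted in the text.
FRAMES_PER_MINUTE = 2_000      # from the text: a one-minute video has about 2,000 frames
TRAINING_IMAGES = 10_000_000   # from the text: still images fed to the neural network


def images_to_video_hours(images: int, frames_per_minute: int = FRAMES_PER_MINUTE) -> float:
    """Convert a number of still frames into the equivalent hours of video."""
    minutes = images / frames_per_minute
    return minutes / 60


hours = images_to_video_hours(TRAINING_IMAGES)
print(f"{TRAINING_IMAGES:,} frames is about {hours:.0f} hours of video")
```

The result is a little over 83 hours, in line with the "about 80 hours" quoted above.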
When in 2012 Google announced that "Our vehicles have now completed more than 300,000 miles of testing" (a mile being 1.6 kilometers for the rest of the world), the media simply propagated the headline without asking simple questions such as "in how many months?" or "under which conditions?" or "on which roads?" or "at what time of the day?" Most people now believe that self-driving cars are feasible even though they have never been in one. Many of the same people probably don't believe all the weird consequences of Relativity and Quantum Mechanics, despite the many experiments that confirmed them.
The 2004 DARPA challenge for driverless cars was staged in the desert between Los Angeles and Las Vegas (i.e. with no traffic). The 2007 DARPA urban challenge took place at the George Air Force Base. Interestingly, a few months later two highly educated friends told me that a DARPA challenge took place in downtown Los Angeles in heavy traffic. That never took place. Too often the belief in the feats of A.I. systems feels like the stories of devout people who saw an apparition of a saint and all the evidence you can get is a blurred photo.
In 2005 the media reported that Hod Lipson at Cornell University had unveiled the first "self-assembling machine" (the same scientist in 2007 also unveiled the first "self-aware" robot), and in 2013 the media reported that the "M-blocks" developed at MIT by Daniela Rus' team were self-constructing machines. Unfortunately, these reports were wild exaggerations.
In May 1997 the IBM supercomputer "Deep Blue", programmed by Feng-hsiung Hsu (who had started building chess-playing programs in 1985 while at Carnegie Mellon University), beat then chess world champion Garry Kasparov in a widely publicized match. What was less publicized is that the match was hardly fair: Deep Blue had been equipped with an enormous amount of information about Kasparov's chess playing, whereas Kasparov knew absolutely nothing of Deep Blue; and during the match IBM engineers kept tweaking Deep Blue with heuristics about Kasparov's moves. Even less publicized were the later man-versus-machine matches, in which the programmers were explicitly forbidden to modify the machine in between games. Newer and more powerful chess programs of the Fritz family could beat neither Vladimir Kramnik, the new world chess champion, in 2002 nor Kasparov himself in 2003. Both matches ended in a draw.
What is incredible to me is that a machine equipped with virtually infinite knowledge of the game and of its opponent, and with lightning-speed circuits that can process a virtually infinite number of moves in a split second, cannot beat a much more rudimentary object such as the human brain, equipped with a very limited and unreliable memory: what does it take for a machine to outperform humans despite all the technological advantages it has? Divine intervention? Nonetheless, virtually nobody in the scientific community (let alone in the mainstream media) questioned the claim that a machine had beaten the greatest chess player in the world.
If IBM is correct and, as it claimed at the time, Deep Blue could calculate 200 million positions per second whereas Kasparov's brain could only calculate three per second, who is smarter, the one who can become the world's champion with just three calculations per second or the one who needs 200 million calculations per second? If Deep Blue were conscious, it would be wondering "Wow, how can this human being be so intelligent?"
What Deep Blue certainly achieved was to get better at chess than its creators. But that is true of the medieval clock too, capable of keeping the time in a way that no human brain could, and of many other tools and machines.
Finding the most promising move in a game of chess is a lot easier than predicting the score of a Real Madrid vs Barcelona game, something that neither machines nor humans are even remotely close to achieving. The brute force of the fastest computers is enough to win a chess game, but the brute force of the fastest computers is not enough to get a better soccer prediction than, say, the prediction made by a drunk soccer fan in a pub. Ultimately what we are contemplating when a computer beats a chess master is still what amazed the public of the 1950s: the computer's ability to run many calculations at lightning speed, something that no human being can do.
IBM's Watson of 2013 consumes 85,000 Watts compared with the human brain's 20 Watts. (Again: let both the human and the machine run on 20 Watts and see who wins). For the televised match of 2011 with the human experts, Watson was equipped with 200 million pages of information including the whole of Wikipedia; and, in order to be fast, all that knowledge had to be stored in RAM, not on disk storage. The human experts who competed against Watson did not have access to all that information. Watson was allowed 15 petabytes of storage, whereas the humans were not allowed to browse the web or keep a database handy. De facto the human experts were not playing against one machine but against a whole army of machines, enough machines working to master and process all those data. A fairer match would be to pit Watson against thousands of human experts, chosen so as to have the same amount of data. And, again, the questions were conveniently provided to the machine as text files instead of spoken language.
If you use the verb "to understand" the way we normally use it, Watson never understood a single question. And those were the easiest possible questions, designed specifically to be brief and unambiguous (unlike the many ambiguities hidden in ordinary human language). Watson didn't even hear the questions (they were written to it), let alone understand what the questioner was asking. Watson was allowed to ring the bell using a lightning-speed electrical signal, whereas the humans had to lift the finger and press the button, an action that is orders of magnitude slower.
Over the decades I have personally witnessed several demos of A.I. systems that required the audience to simply watch and listen: only the creator was allowed to operate the system.
Furthermore, some of the most headline-capturing Artificial Intelligence research is supported by philanthropists at private institutions with little or no oversight by academia.
Many of the A.I. systems of the past have never been used outside the lab that created them. Their use by the industry, in particular, has been virtually nil.
For example, on October 1, 1999 Science Daily announced: "Machine demonstrates superhuman speech recognition abilities. University of Southern California biomedical engineers have created the world's first machine system that can recognize spoken words better than humans can." It was referring to a neural network trained by Theodore Berger's team. As far as I can tell, that project has been abandoned and it was never used in any practical application.
In October 2011 a Washington Post headline asked "Apple Siri: the next big revolution in how we interact with gadgets?" Meanwhile, this exchange was going viral on social media.
User: Siri, call me an ambulance
Siri: Okay, from now on I'll call you "an ambulance"
(Note: in 2017 the app-measurement firm Verto Analytics estimated that, between May 2016 and May 2017, Siri lost 7.3 million monthly users, or about 15% of its total user base in the USA).
In 2014 the media announced that Vladimir Veselov's and Eugene Demchenko's program Eugene Goostman, which simulated a 13-year-old Ukrainian boy, passed the Turing Test at the Royal Society in London (Washington Post: "A computer just passed the Turing Test in landmark trial"). It makes you wonder what was the I.Q. of the members of the Royal Society, or, at least, of the event organizer, the self-appointed "world's first cyborg" Kevin Warwick, and what was the I.Q. of the journalists who reported his claims. It takes very little ingenuity to fool a "chatbot" impersonating a human being: "How many letters are in the word of the number that follows 4?" Any human being can calculate that 5 follows 4 and contains four letters, but a bot won't know what you are talking about. I can see the bot programmer, who has just read this sentence, frantically coding this question and its answer into the bot, but there are thousands, if not millions, of questions like this one that bots will fail for as long as they don't understand the context. How many words are in this sentence? You just counted them, right? But a bot won't understand the question. Of course, if your Turing Test consists in asking the machine questions whose answers can easily be found on Wikipedia by any idiot, then the machine will easily pass the test.
One video that went viral on social media showed a robot folding towels. The video was played at 24 times real time, and nobody seemed to pay tribute to dry cleaners: it is certainly impressive that a robot can fold towels, but dry cleaners already use machines that fold shirts, a more complicated task.
In 2015 both Microsoft and Baidu announced that their image-recognition software was outperforming humans, i.e. that the error rate of the machine was lower than the error rate of the average human being in recognizing objects. The average human error rate is considered to be 5.1%. However, Microsoft's technology that has surfaced (late that year) is CaptionBot, which has become famous not for its usefulness in recognizing scenes but for its silly mistakes that no human being would make. As for Baidu, its Deep Image system, which ran on the custom-built supercomputer Minwa (432 core processors and 144 GPUs), has not been made available to the public as an app. Moreover, Baidu was disqualified from the most prestigious image-recognition competition in the world (the ImageNet Competition) for cheating. Recognizing images was supposed to be Google's specialty but Google Goggles, introduced in 2010, has flopped. I just tried Goggles again (May 2016). It didn't recognize: towel, toilet paper, faucet, blue jeans... It recognized only one object: the clock. Officially, Google's image recognition software has an error rate of 5%. My test shows more like a 90% error rate. In 2015 the Google Photos app tagged two African-Americans as gorillas, causing accusations of racism when in fact it was just poor technology.
In 2014 the media widely reported that Facebook's DeepFace correctly identified photos in 97.25% of cases, a fact that induced the European Union to warn Facebook that people's privacy must be protected; but, as of 2017, it is still not available for us to try it out. In fact, Facebook stopped publicizing it in 2015, and Facebook still identifies few of my 5,000 friends: face recognition works well only if you have a small number of friends.
I am also confused by all these announcements that seem to contradict each other. In March 2015 Google announced that FaceNet, a 22-layer deep convolutional network, recognized the faces of celebrities with a negligible error rate. Google only released an open-source version of FaceNet that doesn't even come close to what they claimed. In June 2015 Facebook announced that a new algorithm was capable of recognizing partially covered faces with 83% accuracy, but it never clarified what "partially covered" means, and I have seen no improvements to the error rate in recognizing my friends even when the face is clearly visible.
In September 2017 Apple announced its Face ID technology for facial recognition in the iPhone X smartphone. It took exactly two months for someone (Vietnamese firm Bkav) to find out how to fool the system, and it was simply a matter of creating a mask with some cheap 3D-printed material and a little paint.
Sometimes the claims border on the ridiculous. Let's say that I build an app that asks you to submit the photo of an object, then the picture gets emailed to me, and I email back to you the name of the object: are you impressed by such an app? And, still, countless reviewers marveled at CamFind, the app introduced in 2013 by Los Angeles-based Image Searcher, an app that "recognizes" objects. In most cases it is actually not the app that recognizes objects, but the company's huge team in the Philippines that is frantically busy tagging the images submitted by users.
You've probably seen many headlines like these: "Google's AI translation system is approaching human-level accuracy" (The Verge, 2016); "AI-based translation to soon reach human levels" (Quartz, 2017); "Microsoft researchers match human levels in translating news from Chinese to English" (ZDNet, 2018). In 2018 I was at a conference where the screens on the side of the room were displaying automatic translation (from Chinese to English). The automatic translation was provided by the A.I. system that had just achieved the highest BLEU (Bilingual Evaluation Understudy) score ever. Well, the automatic translation was often hilarious. At one point a "screening for breast cancer" became "screening for breakfast", and "AI is difficult to program" became "I was difficult to program". Luckily there was also human simultaneous translation: the human translator got all of these right, and that's why I know what they were supposed to be.
In September 2018 China's main firm for speech recognition and machine translation, Xunfei/iFlytek, pretended that its live translation was made by a machine but later its employees spoke out to denounce that it was done by professional interpreters.
(Sure enough, some media reported that the iFlytek system was achieving "super-human" performance).
Remember the automata of centuries ago, that in reality were people camouflaged like machines?
In 1769, a chess-playing machine called the Turk, created by Wolfgang von Kempelen, toured the world, winning games wherever it went: it concealed a man inside so well that it wasn't exposed as a hoax for many years.
(To be fair, Microsoft's CaptionBot is not bad at all: it was criticized by people who expected human-level abilities in the machine, but, realistically, it exceeds my expectations).
Very few people bother to double-check the claims of the A.I. community. The media have a vested interest that the story be told (it sells) and the community as a whole has a vested interest that government and donors believe in the discipline's progress so that more funds will be poured into it.
The media widely reported that neural networks reached parity with humans, and even surpassed humans, in image and speech recognition. The truth is that these facts are measured not in the real world but in the limited world of the images contained in datasets like ImageNet. These neural nets got really good at recognizing the images for which they were trained. They were not trained to recognize objects or faces in the real world. China is using face recognition in multiple locations but you have to stand right in front of the camera, facing the camera (not at an angle), remove your eyeglasses, move your hair out of your forehead, and good luck if you have a bandaid or scar. Neural networks recognize only that for which they have been trained, that is a very tiny fraction of the objects of the world, and only in ideal, perfect conditions. Software has helped police identify criminals, but "helped" does not mean "did". Neural networks recognize the objects that they have been trained to recognize, the objects that were in the training dataset, not "all" the objects in the world, and certainly not all the objects within images and all the situations that include those images.
It is obviously not true that a deep convolutional net is more accurate than humans at recognizing images: it is more accurate only relative to the ImageNet dataset (or some other dataset) that was used to train it. In real life the same net is as good as a drunk in a crowded bar.
After 2014 the much vaunted (by media that are easily manipulated by press releases) Microsoft project Adam, developed by software engineers with little or no background in deep learning, disappeared as quickly as it had appeared despite statements that it was outperforming Google's system (it was actually outperforming an old Google network that Google had already abandoned and it did so at the 22K-category ImageNet contest that virtually nobody was contesting anymore because everybody was focused on the 1K-ImageNet challenge instead).
In 2017 the robot Xiaoyi (developed by Tsinghua University and Chinese startup iFlyTek) passed the medical examination test... or didn't it? The media glossed over the fact that it memorized two million medical records and 400,000 academic papers. This medical robot was incapable of visiting patients: it was just a toy providing a smiling user interface to a database of medical information (a database like many others available online). Xiaoyi was no more than a glorified database to serve China’s rural areas where general practitioners are in severe shortage.
In 2016 the media announced that the Washington Post used an "intelligent" software, "Heliograf", to write articles about the Olympic Games, and that Toutiao in China employed Xiaomingbot, a similar writing robot, to write 450 articles during the 15 days of the games. Both pieces of news were false. Neither "robot" wrote anything meaningful. The "articles" were just a few words long (the longest was 821 words long and it was simply a list of medals) and they were proof-edited by human editors: the Washington Post articles clearly stated "It is powered by Heliograf, the Post's artificial intelligence system" ("powered", not "written"). A human being could have easily written (not just powered) hundreds of such "articles" in one day. But nobody wants to read hundreds of such articles. We'd rather read just a list of medals or one (just one) real article.
In 2017 Cadillac was billing its "Super Cruise" system as the "world's first true hands-free driving system", except that it worked only on Cadillac's mapped routes; which is as "self-driving" as a train riding on rails.
When faced with the question, "What would a self-driving car do if a child suddenly ran in front of the car", a famous A.I. scientist replied "Children rarely appear out of nowhere", which tells you how much we can trust A.I. scientists in the real world. (It also tells you what their solution to the "random children" problem will be: ban children from any road where self-driving cars are allowed, and hold parents liable if their children break this rule).
Before the 2018 World Cup started, many media publicized the prediction made by an Artificial Intelligence program that Spain was going to win, followed by Germany, Brazil, France, Belgium, Argentina, England, Portugal... https://arxiv.org/abs/1806.03208 Well, Germany didn't even make it beyond the first stage; Argentina, Spain, Portugal and Brazil were promptly kicked out too. France did win but the A.I. program had ranked it 4th. That is the same technology that will be used to rank the odds that the object in front of the car is a pedestrian. Of course, the same media that publicized the predictions by the A.I. system did not comment on how grotesquely wrong its predictions turned out to be.
In August 2018 the media widely advertised the fact that OpenAI's bot won against masters of the videogame Dota 2. A lot less advertised was the fact that the bot, OpenAI Five, lost to humans a few days later at The International in Vancouver, an annual tournament for Dota 2. The difference was that this time the rules were fairer instead of handing the machine an unfair advantage (as has been the norm in this kind of event since Deep Blue's famous chess match). For example, the "hero" lineup of the game was chosen by a third party instead of being chosen by OpenAI.
In 2018 a face-recognition system deployed in the city of Ningbo to spot jaywalkers identified a photo of Chinese billionaire Mingzhu Dong, displayed in an ad on a passing bus, as a jaywalker. In 2018 Reddit user “MalletsDarker” used Google Photos' A.I. to merge three photos taken at a ski resort: two photos of the snowy landscape and one photo of his friend wearing a helmet and goggles. The A.I. system merged the three into one panorama, interpreting his friend’s head as a giant mountain peak. In 2018 Boston Dynamics' humanoid robot Atlas carried out an impeccable show at the Congress of Future Scientists and Technologists but then awkwardly tripped over a curtain and tumbled off the stage. Hopefully unaware that Nazis held similar beliefs about Jews, in 2018 the Israeli startup Faception claimed that its face-analysis system can deduce a person's I.Q. from the person's facial structure.
The media are quick to publicize any marginal success of A.I., but routinely ignore its vastly more numerous failures.
The media love anecdotes documenting this simple pattern: every time someone said that A.I. could never do something, A.I. did it. A.I. keeps proving the skeptics wrong. This is mostly true. What has not been reported equally well is the converse: whenever someone said that an A.I. system would never make such and such a mistake, the A.I. system made that mistake. In at least one case it also killed the driver who was using the self-driving feature (Joshua Brown, the first person to die in a self-driving car accident, June 2016, while napping in his Tesla) and in at least one case it also killed a pedestrian (an Uber car in March 2018 ran over a woman who was crossing the street in a place where there was no crosswalk).
A.I. is de facto guilty of "fake news": too many announcements about this or that achievement are "fake news" because these machines exist only in the imagination of the press and in the press releases of scientists who are looking for funding and investors. The real achievements are much more limited and humbler. You see a video of a robot grasping an object like a skilled worker but that video has been accelerated 20 times. You see videos of self-driving cars but the video has been made from an angle that hides the human copilot in the car. We have created an entire economy of highly-paid speakers and writers around a technology that largely doesn't exist yet.
"We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten" (Bill Gates)