(These are excerpts from my book "Intelligence is not Artificial")
Why we need A.I. or The Robots are Coming- Part 2: The Near Future of A.I. or Don't be Afraid of the Machine
The media are promising myriad applications of A.I. in all sectors of the economy. So far we have seen very little compared with what has been promised. In 2016 Bloomberg estimated that 2,600 startups were working on A.I. technology, but IDC calculated that sales for all companies selling A.I. software barely totaled $1 billion in 2015. There is a lot of talk, but, so far, very few actual products that people are willing to pay for.
The number-one application of A.I. is, and will remain (drum roll), making you buy things that you don't need. All major websites employ some simple form of A.I. to follow you, study you, understand you and then sell you something. Your private life is a business opportunity for them and A.I. helps them figure out how to monetize it. The founders of A.I. are probably turning in their graves.
And sometimes these "things" can even kill you (the case of Wei Zexi in 2016, who was induced by an advert posted on Baidu to buy the cancer treatment that killed him).
Mark Weiser famously wrote: "The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it" ("The Computer for the 21st Century", 1991). Unfortunately, it turned out to be a prophecy about the ubiquitous "intelligent" agents that make us buy things.
Perhaps the most sophisticated (or, at least, most widely used) A.I. system since 2014 is Facebook's machine-learning system FBLearner Flow, designed by Hussein Mehanna's team, which runs on a cluster of thousands of machines. It is used in every part of Facebook to quickly train and deploy neural networks. Neural networks can be fine-tuned by adjusting several parameters. Optimizing these parameters is not trivial: it requires a lot of "trial and error". But even a 1% improvement in machine-learning accuracy can mean billions of dollars of additional revenue for Facebook. So Facebook is now developing Asimo, which performs thousands of tests to find the best parameters for each neural network. In other words, Asimo does the job that is normally done by the engineers who build the deep-learning system.
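The "trial and error" that Asimo automates is, in its simplest form, a hyperparameter search. The sketch below is a minimal random search under stated assumptions. It is not Asimo, whose internals are not described here. `validation_accuracy` is a hypothetical toy function standing in for "train the network with these settings and measure its accuracy", and the two parameters and their ranges are likewise illustrative.

```python
import random


def validation_accuracy(learning_rate: float, num_layers: int) -> float:
    """Hypothetical stand-in for 'train a network, then measure its accuracy'.

    A smooth toy function that peaks (at 0.9) for learning_rate=0.01 and
    num_layers=4; a real system would run a full training job here.
    """
    lr_penalty = (learning_rate - 0.01) ** 2 * 1000
    depth_penalty = (num_layers - 4) ** 2 * 0.01
    return 0.9 - lr_penalty - depth_penalty


def random_search(trials: int, seed: int = 0):
    """Try `trials` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = {
            # sample the learning rate on a log scale between 1e-4 and 1e-1
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "num_layers": rng.randint(1, 8),
        }
        score = validation_accuracy(**config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best, score = random_search(trials=1000)
print(best, round(score, 4))
```

Random search is about the simplest possible strategy. In a real system each trial costs a full training run, which is why automating thousands of them on a large cluster is worth the effort.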
While Jeff Hammerbacher's lament remains true, we must recognize that progress in deep learning has been driven by funding from companies like Google and Facebook whose main business interest is to convince people to buy things. If the world banned advertising from the Web, the discipline of deep learning would probably return to the obscure laboratories of the universities where it came from.
Remember Marshall McLuhan's comment in "Understanding Media" (1964) that "Far more thought and care go into the composition of any prominent ad in a newspaper or magazine than go into the writing of their features and editorials"? The same can be said today: far more thought and care has been invested in designing algorithms that make you buy things when you are reading something on the Web than in the writing that you are reading.
The next generation of "conversational" agents will be able to access a broader range of information and of apps, and therefore answer more complicated questions; but they are not conversational at all: they simply query databases and return the result in your language. They add a speech-recognition system and a speech-generation system to the traditional database management system.
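The claim above can be made concrete. In the minimal sketch below, speech recognition and speech generation are stubbed out as plain text, an assumption made for brevity. The table, the sample rows, the regular-expression pattern and the `answer` function are all hypothetical. Commercial assistants are far more elaborate, and the sketch only illustrates the pipeline the paragraph describes: question in, database query out, row turned back into a sentence.

```python
import re
import sqlite3

# A tiny in-memory "traditional database" with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE capitals (country TEXT PRIMARY KEY, capital TEXT)")
conn.executemany(
    "INSERT INTO capitals VALUES (?, ?)",
    [("France", "Paris"), ("Japan", "Tokyo"), ("Ethiopia", "Addis Ababa")],
)

# Each pattern maps one phrasing of a question onto a parameterized SQL query.
PATTERNS = [
    (re.compile(r"what is the capital of (\w+)\??", re.IGNORECASE),
     "SELECT capital FROM capitals WHERE country = ?"),
]


def answer(utterance: str) -> str:
    """Turn a (transcribed) question into a query, and the result into a sentence."""
    for pattern, sql in PATTERNS:
        match = pattern.fullmatch(utterance.strip())
        if match:
            country = match.group(1).capitalize()
            row = conn.execute(sql, (country,)).fetchone()
            if row:
                return f"The capital of {country} is {row[0]}."
            return f"I don't know the capital of {country}."
    return "Sorry, I did not understand the question."


print(answer("What is the capital of Japan?"))  # The capital of Japan is Tokyo.
```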
There are actually "dream" applications for deep learning. Health care is always at the top of the list because its impact on ordinary people can be significant. The medical world produces millions of images every year: X-Rays, MRIs, Computed Tomography (CT) scans, etc. In 2016 Philips Health Care estimated that it manages 135 billion medical images, and it adds 2 million new images every week. These images are typically viewed by only one physician, the physician who ordered them; and only once. This physician may not realize that the image contains valuable information about something outside the specific disease for which it was ordered. There might be scientific discoveries that affect millions of those images, but there is nobody checking them against the latest scientific announcements. First of all, we would like deep learning to help radiology, cardiology and oncology departments to understand all their images in real time. And then we would like to see the equivalent of a Googlebot (the "crawler" that Google uses to scan all the webpages of the world) for medical images. Imagine a Googlebot for medical images that continuously scans Philips' database and carries out a thorough analysis of each medical image utilizing the latest updates on medical science. Enlitic in San Francisco, Stanford's spinoff Arterys, and Israel's Zebra Medical Vision are the pioneers, but their solutions are very ad-hoc. A medical artificial intelligence would know your laboratory tests of 20 years ago and would know the lab tests of millions of other people, and would be able to draw inferences that no doctor can draw.
In 2016 Sebastian Thrun's team at Stanford built a neural network capable of recognizing skin cancer with the accuracy of a dermatologist. In 2016 radiologist Luke Oakden-Rayner of the University of Adelaide in Australia demonstrated a deep-learning system that estimated a person's longevity based on radiological chest images of people aged 60 and over. Most early symptoms of heart attacks, cancer and diabetes (diseases that kill millions of people every year) are visible in these images, but it takes a trained specialist to spot them. At the end of 2016 Varun Gulshan and physician Lily Peng of Google trained a (deep convolutional) neural network (using a dataset of 128,175 retinal images) to identify retinas at risk of a diabetic disease that causes 5% of blindness worldwide, and tested this neural network against a group of expert ophthalmologists, showing that the accuracy was virtually the same. In 2017 the South Korean scientists Hongyoon Choi and Kyong Hwan Jin used a neural network to scan brain images and identify people likely to get Alzheimer's disease within the next three years. Alzheimer's disease affects 30 million people. In 2017 Stephen Weng at the University of Nottingham unveiled a neural network, trained on hundreds of thousands of medical records, that proved to be better than human experts at predicting heart attacks. Every year about 20 million people die of a cardiovascular disease: this neural network could save the lives of millions of people. The success stories of medical-image analysis keep coming in.
In 2015 Joel Dudley's team at Mount Sinai Hospital in New York trained a deep-learning system called Deep Patient on the hospital's large dataset of health records of more than 700,000 patients. Deep Patient can discover patterns in patient data that are not easily spotted by human experts, and it has proven capable of predicting diseases, especially psychiatric disorders. In April 2016 a neural network trained by Harvard pathologist Andy Beck and MIT computer scientist Aditya Khosla competed with an expert pathologist at identifying cancer and narrowly lost (it soon became the flagship product of the authors' new startup PathAI).
There are at least three problems, though. The first is that, as usual, it is difficult if not impossible to replicate the results: as of 2017, Google's dataset has not been released to the public, so no one else can replicate the experiment. The second is that datasets of medical images are almost always imbalanced: they mostly contain data about sick people. It is not difficult to build a neural network that will recognize the "positives" (the medical images that signal a disease), but it is difficult to make sure that the neural network will NOT flag a healthy person as positive. Finally, deep learning has proven to work well only with small images, whereas medical images are giant images. In these experiments only a tiny fraction of the pixels was used. Quoting from Luke Oakden-Rayner's blog: "... retinal photographs are typically between 1.3 and 3.5 megapixels in resolution... these images were shrunk to 299 pixels square, which is 0.08 megapixels..." Each additional pixel increases the dimensionality of the input to a degree that defies existing computational theories. In other words, we don't know if these methods work with the real images or just with toy miniatures of the real images.
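The imbalance problem shows up clearly with hypothetical numbers. On a test set that is 95% sick patients, a "model" that flags everyone as sick scores 95% accuracy while never clearing a single healthy person. Specificity, the fraction of healthy people correctly recognized as healthy, exposes the problem. The last lines of the sketch also work out the resolution figures quoted above.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = sick, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sick, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sick, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical imbalanced test set: 95 sick patients, 5 healthy ones.
y_true = [1] * 95 + [0] * 5
# A useless "model" that flags every image as diseased.
y_pred = [1] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
# accuracy=0.95 sensitivity=1.00 specificity=0.00

# Resolution: a 299 x 299 input versus the 1.3-3.5 megapixel originals.
input_pixels = 299 * 299  # 89,401 pixels
kept = {mp: input_pixels / (mp * 1_000_000) for mp in (1.3, 3.5)}
for mp, fraction in kept.items():
    print(f"{mp} MP original -> {fraction:.1%} of its pixels kept")
```

The shrunken input keeps roughly 7% of the pixels of a 1.3-megapixel photograph and under 3% of a 3.5-megapixel one.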
The results of your neural network are only as good as your training data.
In 2015 the USA launched the Precision Medicine Initiative, which consists of collecting and studying the genomes of one million people and then matching those genetic data with their health, so that physicians can deliver the right medicines in the right dose to each individual. This project would be virtually impossible without machines that can identify patterns in such a vast database.
There are also disturbing applications of the same technology that are likely to spread. The smartphone app FindFace, developed by two Russian kids in their 20s, Artem Kukharenko and Alexander Kabakov, identifies strangers in pictures by searching pictures posted on social media. If you have a presence on social media, the user of something like FindFace can find out who you are simply by taking a picture of you. In 2016 Apple acquired Emotient, a UC San Diego spinoff that was working on software to detect your mood from your facial expression.
An example of unreasonable expectations is Google's self-driving car. The project was launched in 2009 by Sebastian Thrun, the Stanford scientist who had won the DARPA "Grand Challenge" of 2005, a 212-km race across the Nevada desert. Thrun quit in 2013 and was replaced by Chris Urmson, formerly a Carnegie Mellon University student who in 2007 had worked on William Whittaker's victorious team for the DARPA "Urban Challenge" held at George Air Force Base near Los Angeles. (For the record, Chris Urmson left Google in 2016, as had most of the original team.)
The self-driving car may never fully materialize, but the "driver assistant" is coming soon. Mobileye, the Israeli company founded in 1999 that is widely considered the leader in machine-vision technology (and that does not use deep learning), has a much more realistic strategy based on incremental steps to introduce Advanced Driver Assistance Systems (ADAS) that can assist (not replace) drivers. Otto, founded by one of the engineers who worked on Google's self-driving car, Anthony Levandowski, does not plan to replace the truck driver but to assist the truck driver, especially on long highway drives. Otto, which in 2016 was acquired by Uber, does not plan to build a brand new kind of truck, but to provide a piece of equipment that can be installed on every truck. In 2014 a total of 3,660 people died in the USA in accidents that involved large trucks.
The need for robots is even greater. There are dangerous jobs in construction and steel work that kill thousands of workers every year. According to the International Labor Organization, mining accidents kill more than 10,000 miners every year; and that number does not include all the miners whose life expectancy is greatly reduced by their job conditions.
Robots and drones need eyes to see and avoid obstacles. There will be a market for computer-vision chips that you can install in your home-made drone, and there will be a market for collision-avoidance technology to install in existing cars. Israel's Mobileye and Ireland's Movidius have been selling computer-vision add-ons for machines for more than a decade.
We also need machines to take care of an increasingly elderly population. The combination of rising life expectancy and declining fertility rates is completely reshaping society. The most pressing problem of every country used to be the well-being and the education of children. That was when the median age was 25 or even lower. Ethiopia, like most of tropical Africa, has a median age of about 19. Pakistan has a median age of 21. But the median age in Japan and Germany is 46. This means that there are as many people over 46 as there are under 46. Remove teenagers and children, and Japan and Germany don't have enough people to take care of the people over 46. That number goes up every year. There are more than one million people in Japan who are 90 years old or older, of whom 60,000 are centenarians. In 2014, already 18% of the population of the European Union was over 65 years old, more than 90 million people. We don't have enough young people to take care of so many elderly people, and it would be economically senseless to use too many young people on such an unproductive task. We need robots to help elderly people do their exercise, to remind them to take their medicines, to pick up packages at the front door for them, etc.
I am not afraid of robots. I am afraid that robots will not come soon enough.
"Instead of worrying about what machines can do, we should worry more about what they still cannot do." (World chess champion Garry Kasparov)
The robots that we have today can hardly help. Using an IDC report of 2015, we estimated that about 63% of all robots are industrial robots, with robotic assistants (mostly for surgery), military robots and home appliances (like Roomba) sharing the rest in roughly equal slices. The main robot manufacturers, like ABB (Switzerland), Kuka (Germany, being acquired by China's Midea in 2016) and the four big Japanese companies (Fanuc, Yaskawa, Epson and Kawasaki), sell mostly or only industrial robots, and not very intelligent ones. Robots that don't work on the assembly line are a rarity. Mobile robots are a rarity. Robots with computer vision are a rarity. Robots with speech recognition are a rarity. In other words, it is virtually impossible today to buy an autonomous robot that can help humans in any significant way outside the very controlled environment of the factory or the warehouse. Nao (developed by Bruno Maisonnier's Aldebaran in France and first released in 2008), RoboThespian (developed by Will Jackson's Engineered Arts in Britain since 2005, and originally designed to be an actor), the open-source iCub (developed by the Italian Institute of Technology and first released in 2008), Pepper (developed by Aldebaran for Japan's SoftBank and first demonstrated in 2014) and the autonomous robots of the Willow Garage "diaspora" (Savioke, Suitable, Simbe, etc.) are the vanguard of the "service robot" that can welcome you in a hotel or serve you a meal at the restaurant: "user-friendly" humanoid robots for social interaction, communication and entertainment at public events.
In 2016 Knightscope's K5 robot security guard worked in the garage of the Stanford Shopping Center; Savioke's Botlr delivered items to guests at the Aloft hotel in Cupertino; Lowe's superstore in Sunnyvale employed an inventory checker robot built by Bossa Nova Robotics; and Simbe's Tally checked shelves of a Target store in San Francisco. But these are closer to novelty toys than to artificial intelligence. A dog is still a much more useful companion for an elderly person than the most sophisticated robot ever built.
The most used robot in the home is the Roomba, a small cylindrical box that vacuums floors. Not exactly the tentacular monster depicted in Hollywood movies. Unfortunately, it will also vacuum money if you drop it on the floor: we cannot trust machines with no common sense, even for the most trivial of tasks.
An industry that stands to benefit greatly from the "rise of the robots" is the toy industry. In 2016 San Francisco-based startup Anki introduced Cozmo, a robot with "character and personality". That's the future of toys, especially in countries like China where the one-child policy has created a generation of lonely children. In fact, we have already been invaded by robots: there are millions of Robosapien robots. The humanoid Robosapien robot was designed by Mark Tilden, a highly respected inventor who used to work at the Los Alamos National Laboratory, and introduced in 2004 by Hong Kong-based WowWee (a company founded in the 1980s by two Canadian immigrants). Most robots will be an evolution of Pinocchio, not of Shakey.
If you consider them robots, the exoskeletons are a success story. These are basically robots that you can wear. The technology was originally developed by DARPA to help soldiers carry heavy loads, but it is now used in several rehabilitation clinics to help victims of brain injuries and spinal-cord injuries.
ReWalk, founded by an Israeli quadriplegic (Amit Goffer), Ekso Bionics and Suitx (two UC Berkeley spinoffs) and SuperFlex (an SRI spinoff) have already helped paraplegics and seniors walk. Panasonic's ActiveLink has announced an exoskeleton that will help weak nerdy people like me with manual labor that requires physical strength. The cost is still prohibitively high, but one can envision a not-too-distant future in which we will be able to rent an exoskeleton at the hardware store for gardening and home-improvement projects. Once you wear it, you can lift weights and hammer with full strength.
Robots can pick up only a very limited set of objects, and sometimes only one specific kind of object. In 2015 a member of RoboBrain, Stefanie Tellex of Brown University, demonstrated how her robot was trained by another robot to manipulate an object. The knowledge required was passed over the cloud from one robot to the other. She then launched the "Million Object Challenge" to build a knowledge-base of manipulation experiences that can be reused by any robot.
"Cloud Robotics" is a term coined in 2010 by James Kuffner at Carnegie Mellon University. The idea is to create a library of programs that can be executed remotely by any robot, a "skills library": basically, removing most of the brain of the robot and enabling the robot to use a common brain. In 1993 Masayuki Inaba at the University of Tokyo explored the concept of such "remote-brained robots", but that was before cloud computing became affordable.
Robots can take advantage of projects such as OpenEase, a platform for machines to share knowledge, or RoboEarth (2010) and Rapyuta (2013), funded by the European Union. RoboHow (2012) wants robots to learn new tasks from high-level descriptions; and RoboBrain (2014) wants robots to learn new tasks from human demonstrations and advice.
The immediate effect is to turn the robot into the equivalent of a "thin" client. Benefits include a longer battery life and no need to download software updates. But the bigger benefit is that cloud-enabled robots can engage in collective progress, learning rapidly from each other's experiences.
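A minimal sketch of the shared "skills library" idea follows, assuming a single shared store where robots report the outcome of each grasp attempt and ask for the best-performing grasp for an object type. Everything here is hypothetical: `SharedSkillLibrary`, the grasp names and the success-rate criterion are not the API of RoboEarth, RoboBrain or any other project mentioned above. An in-process object stands in for the cloud service.

```python
from dataclasses import dataclass, field


@dataclass
class SharedSkillLibrary:
    """Hypothetical cloud-side store of grasp experiences shared by all robots.

    stats[object_type][grasp_name] = (attempts, successes)
    """
    stats: dict = field(default_factory=dict)

    def report(self, obj: str, grasp: str, success: bool) -> None:
        """Any robot uploads the outcome of one grasp attempt."""
        per_object = self.stats.setdefault(obj, {})
        attempts, successes = per_object.get(grasp, (0, 0))
        per_object[grasp] = (attempts + 1, successes + int(success))

    def best_grasp(self, obj: str):
        """Any robot asks for the grasp with the highest observed success rate."""
        grasps = self.stats.get(obj)
        if not grasps:
            return None  # no robot has tried this object yet
        return max(grasps, key=lambda g: grasps[g][1] / grasps[g][0])


cloud = SharedSkillLibrary()
# Robot A experiments with a mug and uploads what happened...
cloud.report("mug", "top_pinch", success=False)
cloud.report("mug", "handle_hook", success=True)
cloud.report("mug", "handle_hook", success=True)
# ...and robot B, which has never handled a mug, benefits immediately.
print(cloud.best_grasp("mug"))   # handle_hook
print(cloud.best_grasp("shoe"))  # None
```

Robot B never attempted a mug, yet it inherits robot A's experience. This is a miniature version of the collective progress described above.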
For example, in 2010 a group of makers in Silicon Valley led by Ryan Hickman, who later founded the Cloud Robotics team at Google, started the Cellbots project to build robots out of smartphones and spare parts. The clever insight was that a smartphone is almost a robot: it already has touch, hearing, vision, speech and navigation, even capabilities that we don't have, like real-time translation, and it is connected to the cloud all the time. It only lacks mobility, i.e. legs or wheels. Hickman ran cables between an Android phone and an Arduino platform, mounted it on wheels and obtained a "cellbot".
In fact, object recognition and grasping algorithms can all be performed in the cloud. The robot itself can be just a brainless body.
In 2017 Ken Goldberg at UC Berkeley (also a pioneer of telerobotic art) found another way to train robots faster: train them in virtual reality. First his team created Dex-Net 1.0 (2015), a database of thousands of 3D models. Then the robot was trained by practicing to grasp those virtual objects in a simulated world.
The idea of using virtual reality to train robots was also behind Canadian startup Kindred.ai, the brainchild of Suzanne Gildert, a senior researcher at quantum-computer maker D-Wave.
Others are studying how a team of robots can self-organize in order to collaborate and achieve a goal, i.e. collective artificial intelligence. The pioneer is Marco Dorigo in Belgium, who introduced the first ant-colony algorithm (the "Ant System") in his Ph.D. dissertation at Milan's Polytechnic Institute in Italy (1992), generalized it into the "ant colony optimization" method in 1999, and later applied swarm principles to robotic "swarmanoids". Then came the Alices of Simon Garnier at the New Jersey Institute of Technology ("Alice in Pheromone Land", 2007), the "kilobot" swarms of Radhika Nagpal's group at Harvard University ("A Low Cost Scalable Robot System for Collective Behaviors", 2012), the tiny robots of the group of Nikolaus Correll, a former student of Daniela Rus at MIT, at the University of Colorado ("Modeling Multi-Robot Task Allocation with Limited Information as Global Game", 2016), and the "smarticles", or smart active particles, of Dana Randall and Daniel Goldman at Georgia Tech ("A Markov Chain Algorithm for Compression in Self-Organizing Particle Systems", 2016). There are also two international conferences on swarm intelligence, ICSI (first held at Peking University in China in 2010) and ANTS (started in 1998 at IRIDIA in Belgium). Studies of the self-organizing skills of ants by myrmecologists such as Deborah Gordon at Stanford ("Ants at Work", 1999) and Guy Theraulaz in France ("Spatial Patterns in Ant Colonies", 2002) were particularly influential on this school of thought.
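Ant colony optimization fits in a few dozen lines. The version below follows the classic Ant System scheme on a tiny travelling-salesman problem. Virtual ants build tours, preferring short edges and edges with more pheromone. Pheromone then evaporates, and shorter tours deposit more of it. The parameter values and the eight-city example are arbitrary illustrative choices. This is a textbook simplification, not the swarm-robotics software of any group mentioned above.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def ant_colony_tsp(dist, n_ants=20, n_iters=100, alpha=1.0, beta=3.0,
                   rho=0.5, q=1.0, seed=0):
    """Minimal Ant System: alpha weights pheromone, beta weights edge shortness,
    rho is the evaporation rate, q scales each ant's pheromone deposit."""
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            # Each ant starts from a random city and builds a complete tour.
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                here = tour[-1]
                choices = sorted(unvisited)
                # Attractiveness = pheromone^alpha * (1 / distance)^beta
                weights = [pheromone[here][j] ** alpha * (1.0 / dist[here][j]) ** beta
                           for j in choices]
                nxt = rng.choices(choices, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append(tour)
        # Evaporation: old trails fade, so the colony can forget poor choices.
        for row in pheromone:
            for j in range(n):
                row[j] *= 1.0 - rho
        # Deposit: shorter tours leave more pheromone on the edges they used.
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a][b] += q / length
                pheromone[b][a] += q / length
    return best_tour, best_len


# Illustrative example: eight cities evenly spaced on a unit circle, whose
# shortest tour simply walks around the circle.
cities = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
tour, length = ant_colony_tsp(dist)
print(tour, round(length, 3))
```

No ant knows the whole map. The good tour emerges from many cheap local decisions plus a shared, decaying memory, and that is the property swarm-robotics researchers hope to carry over to teams of simple robots.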
But first we will need to build robotic arms whose dexterity matches at least the dexterity of a squirrel.
Our hand has more than twenty degrees of freedom. Let's say that it has ten. I can easily plan the movement of my hand ten steps ahead: if each of those ten joints could take just ten positions, each step would offer 10^10 possible configurations, and a ten-step plan would have to be chosen among (10^10)^10 = 10^100 possible sequences, an enormous number. And I can do it without thinking, in a split second. For a robot this is a colossal computational problem.
Picking up arbitrary objects is not trivial because objects, even of the same category, may come in an infinite number of variations. For example, there are thousands of kinds of mugs: each one requires a slightly different grasping movement. Even the orientation of the mug changes the way we grasp it.
Grasping an object is something very easy and natural for humans, but terribly difficult to understand (even for humans) and therefore to implement in robots. Traditionally, robot grasping has been implemented through a combination of perception (estimating the position and orientation of the object) and planning (calculating the optimal movement of the robot's arm and hand). In this way the problem of grasping is reduced to a combination of geometric and kinematic considerations. This "analytical" approach works only in ideal conditions and for known objects (objects for which a 3D model is available). It tends to fail in cluttered environments and for objects never seen before. The "data-driven" approach, instead, infers a grasp for an object without knowing its exact shape, using experience that, typically, comes from simulations. This approach became popular after Peter Allen's student Andrew Miller at Columbia University developed the GraspIt simulator ("Automatic Grasp Planning Using Shape Primitives," 2003), which was made available to the larger community in 2004.
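The "analytical" approach can be illustrated with a minimal 2-D check for a two-finger grasp. The check assumes the contact points and surface normals are known exactly, which is precisely what requires a 3D model of the object. It tests the classic antipodal condition: the line joining the two contacts must lie inside the friction cone at each contact, whose half-angle is the arctangent of the friction coefficient. The function names and the friction coefficient of 0.5 are illustrative assumptions, not GraspIt's interface.

```python
import math


def friction_cone_ok(contact, normal, other_contact, mu):
    """True if the line from `contact` to `other_contact` lies inside the friction
    cone at `contact`: half-angle atan(mu) around the inward surface normal."""
    dx, dy = other_contact[0] - contact[0], other_contact[1] - contact[1]
    cos_angle = (dx * normal[0] + dy * normal[1]) / (math.hypot(dx, dy) * math.hypot(*normal))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.acos(cos_angle) <= math.atan(mu)


def antipodal_grasp(p1, n1, p2, n2, mu=0.5):
    """Analytic test for a two-finger (parallel-jaw) grasp in the plane.

    p1, p2: contact points; n1, n2: inward surface normals at those points.
    """
    return friction_cone_ok(p1, n1, p2, mu) and friction_cone_ok(p2, n2, p1, mu)


# A 2 x 2 box centred at the origin: squeezing the left and right faces works...
print(antipodal_grasp((-1, 0), (1, 0), (1, 0), (-1, 0)))  # True
# ...but pinching the left face and the top face does not (with mu = 0.5).
print(antipodal_grasp((-1, 0), (1, 0), (0, 1), (0, -1)))  # False
```

The check is pure geometry. It says nothing about what happens when the shape estimate is wrong or the object is partly hidden, which is why this approach tends to fail in clutter and on objects never seen before.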
Andrew Ng's student Ashutosh Saxena at Stanford University augmented the data-driven approach with machine learning in order to grasp an object seen for the first time through computer vision. He specifically envisioned a household scenario in which a robot has to empty a dishwasher ("Robotic Grasping of Novel Objects using Vision", 2006). Similar methods were developed by Peter Allen's student Hao Dang at Columbia University ("Semantic Grasping", 2012), who trained his neural network on a large dataset of objects and designed it to generalize to unseen objects; by Markus Vincze's student David Fischinger at Vienna University of Technology in Austria, in collaboration with Ashutosh Saxena's student Yun Jiang after Saxena moved to Cornell University ("Learning Grasps for Unknown Objects in Cluttered Scenes", 2013); and by Joseph Redmon at the University of Washington (of "You Only Look Once" fame) in collaboration with Anelia Angelova of Google ("Real-time Grasp Detection Using Convolutional Neural Networks", 2015).
These groups pioneered the application of deep learning to improving the dexterity of robots.
Meanwhile, new devices for 3D sensing became commercially available, such as Microsoft's Kinect (2010) for depth sensing, and new software methods were invented, notably one by Kurt Konolige at Willow Garage ("Projected Texture Stereo", 2010).
The data-driven approach became predominant in the era of big data, as shown by Daniel Kappler, Jeannette Bohg and Stefan Schaal at the Max-Planck Institute ("Leveraging Big Data for Grasp Planning", 2015).
Robert Platt's student Andreas ten Pas at Northeastern University designed a hybrid of the analytical and data-driven approaches, augmented with machine learning ("Using Geometry to Detect Grasp Poses in 3D Point Clouds", 2015).
Some continued the tradition of supervised learning, for example Ashutosh Saxena's student Ian Lenz ("Deep Learning for Detecting Robotic Grasps", 2015), who used human supervision, and Ken Goldberg's student Jeffrey Mahler at UC Berkeley, who introduced Dex-Net 2.0 ("Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics", 2017); others opted for novel self-supervised approaches, notably Abhinav Gupta's student Lerrel Pinto at CMU ("Supersizing Self-supervision", 2015) and a Google team led by Sergey Levine ("End-to-end Learning of Semantic Grasping", 2017) that mimicked the "two-stream" model of visual reasoning advanced in 1992 by the neuroscientists David Milner and Melvyn Goodale at the University of Western Ontario in Canada (a "ventral stream" that recognizes the kind of object plus a parallel "dorsal stream" that recognizes the object's location relative to the viewer).
Both approaches relied heavily on data. Hence a number of teams began to generate simulated data to train the neural network, just like in Jeffrey Mahler's Dex-Net. Notably:
a Google team led by Vincent Vanhoucke that included Kurt Konolige and Sergey Levine and that implemented GraspGAN ("Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping", 2017);
Robert Platt's student Ulrich Viereck at Northeastern University ("Learning a Visuomotor Controller for Real World Robotic Grasping Using Easily Simulated Depth Images", 2017);
and Silvio Savarese's student Kuan Fang at Stanford, who introduced the Task-Oriented Grasping Network or TOG-Net ("Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision", 2018).
All these systems relied on large-scale simulated self-supervision, i.e. they trained their neural networks in a simulated environment and then transferred them to the real robot.
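The logic of simulated self-supervision can be shown with a deliberately tiny example. A hypothetical simulator attempts grasps on randomly sized virtual boxes and labels each attempt itself. A one-neuron "network" (logistic regression) then learns from those labels where a grasp stops working. The simulator, its success rule, the 5% label noise and the training settings are all assumptions for illustration. The systems cited above use deep networks, realistic physics and images, and the transfer to a real robot, which is the hard part, is not shown.

```python
import math
import random


def simulate_grasp(rng):
    """Hypothetical simulator: one grasp attempt on a randomly sized virtual box.

    Returns (feature, label). The feature is the distance of the fingers from
    the box centre divided by the box's half-width; the grasp 'succeeds' when
    that ratio is below 0.5, and 5% of labels are flipped to mimic imperfect physics.
    """
    width = rng.uniform(0.02, 0.10)            # box width in metres
    offset = rng.uniform(0.0, width / 2)       # where the fingers close
    x = offset / (width / 2)
    success = x < 0.5
    if rng.random() < 0.05:
        success = not success
    return x, 1.0 if success else 0.0


def train_logistic(data, epochs=300, lr=1.0):
    """A one-neuron 'network' (logistic regression) fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted success probability
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


rng = random.Random(0)
# Self-supervision: the simulator labels every attempt, with no human annotator.
simulated = [simulate_grasp(rng) for _ in range(1000)]
w, b = train_logistic(simulated)
threshold = -b / w  # offset ratio at which predicted success crosses 50%
print(f"learned threshold ~ {threshold:.2f} (simulator's true threshold: 0.50)")
```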
The studies on grasping yielded interesting insights into other aspects of human cognition. For example, Keng Peng Tee's team at Singapore's Agency for Science, Technology, and Research (A*STAR), in collaboration with Gowrishankar Ganesh's team at Japan's National Institute of Advanced Industrial Science and Technology (AIST), developed an algorithm that can automatically recognize a novel object as a potential tool and can figure out how to use it ("Towards Emergence of Tool Use in Robots", 2018).
Then came the age of AlphaGo, i.e. of deep reinforcement learning.
Robert Platt's student Marcus Gualtieri at Northeastern University used deep reinforcement learning (a learning agent similar to DQN) to demonstrate versatile manipulation of objects ("Learning 6-DoF Grasping and Pick-Place Using Attention Focus", 2018).
A joint Google-UC Berkeley team led by Sergey Levine overcame the limitations of DDPG and NFQCA with QT-Opt ("Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation", 2018), which achieved a 96% success rate at grasping previously unseen objects of different shapes. It was basically DQN on steroids (trained using multiple robots at the same time).
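One thing that distinguishes QT-Opt from DDPG is that it has no separate "actor" network. To choose an action, it directly maximizes the learned Q-function over continuous actions with the cross-entropy method (CEM), a derivative-free optimizer. A minimal CEM sketch follows. `q_value` is a hypothetical toy stand-in for the Q-network, and the sample counts are arbitrary choices rather than the paper's settings.

```python
import random
import statistics


def q_value(action):
    """Hypothetical stand-in for a learned Q-network over a 2-D action (dx, dy).

    Higher is better; the best action is (0.3, -0.2).
    """
    dx, dy = action
    return -((dx - 0.3) ** 2 + (dy + 0.2) ** 2)


def cem_argmax(q, dims=2, iterations=5, samples=64, elites=6, seed=0):
    """Cross-entropy method: approximately find the action that maximizes q.

    Sample actions from a Gaussian, keep the best-scoring 'elite' samples,
    refit the Gaussian to them, and repeat.
    """
    rng = random.Random(seed)
    mean, std = [0.0] * dims, [1.0] * dims
    for _ in range(iterations):
        batch = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)]
        batch.sort(key=q, reverse=True)
        best = batch[:elites]
        mean = [statistics.fmean(a[d] for a in best) for d in range(dims)]
        # a small floor keeps the distribution from collapsing to a single point
        std = [statistics.pstdev([a[d] for a in best]) + 1e-6 for d in range(dims)]
    return mean


action = cem_argmax(q_value)
print([round(v, 2) for v in action])
```

In a real system, `q_value` would be a deep network that scores a camera image together with a candidate gripper motion.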
Lucas Manuelli and Wei Gao at MIT instead developed a method called kPAM to model categories of objects with just a handful of "keypoints": three for mugs, six for shoes ("KeyPoint Affordances for Category-Level Robotic Manipulation", 2019).
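kPAM treats manipulation as an optimization over where an object's keypoints should end up. Its simplest special case is moving a set of detected keypoints as close as possible to target keypoints with a rigid motion, which has a closed-form least-squares solution. The sketch below solves that special case in 2-D (Procrustes/Kabsch alignment). It is not kPAM's implementation, which handles 3-D poses and additional costs and constraints, and the mug keypoint coordinates are hypothetical.

```python
import math


def fit_rigid_2d(source, target):
    """Least-squares rotation + translation mapping `source` keypoints onto `target`.

    Closed-form 2-D Procrustes (Kabsch) solution: centre both point sets,
    get the rotation angle from the summed dot and cross products, then
    solve for the translation.
    """
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(p[0] for p in target) / n
    ty = sum(p[1] for p in target) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy
        bx, by = qx - tx, qy - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # translation = target centroid - R(theta) * source centroid
    return theta, (tx - (c * sx - s * sy), ty - (s * sx + c * sy))


# Hypothetical mug keypoints (metres): bottom centre, top centre, handle.
detected = [(0.0, 0.0), (0.1, 0.0), (0.05, 0.04)]  # mug lying on its side
target = [(0.5, 0.2), (0.5, 0.3), (0.46, 0.25)]    # standing upright at the goal
theta, translation = fit_rigid_2d(detected, target)
print(round(math.degrees(theta), 1), [round(t, 3) for t in translation])  # 90.0 [0.5, 0.2]
```

Because only three points describe any mug, the same computation works for mugs of any shape or size, as long as the keypoints can be detected.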
"High-level reasoning requires very little computation, but low-level sensorimotor skills require enormous computational resources" (Erik Brynjolfsson)
Earlier in the book I mentioned that two of the motivations for doing A.I. were a business opportunity and the ideal of improving the lives of ordinary people. Both motivations are at work in these projects. Unfortunately, the technology is still primitive. Don't even think for a second that this very limited technology can create an evil race of robots any time soon.
"Nothing in life is to be feared, it is only to be understood" (Marie Curie).