(These are excerpts from my book "Intelligence is not Artificial")
From Recognizing to Creating: Generative Adversarial Networks
Richard Feynman's last words written on his blackboard were: "What I cannot create, I do not understand." Machines may have become good (or, at least, better) at classifying objects into categories (i.e. at recognizing what an object is), but they still lag far behind in drawing an example of a category. There is a difference between recognizing a dog and drawing a dog. If you understand what a dog is, it should be easy for you to sketch what a dog looks like. You have a generative mind: you can classify an object in its category and you can draw a typical object of that category, an object that presumably is not like any specific object that you have seen (unless you are Giotto). In order to implement this "generative" behavior a new approach to machine learning was required.
A recurrent network was used to generate sequences by Hinton's student Ilya Sutskever ("Generating Text with Recurrent Neural Networks", 2011), but recurrent neural networks were clearly limited in their ability to look ahead. In 2014 Alex Graves, working in Hinton's group at the University of Toronto, used an LSTM network, more efficient at storing and retrieving information than plain recurrent networks, to generate handwriting. You can enter a text at his webpage "http://www.cs.toronto.edu/~graves/handwriting.cgi" (as of 2017) and the system will write it out in human-like handwriting ("Generating Sequences With Recurrent Neural Networks", 2014). This was an important first step.
"Turing Learning" was developed by Roderich Gross at the University of Sheffield: it pits two algorithms against each other, one trying to classify the other while the other is trying to fool the former ("A Coevolutionary Approach to Learn Animal Behavior Through Controlled Interaction", 2013). In a similar fashion in 2014 Ian Goodfellow, one of Bengio's students at the University of Montreal, invented "generative adversarial networks" (GANs), consisting of two neural networks that compete against each other, one trying to fool the other ("Generative Adversarial Nets", 2014). Another member of the same lab, Mehdi Mirza, improved the idea with "conditional adversarial nets" ("Conditional Generative Adversarial Nets", 2014). In 2015 Alec Radford at Indico Data Solutions in Boston showed that an expanded version of GANs can generate perfectly valid images, except that they are not real ("Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", 2015).
GANs contain two independent neural networks that behave as adversaries: one (the "discriminator") tries to tell real images from fake ones, while the other (the "generator") produces fake images to fool it. The generator needs to improve its ability to fake images, while the discriminator needs to improve its ability to discriminate between fake and real ones. The images produced by the generator are only partially random because they have to resemble the real ones. As the two networks hone their respective skills, both tend towards the point where the counterfeits and the originals are indistinguishable. The process trains the discriminator to classify more and more accurately. It also, incidentally, trains the generator to produce highly realistic pictures of imaginary objects, which may represent an art in itself.
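The tug-of-war just described can be sketched in a few lines of plain Python. This is only a toy under loud assumptions, not the architecture of any of the papers cited here: the "images" are single numbers drawn from a bell curve, the discriminator is a one-input logistic unit, and the generator merely adds a learned shift to random noise. All names (REAL_MEAN, discriminate, generate, train) and all settings (learning rate, batch size, starting values) are invented for illustration.

```python
import math
import random

# Toy assumption: "real" samples are numbers drawn from a bell curve.
REAL_MEAN, REAL_STD = 1.0, 1.0


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminate(x, w, c):
    """Discriminator: estimated probability that sample x is real."""
    return sigmoid(w * x + c)


def generate(z, shift):
    """Generator: turns random noise z into a fake sample."""
    return z + shift


def train(steps=3000, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, c = 0.0, 0.0  # discriminator parameters (hypothetical starting point)
    shift = -2.0     # generator parameter, deliberately far from REAL_MEAN
    for _ in range(steps):
        real = [rng.gauss(REAL_MEAN, REAL_STD) for _ in range(batch)]
        fake = [generate(rng.gauss(0.0, 1.0), shift) for _ in range(batch)]

        # Discriminator step: lower -log D(real) - log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            err = discriminate(x, w, c) - 1.0  # pushes D(real) towards 1
            grad_w += err * x
            grad_c += err
        for x in fake:
            err = discriminate(x, w, c)        # pushes D(fake) towards 0
            grad_w += err * x
            grad_c += err
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: lower -log D(fake), i.e. try to be labeled "real".
        grad_shift = 0.0
        for x in fake:
            grad_shift += (discriminate(x, w, c) - 1.0) * w
        shift -= lr * grad_shift / batch
    return shift, w, c


if __name__ == "__main__":
    shift, w, c = train()
    print("generator shift:", shift, "(real mean is", REAL_MEAN, ")")
    print("D(real mean):", discriminate(REAL_MEAN, w, c))
```

If the toy behaves as intended, the generator's shift should drift towards the mean of the real data while the discriminator's verdict on a typical sample drifts towards 0.5, the numerical counterpart of counterfeits and originals becoming indistinguishable. Real GANs replace both one-parameter models with deep networks, and their training is far less well behaved than this sketch suggests.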
The wonders of GANs quickly lured legions of researchers. In 2016 a joint team from the University of Michigan (Honglak Lee) and the Max Planck Institute in Germany (Bernt Schiele and Zeynep Akata) employed GANs to generate images from text descriptions ("Generative Adversarial Text to Image Synthesis", 2016). Antonio Torralba's student Carl Vondrick at MIT employed a GAN to predict the plausible evolution of a scene, i.e. to generate a video. This implies understanding what is going on in the scene and inferring what can reasonably be expected to happen next ("Generating Videos with Scene Dynamics", 2016). Alexei Efros' students at UC Berkeley (including Jun-Yan Zhu) created a neural network that can turn the picture of a horse into the picture of a zebra using GANs ("Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", 2017). Then they used "conditional adversarial networks" to develop the "Pix2pix model" capable of generating images from sketches or abstract diagrams ("Image-to-Image Translation with Conditional Adversarial Networks", 2017). When they released the related Pix2pix software, it started a wave of experiments (many of them by professional artists) in creating images: sketch your desired handbag and the system displays what appears to be a real handbag and even colors it. The conditional adversarial network (cGAN) learns how to map an input image onto an output image, or, if you prefer, how to redraw an image with different attributes. Later in 2017 Ming-Yu Liu's team at Nvidia used a slightly different architecture for their image-to-image translation system UNIT (or UNsupervised Image-to-image Translation), i.e. variational autoencoders coupled with generative adversarial networks ("Unsupervised Image-to-Image Translation Networks", 2017).
Trivia. In November 2017 a Reddit user called Deepfakes started a Reddit community, /r/deepfakes, for creating fake porn videos using face-swapping software called FakeApp, created by another user, Deepfakeapp. This launched the phenomenon of "deepfake videos" in which the face of an actor in a porn video is replaced by the face of a celebrity (you need a lot of pictures of the victim in order to train the algorithm, hence the victims are usually celebrities).
GANs are probably more interesting as a model of human intelligence than the inventors realized. Competition is one of the key factors in evolution, and, in particular, in the evolution of the brain. Competition often ends up being collaboration: when two adversaries compete, they indirectly help each other improve. They induce a positive feedback loop on their skills. The fundamental case of competition is perhaps the relationship between the two sexes. Charles Darwin in "The Descent of Man and Selection in Relation to Sex" (1871) and Ronald Fisher in "The Genetical Theory of Natural Selection" (1930) already pointed out that sexual selection could greatly accelerate evolution: the female chooses the male and therefore males are pressured to try and be chosen, and as more and more males qualify the female has to become choosier, pressuring the males to further improve, and so on in an endless positive feedback loop. Geoffrey Miller in "The Mating Mind" (2000) went beyond the tail of the peacock and the song of the thrushes. He speculated that language itself, and therefore mind, is created via a feedback loop of this kind: Miller views the human mind not as a problem solver, but as a "sexual ornament". The human brain's creative intelligence must exist for a purpose, and that purpose is not obvious: survival in the environment does not quite require the sophistication of Einstein's science or Michelangelo's paintings or Beethoven's symphonies. On the other hand, these are precisely the kind of things that the human brain does a lot better than other animal brains. The human brain is much more powerful than it needs to be. Miller explains the emergence of art, science and philosophy by thinking not in terms of survival benefits but in terms of reproductive benefits. Sexual selection shapes not only the animal world but also our own mind and our civilizations.
Alas, GANs are very difficult to train. As an alternative to GANs, Thomas Brox's student Alexey Dosovitskiy showed that one can train a convnet to generate images ("Learning to Generate Chairs, Tables and Cars with Convolutional Networks", 2016); and no adversarial training was used by Qifeng Chen at Stanford University to synthesize photorealistic images ("Photographic Image Synthesis with Cascaded Refinement Networks", 2017).
GANs are certainly impressive in the way they can generate realistic images of objects that don't exist. However, computer-based visual effects have been around in the entertainment industry since at least the 1980s, using methods invented by the likes of William Reeves at Lucasfilm (the "particle systems" method of 1983) and Alan Barr at Caltech (the "solid primitives" method of 1984). In 1981 a British firm launched the Quantel Paintbox workstation that quickly revolutionized television graphics. For example, a Quantel Paintbox was used for the visual effects of the video of Dire Straits' "Money For Nothing" (1985). Then, of course, there was Industrial Light & Magic, the visual effects studio founded in 1975 by film director George Lucas that produced the visual effects for the "Star Wars" series (created by Lucas in 1977), the "Indiana Jones" series (created by Steven Spielberg in 1981), the "Back to the Future" series (created by Robert Zemeckis in 1985), Robert Zemeckis's "Who Framed Roger Rabbit" (1988), etc. Another pioneer of computer animation, Pacific Data Images in Silicon Valley, created the morphing visual effects in the video for Michael Jackson's "Black or White" (1991) using a new algorithm developed by Thaddeus Beier of Silicon Graphics and Shawn Neely of Pacific Data Images itself.