(These are excerpts from my book "Intelligence is not Artificial")
One of the most powerful attacks against truth came from people who used deep learning to create fake video and fake audio. Matthias Niessner, Justus Thies and colleagues (a collaboration among the University of Erlangen-Nuremberg, the Max Planck Institute and Stanford University) developed Face2Face, a system (demonstrated in 2016) that captures the facial expressions of an actor and maps them onto another person's face: you can take a video of your favorite politician and make that person show your facial expressions as you make them. Face2Face was based on a technique (a variant of "multilinear principal component analysis") that dates from the 1990s.
Video Rewrite, one of the first programs for facial reenactment, was developed in 1997 by Christoph Bregler, Michele Covell and Malcolm Slaney at Interval Research Corporation, a Palo Alto laboratory founded in 1992 by Microsoft's cofounder Paul Allen and David Liddle. Video Rewrite altered the video of a person talking so that it looked like she was saying something else ("Video Rewrite: Driving Visual Speech with Audio", 1997). In 1999 Volker Blanz and Thomas Vetter at the Max Planck Institute in Germany had published a method to construct three-dimensional faces from a single photograph ("A Morphable Model for the Synthesis of 3D Faces", 1999). And in 2009 Paul Debevec's team at the University of Southern California had shown at the SIGGRAPH conference the Digital Emily Project of facial animation, whose original purpose was to create a photo-realistic digital actor. In 2010 Steve Seitz's student Ira Kemelmacher-Shlizerman at the University of Washington demonstrated a project named Being John Malkovich in which the facial expressions of actress Cameron Diaz were mapped onto the face of actor John Malkovich. This was based on a different approach: searching for images of the target (Malkovich) that are similar to the images of the source (Diaz). For the record, David Fincher's "The Curious Case of Benjamin Button" (2008) was the first Hollywood film to feature a photo-realistic computer-generated human character (an aged version of Brad Pitt).
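The retrieval idea behind Being John Malkovich (for each image of the source, find the most similar image of the target) can be illustrated with a toy nearest-neighbor search. This is only a minimal sketch under loud assumptions: the three-number "expression descriptors" (mouth openness, smile, head yaw), the function names and the sample values are all hypothetical, and the real system computed its own appearance and pose features from photographs rather than anything this simple.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_target_frames(source_frames, target_frames):
    """For each source descriptor, return the index of the closest target descriptor."""
    if not target_frames:
        raise ValueError("need at least one target frame")
    matches = []
    for s in source_frames:
        best = min(range(len(target_frames)),
                   key=lambda i: distance(s, target_frames[i]))
        matches.append(best)
    return matches


# Hypothetical descriptors: (mouth_openness, smile, head_yaw), each in [0, 1].
source = [(0.1, 0.9, 0.0), (0.8, 0.1, 0.2)]                   # frames of the "driver"
target = [(0.7, 0.2, 0.1), (0.0, 1.0, 0.0), (0.5, 0.5, 0.5)]  # photo library of the target

matches = nearest_target_frames(source, target)
print(matches)  # [1, 0]: smiling frame -> smiling photo, open-mouth frame -> open-mouth photo
```

Playing back the retrieved target images in the order of the source frames gives the illusion that the target is mimicking the source, which is why such a method needs a large and varied collection of images of the target.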
Paul Debevec helped Hollywood studios complete the movie "Furious 7" (2015) even though the actor Paul Walker had died partway through filming; and then helped bring back the actor Peter Cushing, who had died in 1994, to reprise his role as Grand Moff Tarkin in a new episode of the "Star Wars" saga, "Rogue One" (2016).
The technique was expanded beyond facial expressions by Christian Theobalt's team at the Max Planck Institute, whose generative neural network produced re-animations of head position, head rotation, facial expression, eye gaze, and eye blinking ("Deep Video Portraits", 2018).
Voice impersonation became popular in 2015 when Nitesh Saxena's group at the University of Alabama at Birmingham demonstrated a system capable of synthesizing speech in someone's voice after simply listening to a few minutes of that person speaking. For example, you can use a recording of one of my public talks to create a message in my voice, good enough to fool the biometric authentication system of my smartphone ("Stealing Voices to Fool Humans and Machines", 2015). This system employed the Festvox voice converter built on top of the Festival Speech Synthesis System developed by Alan Black at the University of Edinburgh and first published in 1997 (and later maintained by Black at Carnegie Mellon University). Similarly, DeepMind's deep neural network WaveNet and Adobe's Voco, both demonstrated in 2016, could make someone appear to say something that she never said. In 2017 the Canadian startup Lyrebird, founded by former students of Yoshua Bengio's Montreal Institute for Learning Algorithms (MILA), improved voice impersonation to the point that the system could be trained with just one minute of the victim's voice. These systems can fake your voice and create audiobooks that sound like they are read by you.
In 2017 Supasorn Suwajanakorn of Steve Seitz's group demonstrated the Synthesizing Obama system that produced videos of President Barack Obama speaking his own words with accurate lip synchronization: the system picked images of Obama to match the audio. Given the audio and a dataset of images of a person, such a system can generate a video of that person speaking his or her own words. For example, one could generate the video of an Albert Einstein talk for which there is no video. Alas, an impersonator capable of imitating President Obama's voice could also generate a convincing video of President Obama saying something he never said.
Slowly, voice-morphing technology is being combined with face-morphing technology to create completely fake speeches.
In November 2017 a Reddit user nicknamed "Deepfakes" started the Reddit community /r/deepfakes for creating fake porn videos, mostly using a face-swapping program called FakeApp, which was created by another user, Deepfakeapp, and was based on TensorFlow (the first large-scale demonstration of the power of an A.I. platform for developing practical applications). This launched the phenomenon of "deepfake videos" in which the face of an actor in a porn video is replaced by the face of a celebrity (you need a lot of pictures of the victim in order to train the algorithm; hence the victims are usually celebrities). Within a few months, in February 2018, Reddit had to intervene and ban the community to stop the face-swapping celebrity porn craze.
In 2019 Sean Vasquez and Mike Lewis at Facebook AI used recordings of public talks to train their MelNet to imitate the voices of famous people.
"Humanity is acquiring all the right technology for all the wrong reasons" (Buckminster Fuller)