If you have ever seen a dubbed foreign film or TV series, you have probably noticed that the lips and voice are sometimes out of sync. This can break the immersion and be a “dealbreaker” for fans of foreign content, many of whom would rather read subtitles than listen to choppy dubbing. This aversion to low-quality dubbing can be costly for films with a large target audience, most evidently animated kids’ movies. Since most kids have a hard time reading subtitles fast enough, dubbing is the only option for these movies to be successful. Until recently, there were only two ways to dub these movies in different languages: either painstakingly alter the animations, frame by frame, for each language, or just overlay the new voices without changing the video at all. Since frame-by-frame tweaking was so expensive, the latter method became commonplace. This seeming lack of effort often reflected poorly on the content and alienated foreign moviegoers and TV watchers. However, with advances in AI and neural networks, foreign content may be able to adapt to other languages seamlessly. Some AIs can match almost any audio to almost any set of lips, even adjusting the shading, so the changes are practically imperceptible [1]. But as the technology gets more advanced, some people are getting worried.
In the movie Forrest Gump, the titular character can be seen meeting presidents like Kennedy and Nixon. In the 1990s, something like this was incredible. Each frame was meticulously constructed to preserve the immersion; it felt like Forrest Gump was real and was actually meeting the president. Interestingly enough, scenes like these may become commonplace with the use of ‘Deep Fakes’.
Video from Forrest Gump. Credit: Paramount Pictures Corporation
Deep Fakes are altered videos that use neural networks to overlay someone’s face, including lips and facial expressions, onto someone else’s face [1]. They can be used to make voice dubbing more fluid and realistic, or to make it look like someone did something they never actually did. This ability to make virtually anyone “do” virtually anything could be problematic. Some people are afraid this technology could be used to incriminate an innocent person or that it could call the integrity of all video evidence into question.
The surprising thing is that this technology isn’t all that new. It is a more advanced relative of the algorithms that Snapchat filters take advantage of. One of the first practical real-time face detectors arrived in 2001 with the Viola-Jones object detection framework [2]. After training on a large set of example pictures, it could accurately locate faces based on contrasts in brightness between neighboring rectangular regions of an image, such as the eyes appearing darker than the cheeks [2]. This is the same fundamental way that many digital cameras are able to find faces while shooting photos or videos [2].
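To make the brightness-contrast idea concrete, here is a minimal sketch of the kind of rectangle feature the Viola-Jones detector is built on. It is only an illustration: the tiny image, the function names, and the single feature are assumptions made for this example, and a real detector combines a great many such features chosen during training.

```python
def integral_image(img):
    """Build a summed-area table with an extra row and column of zeros."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum the pixels inside a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def two_rect_feature(table, top, left, height, width):
    """Brightness of the lower half of a window minus the upper half.

    A large positive value means a dark band sits above a bright band,
    the pattern expected where eyes sit above cheeks.
    """
    half = height // 2
    upper = rect_sum(table, top, left, half, width)
    lower = rect_sum(table, top + half, left, half, width)
    return lower - upper


# Toy 4x4 grayscale "image": two dark rows above two bright rows.
img = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
table = integral_image(img)
print(two_rect_feature(table, 0, 0, 4, 4))  # 1520
```

The summed-area table is what makes this fast: once it is built, the brightness of any rectangle costs four lookups no matter how large the rectangle is.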
Interestingly enough, Deep Fake technology is available to just about anyone. A program called FakeApp lets users make their own Deep Fakes. All that is needed is a video; the program then gets to work, swapping the face with that of nearly any celebrity you can think of. With this technology, scenes like the one above from Forrest Gump could be created in less than an hour and for free.
At the University of Washington, huge strides in this technology are being made. Using 14 hours of high-quality stock footage of President Barack Obama from interviews, press releases, and other miscellaneous sources, the UW team was able to create an extremely realistic AI Obama [1]. Given audio of Obama saying something, the neural network can accurately lip-sync it to video of him, and it can even account for changes in lighting [1]. This technology could be used to edit and rearrange segments of campaign speeches or other online media featuring the president without the need for cuts.
Unfortunately, these advancements are just as easy to harness for malicious purposes, and some people fear their harmful applications. Some of the most vulnerable targets for Deep Fake abuse are politicians, who can have thousands of unique, high-quality videos that can be used to create convincing Deep Fakes [3]. Furthermore, given the power that many politicians wield, this kind of impersonation could be disastrous: what if someone impersonated Kim Jong Un and declared war on South Korea? Or what if there was a real declaration of war from a real politician, but it was dismissed because people thought it was a Deep Fake?
It is also challenging to tell whether a video is real or a Deep Fake. If you have the original video, you may be able to analyze its compression patterns to determine whether it has been altered [3, 4]. However, each time a video is downloaded and re-uploaded, specific colors and pixels can get distorted, so unless you have the original file, this method is often ineffective [4]. Another way of telling real from fake is to look for errors. If you go frame by frame, you may be able to see small mistakes or find moments where the face moves abnormally, but checking every frame takes a considerable amount of time. Thankfully, there is a more holistic approach. Since the networks behind Deep Fakes are mostly trained on pictures in which the subject’s eyes are open, they see few examples of blinking, so blinks may appear very infrequently or not at all in the fake [3]. But as time goes on, Deep Fake creation will improve, and these mistakes will become harder, or impossible, to find.
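The blink check can be sketched in a few lines. This is a simplified stand-in, not the method from the cited paper, which uses a neural network to detect blinks. It assumes some other tool has already produced a hypothetical “eye openness” score between 0 and 1 for every frame, and the thresholds (`closed_below`, `min_frames`, `min_rate`) are made-up placeholder values.

```python
def count_blinks(openness, closed_below=0.2, min_frames=2):
    """Count runs of at least `min_frames` consecutive closed-eye frames."""
    blinks = 0
    run = 0
    for score in openness:
        if score < closed_below:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # the clip may end in the middle of a blink
        blinks += 1
    return blinks


def blinks_per_minute(openness, fps):
    """Convert the blink count into a rate, given the frames per second."""
    minutes = len(openness) / fps / 60
    return count_blinks(openness) / minutes if minutes > 0 else 0.0


def looks_suspicious(openness, fps, min_rate=5.0):
    """Flag a clip whose blink rate falls below an assumed minimum."""
    return blinks_per_minute(openness, fps) < min_rate


# Ten seconds of hypothetical scores at 30 frames per second.
with_blinks = [1.0] * 100 + [0.1] * 3 + [1.0] * 100 + [0.1] * 3 + [1.0] * 94
never_blinks = [1.0] * 300

print(looks_suspicious(with_blinks, fps=30))
print(looks_suspicious(never_blinks, fps=30))
```

A low blink rate is only a hint, not proof: a short clip of a real person may simply not contain a blink, so a check like this would be one signal among several.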
Thankfully, we may be able to fight fire with fire: we could use similar neural networks to help distinguish Deep Fakes from real videos. If we were to train a neural network on a large, labeled collection of fake and real videos, it might learn to tell a real video from a fake, and with enough practice, it could become better than a human at finding fakes.
It may seem scary when you look only at the negatives, but Deep Fakes are a tremendously exciting technological advancement. In the near future, it may be possible to adapt any of our content to any language perfectly. Say “goodbye” to cheesy, poorly-dubbed karate movies, and “hello” to Deep Fakes!
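As a rough sketch of what “learning from labeled examples” means, the snippet below trains the simplest possible stand-in for a neural network: a single artificial neuron (logistic regression). Everything here is an assumption for illustration. Each video is reduced to two hypothetical numbers (a scaled blink rate and a visual-artifact score), and the six training examples are invented toy values, not real measurements. A real detector would learn from the video frames themselves.

```python
import math


def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    """Estimated probability that a video is fake."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(examples, labels, learning_rate=0.5, epochs=2000):
    """Fit one neuron with stochastic gradient descent on the log loss."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = score(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Invented toy data: [scaled blink rate, artifact score]; label 1 means fake.
examples = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.15],   # real videos
            [0.1, 0.8], [0.0, 0.6], [0.2, 0.9]]    # fake videos
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train(examples, labels)
print(score(weights, bias, [0.05, 0.7]) > 0.5)   # unseen clip, rarely blinks
print(score(weights, bias, [0.85, 0.1]) > 0.5)   # unseen clip, blinks normally
```

The more varied the labeled examples are, the better a detector like this generalizes, which is why a large collection of both real and fake videos matters.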
[1] Suwajanakorn, Supasorn, et al. “Synthesizing Obama.” ACM Transactions on Graphics, vol. 36, no. 4, 2017, pp. 1–13.
[2] Viola, P., and M. Jones. “Robust Real-Time Face Detection.” Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.
[3] Chang, Ming-Ching, et al. “In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking.” Cornell University Library, 2018.
[4] Shen, Tianxeng, et al. “‘Deep Fakes’ Using Generative Adversarial Networks (GAN).” University of California: San Diego, 2018.