
Wednesday, May 16, 2018

DeepFakes: AI-powered deception machines

Driven by computer vision and deep learning techniques, a new wave of imaging attacks has recently emerged which allows anyone to easily create highly realistic "fake" videos. These false videos are known as DeepFakes. While highly entertaining at times, DeepFakes can be used to destabilize society, and some would argue that the pre-shock has already begun. A rogue DeepFake that goes viral can spread misinformation across the internet like wildfire.

"The ability to effortlessly create visually plausible editing of faces in videos has the potential to severely undermine trust in any form of digital communication. "
--Rössler et al. FaceForensics [3]

Because DeepFakes contain a unique combination of realism and novelty, they are more difficult to detect on social networks than traditional "bad" content like pornography and copyrighted movies. Video hashing might work for finding duplicates or copyright-infringing content, but it is not good enough for DeepFakes. To fight face-manipulating DeepFake AI, one needs an even stronger AI.

As today's DeepFakes are based on Deep Learning, and Deep Learning tools like TensorFlow and PyTorch are accessible to anybody with a modern GPU, such face manipulation tools are particularly disruptive. The democratization of Artificial Intelligence has brought us near-infinite use-cases. DeepDream was the phenomenon of 2015, Deep Style Transfer art apps took over 2016, and 2018 is the year of the DeepFake. Today's computer vision technology allows a hobbyist to create a DeepFake video of just about any person they want performing any action they want, in a matter of hours, using commodity computer hardware.
Fig 1. DeepFakes generate "false impressions" which are attacks on the human mind.

What is a Deep Fake?

A DeepFake is a video generated by a modern face-swapping (or "puppeteering") computer vision algorithm, which can produce a video of target person X performing action A, usually given a video of another person Y performing action A. The underlying system learns two face models, one for target person X and one for person Y, the person in the original video. It then learns a mapping between the two faces, which can be used to create the resulting "fake" video. Techniques for facial reenactment were pioneered by movie studios for driving character animations from real actors' faces, but these techniques are now emerging as deep learning-based software packages, letting deep convolutional neural networks do most of the work during model training.

Consider the following collage of faces. Can you guess which ones are real and which ones are DeepFakes?

Fig 2. Can you tell which faces are real and which ones are fake? 
Figure from Face Forensics[3]

It is not so easy to tell which images are modified and which are unadulterated. And if you do a little bit of searching for DeepFakes (warning: unless you are careful, you will encounter lots of pornographic content), you will notice that the faces in those videos look very realistic.

How are Deep Fakes made?
While there are conceptually many different ways to make Deep Fakes, today we'll focus on two key underlying techniques: face detection from videos, and deep learning for creating frame alignments between source face X and target face Y.
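To make the "mapping between two faces" concrete, below is a minimal PyTorch sketch of the shared-encoder / two-decoder autoencoder behind the popular hobbyist face-swap tools. Note that this is the generic deep-learning recipe, not the model-based Face2face pipeline discussed next, and the layer sizes and training details here are illustrative assumptions, not any particular tool's actual settings.

    import torch
    import torch.nn as nn

    def down(i, o):  # halve spatial resolution
        return nn.Sequential(nn.Conv2d(i, o, 4, 2, 1), nn.LeakyReLU(0.1))

    def up(i, o):    # double spatial resolution
        return nn.Sequential(nn.ConvTranspose2d(i, o, 4, 2, 1), nn.ReLU())

    class FaceSwapAE(nn.Module):
        def __init__(self):
            super().__init__()
            # A single shared encoder maps both identities into one latent space.
            self.encoder = nn.Sequential(down(3, 64), down(64, 128), down(128, 256))
            # One decoder per identity reconstructs that person's face.
            self.decoder_x = nn.Sequential(up(256, 128), up(128, 64),
                                           nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Sigmoid())
            self.decoder_y = nn.Sequential(up(256, 128), up(128, 64),
                                           nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Sigmoid())

        def forward(self, face, identity):
            z = self.encoder(face)
            return self.decoder_x(z) if identity == "x" else self.decoder_y(z)

    # Training: reconstruct X's faces through decoder_x and Y's faces through
    # decoder_y (e.g., with an L1 reconstruction loss). At swap time, encode a
    # frame of person Y and decode it with decoder_x: person X's identity appears
    # with person Y's pose and expression. Face detection and alignment supply
    # the cropped input faces.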

A lot of this research started with the Face2face work [1] presented at CVPR 2016. This paper was a modernization of the group's earlier SIGGRAPH paper and focused a lot more on the computer vision details. At this time the tools were good enough to create SIGGRAPH-quality videos, but it took a lot of work to put together a facial reenactment rig. In addition, the underlying algorithms did not use any deep learning, so a lot of domain-knowledge (i.e., face modeling expertise) went into making these algorithms work robustly. The TUM/Stanford guys filed their Real-time facial reenactment patent in 2016 [4], and have more recently worked on FaceForensics[3] to detect such manipulated imagery.

Fig 3. Face2Face technique from 2016. It is 2018 now, so just imagine how much better this works now!

In addition to the Face2face guys (who now have a handful of similarly themed papers), it is interesting to note that a lot of the key early ideas in face puppeteering were pioneered by Ira Kemelmacher-Shlizerman, now a computer vision and graphics assistant professor at the University of Washington. She worked on early face puppeteering technology for the 2010 Being John Malkovich paper [2], continued with the Photobios work, and later founded Dreambit (based on a SIGGRAPH 2016 paper), which was acquired by Facebook. :-)


Fig 4. Ira's early work on face swapping in 2010. See the Being John Malkovich paper[2].

Take a look at Ira's Dreambit video, which shows the high "entertainment" value of rapidly produced, non-malicious DeepFakes!

Fig 5. Ira's Dreambit system lets her imagine herself in different eras, with different hairstyles, etc.

The origin of Ira's Dreambit system is the Transfiguring Portraits SIGGRAPH 2016 paper [6]. What's important to note is that by 2016 we're starting to see some use of Deep Learning. The Transfiguring Portraits work used a big mix of features, including some CNN features computed from early Caffe networks. It was not an entirely easy-to-use system at that point, but it was good enough to make SIGGRAPH videos, took about a minute to generate other cool outputs, and was definitely cool enough for Facebook to acquire.


Fig 6. Transfiguring Portraits. The system used lots of features, but Deep Learning-based CNN features are starting to show up.

Fighting against DeepFakes
There are now published algorithms which try to battle DeepFakes by determining whether faces/videos are fake or not. FaceForensics [3] introduces a large DeepFake dataset based on the group's earlier Face2face work. This dataset contains both real videos and "fake" Face2face output videos. More importantly, the new dataset is big enough to train a deep learning system to determine whether an image is counterfeit. In addition, they are able to both 1) determine which pixels have likely been manipulated, and 2) perform a deep cleanup stage to make even better DeepFakes.
Fig 7. The "fakeness" masks in FaceForensics[3] are based on XceptionNet
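To give a flavor of how such a forgery detector is trained, here is a hedged sketch of a binary real-vs-fake frame classifier. FaceForensics fine-tunes XceptionNet; for brevity this sketch substitutes an off-the-shelf torchvision ResNet-18, and the dataset layout, epoch count, and learning rate are placeholder assumptions.

    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    tf = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    # Assumed layout: faces/real/*.png and faces/fake/*.png (cropped face frames).
    data = datasets.ImageFolder("faces", transform=tf)
    loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, 2)  # two outputs: real vs. fake
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(5):
        for imgs, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(imgs), labels)
            loss.backward()
            opt.step()

The per-pixel "fakeness" masks in Fig 7 require a segmentation-style head on top of such a backbone; the sketch above only gives a per-frame real/fake decision.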

Another fake-detection approach, Image Splice Detection from a Berkeley AI Research group, focuses on detecting where an image was spliced to create a fake composite image. This allows them to determine which part of the image was likely "photoshopped," and the technique is not specific to faces. And because this is a 2018 paper, it should be no surprise that the work is all based on deep learning techniques.

Fig 8. Fighting Fake News: Image Splice Detection[5]. Response maps are aggregated to determine the combined probability mask.[5]

From the Fighting Fake News paper:
"As new advances in computer vision and image-editing emerge, there is an increasingly urgent need for effective visual forensics methods. We see our approach, which successfully detects manipulations without seeing examples of manipulated images, as being an initial step toward building general-purpose forensics tools."

Concluding Remarks
The early DeepFake tools were pioneered in the early 2010s and were producing SIGGRAPH-quality results by 2015. It was only a matter of years until DeepFake generators became publicly available. 2018's DeepFake generators, being written on top of open-source Deep Learning libraries, are much easier to use than the researchy systems from only a few years back. Today, just about any hobbyist with minimal computer programming knowledge and a GPU can build their own DeepFakes.

Just as DeepFakes are getting better, Generative Adversarial Networks are showing more promise for photorealistic image generation. It is likely that we will soon see lots of exciting new work on both the generative side (DeepFake generation) and the discriminative side (DeepFake detection and image forensics), incorporating more and more ideas from the machine learning community.


References

[1] Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. "Face2face: Real-time Face Capture and Reenactment of RGB Videos." In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2387-2395. IEEE, 2016.

[2] Ira Kemelmacher-Shlizerman, Aditya Sankar, Eli Shechtman, and Steven M. Seitz. "Being John Malkovich." In European Conference on Computer Vision (ECCV), pp. 341-353. Springer, 2010.

[3] Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. "FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces." arXiv preprint arXiv:1803.09179, 2018.

[4] Christian Theobalt, Michael Zollhöfer, Marc Stamminger, Justus Thies, and Matthias Nießner. "Real-time Expression Transfer for Facial Reenactment." U.S. Patent Application 15/256,710, published 2018/3/8.

[5] Minyoung Huh, Andrew Liu, Andrew Owens, and Alexei A. Efros. "Fighting Fake News: Image Splice Detection via Learned Self-Consistency." arXiv preprint arXiv:1805.04096, 2018.

[6] Ira Kemelmacher-Shlizerman. "Transfiguring Portraits." ACM Transactions on Graphics (TOG), 35(4), p. 94, 2016.




Sunday, January 12, 2014

Can a person-specific face recognition algorithm be used to determine a person's race?

It's a valid question: can a person-specific face recognition algorithm be used to determine a person's race?

I trained two separate person-specific face detectors.  For each detector, I used videos of the target person's face to generate positive examples and faces from a Google image search for "faces" as negative examples.  This is a fairly straightforward machine learning problem: find a decision boundary between the positive examples and the negative examples.  I used the VMX Project recognition algorithm, which learns from videos with minimal human supervision.  In both cases, I used the VMX webapp for training (training each detector took about 20 minutes from scratch).  In fact, I didn't even have to touch the command line.  Since videos were used as input, what I created are essentially full-blown sliding-window detectors, meaning that they scan an entire image and can even find small faces.

I then ran each detector on the large "average male face" image.  This image has been around the internet for a while now; it was created by averaging many people's faces, one composite per country.  By running the algorithm on this one image, the detector analyzed all of the faces contained inside, and I could see which country returned the highest-scoring detection!


Experiment #1
For the first experiment, I used a video of my own face.  Because I was using a live video stream, I was able to move my face around so that the algorithm saw lots of different viewing conditions.  Here is the output.  Notice the green box around "Poland."  Pretty good guess, especially since I moved from Poland to the US when I was 8.


Here is a 5-minute video (VMX screen capture) of me running the "Tomasz" detector (that's my name, in case you don't know) as I fly around the average male image.  You can see the scores on lots of different races.  High-scoring detections are almost always on geographically relevant races.


Experiment #2
For the second target, I used a few videos of Andrew Ng to get positives.  For those of you who don't know, Andrew Ng is a machine learning researcher, entrepreneur, professor at Stanford, and MOOC visionary.  Here is the result.  Notice the green box around "Japan."  A very reasonable answer, especially since I didn't give the algorithm any extra Asian faces as negatives.


Here is a 5 min video (VMX screencapture) of me running the "Andrew Ng" detector as I fly around the average male image.



In conclusion, person-specific face detectors from VMX can be used to help determine a person's race; at least the two VMX face detectors I trained behaved as expected.  This is far from a full-blown study, as I only had the chance to try it on two subjects, but I wanted to share what I found.  The underlying algorithm inside VMX is a non-parametric exemplar-based model.  During training, the algorithm uses ideas from max-margin learning to create a separator between the positives and negatives.
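For those who want the gist in code, here is a minimal sketch of that max-margin separation idea: a linear SVM on HOG features, with frames of the target person's face as positives and random web faces as negatives.  To be clear, this is not VMX's actual implementation (VMX is proprietary, and its exemplar-based model is richer than a single linear separator); it is just the textbook version of the training setup described above.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def feats(face_crops):
        # face_crops: list of same-sized grayscale face crops (2D numpy arrays)
        return np.stack([hog(c, orientations=9, pixels_per_cell=(8, 8),
                             cells_per_block=(2, 2)) for c in face_crops])

    def train_person_detector(video_faces, negative_faces):
        X = np.vstack([feats(video_faces), feats(negative_faces)])
        y = np.r_[np.ones(len(video_faces)), np.zeros(len(negative_faces))]
        return LinearSVC(C=1.0).fit(X, y)  # the max-margin separator

    # Scoring a new face crop: clf.decision_function(feats([crop])).
    # Higher scores mean "more like the target person", the same kind of
    # number reported on each average-face tile in the experiments above.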

If you've been following my computer vision research projects, you should have a good idea of how these things work.  I want to mention that while I showcase VMX being used for face detection, there is nothing face-specific inside the algorithm.  The same representation is used for bottles, cars, hands, mouths, etc.  VMX is a general-purpose object recognition ecosystem, and we're excited to finally be releasing this technology to the world.

There are lots of cool applications of VMX detectors.  What app will you build?

To learn more about VMX and get in on the action, simply check out the VMX Kickstarter project and back our campaign.

Monday, December 05, 2011

An accidental face detector

Disclaimer #1: I don't specialize in faces.  When it comes to learning, I like my objectives to be convex.  When it comes to hacking on vision systems, I like to tackle entry-level object categories.

Fun fact #1: Faces are probably the easiest objects in the world for a machine to localize/detect/recognize.

Note #1: I supplied the images, my algorithm supplied the red boxes.

Note #2: Sorry to all my friends who failed to get detected by my accidental face detector! (see below)

So I was hackplaying with some of my PhD thesis code over Thanksgiving, and I accidentally made a face detector.  Oops!  I immediately grabbed my screenshot capture tool and ran my code on my Mac desktop while browsing Google Images and Facebook.  It seems to work pretty well on real faces as well as sketches/paintings of faces (see below)!  I even caught two Berkeleyites (an Alyosha and a Jianbo), but you've gotta find them for yourself.  The detector is definitely tuned to frontal faces, but it runs pretty fast and produces few false positives.  Not too shabby for some midnight hackerdom.

Yes, I'm doing dense multiscale sliding windows here.  Yes, I'm HoGGing the hell outta these images.  Yes, I'm using a single frontal-face-tuned template.  And yes, I only used faces of myself to train this accidental face detector.
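For the curious, here is a rough sketch of what such a single-template, dense multiscale sliding-window HOG detector looks like, assuming a linear template w (and bias b) already trained on HOG features of frontal-face crops.  The window size, pyramid scales, and stride below are illustrative choices, not my original settings.

    import numpy as np
    from skimage.feature import hog
    from skimage.transform import rescale

    WIN = (80, 80)  # detection window in pixels (assumed)

    def hog_feat(patch):
        return hog(patch, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), feature_vector=True)

    def detect(gray, w, b, thresh=0.5, scales=(1.0, 0.75, 0.5), step=8):
        """Score every window at every scale with one HOG template."""
        boxes = []
        for s in scales:
            img = rescale(gray, s, anti_aliasing=True)
            H, W = img.shape
            for y in range(0, H - WIN[0], step):
                for x in range(0, W - WIN[1], step):
                    score = w @ hog_feat(img[y:y + WIN[0], x:x + WIN[1]]) + b
                    if score > thresh:
                        # back-project the box to original image coordinates
                        boxes.append((x / s, y / s, WIN[1] / s, WIN[0] / s, score))
        return boxes  # non-maximum suppression would follow in practice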

Note: If I've used one of your pictures without permission, and you would like a link back to your home on the interwebs, please leave a comment indicating the image and link to original.



Sunday, July 24, 2011

CMU Robotics Institute's vision finds a home at Google

Congratulations to PittPatt for their recent acquisition by Google.  PittPatt, a Pittsburgh-based startup, has its roots in CMU's Robotics Institute (where I'm currently a PhD student).  Henry Schneiderman, the CEO of PittPatt, did some truly hardcore computer vision work while doing his PhD under Takeo Kanade.


Two famous papers by Henry Schneiderman and Takeo Kanade are the following:
H. Schneiderman and T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2000.


H. Schneiderman and T. Kanade. "Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1998.


Here is what the front page of PittPatt states:

Joining Google is the next thrilling step in a journey that began with research at Carnegie Mellon University's Robotics Institute in the 1990s and continued with the launching of Pittsburgh Pattern Recognition (PittPatt) in 2004. We've worked hard to advance the research and technology in many important ways and have seen our technology come to life in some very interesting products. At Google, computer vision technology is already at the core of many existing products (such as Image Search, YouTube, Picasa, and Goggles), so it's a natural fit to join Google and bring the benefits of our research and technology to a wider audience. We will continue to tap the potential of computer vision in applications that range from simple photo organization to complex video and mobile applications.
We look forward to joining the team at Google!
The team at Pittsburgh Pattern Recognition

Perhaps Henry's success is yet another reason to come to CMU to get a vision PhD...