Methods and apparatus to perform deepfake detection using audio and video features
US-2022269922-A1 · Aug 25, 2022 · US
US12164609B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12164609-B2 |
| Application number | US-202217724129-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 19, 2022 |
| Priority date | Apr 28, 2021 |
| Publication date | Dec 10, 2024 |
| Grant date | Dec 10, 2024 |
Abstract
An apparatus, method and computer program is disclosed. The apparatus may comprise means for receiving video data representing a video recording of at least one input made by a user at a user device; receiving audio data representing an audio recording of at least one audio input made by the user at the user device; determining whether there is a correspondence between the at least one input represented in the video data and the at least one audio input represented in the audio data; and providing verification based on the determination.
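Claim 1 narrows the correspondence test in this abstract to timing: an input seen in the video and an audio input correspond if they were made at the same time. The Python sketch below shows only that decision step. It assumes hypothetical upstream detectors have already reduced each track to a list of event timestamps in seconds on a shared clock, which is plausible when both tracks come from one video file as in claim 10. The 50 ms tolerance and the `inputs_coincide` / `verify_human` names are illustrative assumptions, not taken from the patent.

```python
from typing import Sequence


def inputs_coincide(
    video_times: Sequence[float],
    audio_times: Sequence[float],
    tolerance_s: float = 0.05,  # assumed tolerance, not specified in the patent
) -> bool:
    """Return True if each video input pairs with a distinct audio input made
    within tolerance_s seconds of it.

    Both inputs are timestamps (seconds) assumed to share one clock, e.g. the
    audio and video tracks of a single recording.
    """
    if not video_times or len(video_times) != len(audio_times):
        return False
    # For points on a line, pairing the i-th smallest video time with the
    # i-th smallest audio time minimises the largest pairwise gap, so a
    # sorted zip is enough to test whether any one-to-one pairing fits.
    return all(
        abs(tv - ta) <= tolerance_s
        for tv, ta in zip(sorted(video_times), sorted(audio_times))
    )


def verify_human(video_times, audio_times, tolerance_s=0.05) -> dict:
    """Hypothetical wrapper producing the indication returned to the device."""
    ok = inputs_coincide(video_times, audio_times, tolerance_s)
    return {
        "human": ok,
        "indication": "verification successful" if ok else "verification failed",
    }


if __name__ == "__main__":
    # Three taps seen on camera and three tap sounds, each within 20 ms.
    print(verify_human([0.10, 0.52, 1.03], [0.12, 0.50, 1.05]))
```

Requiring equal event counts is a design choice of this sketch: an extra touch with no matching sound, or an extra sound with no visible touch, fails verification.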
Claims (preview)
The invention claimed is:
1. An apparatus (104, 107) comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, when executed by the at least one processor, cause the apparatus at least to: receive video data representing a video recording of at least one input made by a user physically interacting with a user device by a touch or gesture input; receive audio data representing an audio recording of at least one audio input made by the user at the user device; determine whether there is a correspondence between the at least one input represented in the video data and the at least one audio input represented in the audio data by determining whether the at least one input represented in the video data and the at least one audio input represented in the audio data were made at a same time based on timing data in the received video data and in the received audio data; and based on the determination, provide verification that the user is a human user, including outputting an indication to the user device that the verification was successful.
2. An apparatus (104, 107) comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, when executed by the at least one processor, cause the apparatus at least to: receive video data representing a video recording of at least one input made by a user at a user device; receive audio data representing an audio recording of at least one audio input made by the user at the user device; determine whether there is a correspondence between the at least one input represented in the video data and the at least one audio input represented in the audio data; based on the determination, provide verification; based on the received video data, determine a first set of at least one user-selectable regions of a user interface corresponding to the at least one input in the video recording; based on the received audio data, determine a second set of the at least one user-selectable regions corresponding to the at least one audio input in the audio recording; and based at least in part on whether the first set of at least one user-selectable regions and second set of at least one user-selectable regions at least partially match, determine whether there is said correspondence between the at least one input represented in the video data and the at least one audio input represented in the audio data.
3. The apparatus of claim 1, wherein at least one of the at least one user-selectable regions of the user interface is configured to have a respective optical modification, and wherein the computer program code with the at least one processor are further configured to cause the apparatus to: based at least in part on a detection of the optical modification represented in the video data determine the first set of at least one user-selectable regions.
4. The apparatus of claim 3, wherein said respective optical modification comprises at least one of color modification or a brightness modification.
5. The apparatus of claim 1, wherein the computer program code with the at least one processor are further configured to cause the apparatus to: determine the second set of the at least one user-selectable regions based at least in part on spatial information in the audio data.
6. The apparatus of claim 1, wherein a respective one of the at least one user-selectable regions of the user interface is configured to have a respective audio modification, and wherein the computer program code with the at least one processor are further configured to cause the apparatus to: based at least in part on a detection of at least one of the audio modifications represented in the audio data, determine the second set of at least one user-selectable regions.
7. The apparatus of claim 3, wherein the computer program code with the at least one processor are further configured to cause the apparatus to: output a prompt (405) at the user device, and wherein the at least one input and the at least one audio input are received in response to the prompt, wherein the prompt causes an alphanumeric keypad and an instruction to enter a passcode to be displayed on the user interface.
8. The apparatus of claim 1, wherein the computer program code with the at least one processor are further configured to cause the apparatus to: receive metadata corresponding to the user device, and wherein the determination whether there is a correspondence between the at least one input represented in the video data and the at least one audio input represented in the audio data is based, at least in part, on the received metadata.
9. The apparatus of claim 8, wherein the metadata comprises metadata indicative of at least one of a type of the user device, a model of the user device or one or more dimensions of the user device.
10. The apparatus of claim 1, wherein the received video data and the received audio data are comprised within a video file.
11. The apparatus of claim 1, wherein the computer program code with the at least one processor are further configured to cause the apparatus to: receive an expected code; determine whether there is a correspondence between the expected code and at least one of the at least one input represented in the video data or the at least one audio input represented in the audio data; and based at least in part on the determination whether there is a correspondence between the expected code and the at least one of the at least one input represented in the video data or the at least one audio input represented in the audio data, provide said verification.
12. The apparatus of claim 1, wherein the received audio data comprises at least one biometric audio distortion, wherein the computer program code with the at least one processor are further configured to cause the apparatus to: determine whether there is a correspondence between the at least one biometric audio distortion and a predetermined biometric user profile, and based at least in part on the determination whether there is a correspondence between the at least one biometric audio distortion and the predetermined biometric user profile, provide said verification.
13. The apparatus of claim 1, wherein said verification comprises at least one of a verification of an identity of the user, a location of the user or that the user is a human user.
14. A method comprising: receiving video data representing a video recording of at least one input made by a user physically interacting with at a user device by a touch or gesture input; receiving audio data representing an audio recording of at least one audio input made by the user at the user device; determining whether there is a correspondence between the at least one input represented in the video data and the at least one audio input represented in the audio data by determining whether the at least one input represented in the video data and the at least one audio input represented in the audio data were made at a same time based on timing data in the received video data and in the received audio data; and based on the determination, providing verification that the user is a human user, including outputting an indication to the user device that the verification was successful.
15. The method of claim 14, further comprising: based on the received video data, determining a first set of at least one user-selectable regions of a user interface corresponding to the at least o
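Claims 2 to 7 and 11 describe a second kind of check: the user-selectable regions (for example, keys of the passcode keypad in claim 7) identified from the video, such as by detecting a per-key color or brightness change, are compared with the regions identified from the audio, such as by a per-key audio modification or spatial information, and the entry may also be checked against an expected code. The Python sketch below assumes hypothetical detectors have already turned each track into an ordered list of key labels. Modeling "at least partially match" as a Jaccard overlap of at least 0.5, and the function names used, are illustrative assumptions rather than anything stated in the patent.

```python
from typing import Optional, Sequence


def region_sets_match(
    video_regions: Sequence[str],
    audio_regions: Sequence[str],
    min_overlap: float = 0.5,  # assumed threshold for "at least partially match"
) -> bool:
    """Compare the first set (from video) with the second set (from audio)
    using the Jaccard overlap |V & A| / |V | A|."""
    v, a = set(video_regions), set(audio_regions)
    if not v or not a:
        return False
    return len(v & a) / len(v | a) >= min_overlap


def matches_expected_code(regions: Sequence[str], expected_code: str) -> bool:
    """True if the ordered key labels spell out the expected code."""
    return "".join(regions) == expected_code


def verify_entry(
    video_regions: Sequence[str],
    audio_regions: Sequence[str],
    expected_code: Optional[str] = None,
    min_overlap: float = 0.5,
) -> bool:
    """Hypothetical combination of the region test with the optional
    expected-code test; either track may supply the code match."""
    if not region_sets_match(video_regions, audio_regions, min_overlap):
        return False
    if expected_code is None:
        return True
    return matches_expected_code(video_regions, expected_code) or matches_expected_code(
        audio_regions, expected_code
    )


if __name__ == "__main__":
    # Video saw keys 1-2-3-4; the audio decoding misheard the last key as 5.
    print(verify_entry(["1", "2", "3", "4"], ["1", "2", "3", "5"], expected_code="1234"))
```

The region comparison uses sets to mirror the claim wording ("first set", "second set") and so ignores order, while the expected-code check uses order because a passcode is a sequence. How the two tests are combined here is an assumption of this sketch.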
CPC titles:
- for processing of video signals
- Verifying human interaction, e.g., Captcha
- in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44)
- by partitioning the display area of the touch-screen or the surface of the digitising tablet into independently controllable areas, e.g. virtual keyboards or menus
- using propagating acoustic waves