Deepfake detection
US-2024355334-A1 · Oct 24, 2024 · US
US2026088032A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2026088032-A1 |
| Application number | US-202519398581-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 24, 2025 |
| Priority date | Nov 24, 2025 |
| Publication date | Mar 26, 2026 |
| Grant date | — |
Official abstract text for this publication.
Systems and methods are provided for detecting audio deepfakes, including synthetic speech generated using advanced artificial intelligence techniques. The disclosed techniques address the shortcomings of existing deepfake detection models, which often fail to robustly distinguish between authentic and synthetic audio and may require extensive retraining or large datasets. The deepfake detection system leverages verified audio samples of a known speaker to generate a distribution of detection scores using a speaker-independent deepfake detector, without modifying or retraining the underlying model. By segmenting the verified samples and constructing a statistical reference distribution, the system applies a statistical test to determine whether the detection scores from an unverified audio input are consistent with the reference distribution. This allows for accurate and efficient personalized deepfake detection, incorporating speaker-specific conditioning information without model retraining, and the resulting system is compatible with real-time applications.
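The flow described in the abstract can be illustrated with a minimal sketch. It is not the applicant's implementation. It assumes a hypothetical `toy_detector` function standing in for the pretrained, speaker-independent neural network, non-overlapping windows, and a z-score against the sample mean and standard deviation as the statistical test. The abstract does not name a specific test or threshold, so `threshold=3.0` is an arbitrary illustrative value.

```python
import math
import statistics
from typing import Callable, List, Sequence

Signal = Sequence[float]


def toy_detector(window: Signal) -> float:
    """Hypothetical stand-in for a pretrained deepfake detector.

    Returns the mean absolute amplitude so the example is runnable;
    a real system would call a neural network here.
    """
    return sum(abs(x) for x in window) / len(window)


def segment(signal: Signal, window_size: int) -> List[Signal]:
    """Split a signal into consecutive windows, dropping a short tail."""
    last_start = len(signal) - window_size
    return [signal[i:i + window_size] for i in range(0, last_start + 1, window_size)]


def reference_scores(verified: Sequence[Signal],
                     detector: Callable[[Signal], float],
                     window_size: int) -> List[float]:
    """Score every window of every verified sample; the detector is not retrained."""
    return [detector(w) for s in verified for w in segment(s, window_size)]


def z_statistic(test_score: float, scores: Sequence[float]) -> float:
    """Distance of the test score from the reference mean, in standard deviations."""
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    if std == 0:
        return 0.0 if test_score == mean else math.inf
    return abs(test_score - mean) / std


def classify(unverified: Signal,
             verified: Sequence[Signal],
             detector: Callable[[Signal], float],
             window_size: int,
             threshold: float = 3.0) -> str:
    """Label the unverified input by comparing its score with the reference scores."""
    scores = reference_scores(verified, detector, window_size)
    if len(scores) < 2:
        raise ValueError("need at least two reference windows")
    stat = z_statistic(detector(unverified), scores)
    return "authentic" if stat < threshold else "fake"
```

Because the detector is only called, never updated, the speaker-specific information lives entirely in the list of reference scores. Using a mean and standard deviation amounts to assuming a roughly normal score distribution, which is one possible parametric choice among many.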
Opening claim text (preview).
What is claimed is:
1. An apparatus, comprising: a computer processor for executing computer program instructions; and a non-transitory computer-readable memory storing computer program instructions executable by the computer processor to perform operations comprising: receiving an unverified audio input signal of speech from a target speaker; receiving one or more verified audio input signals of speech from the target speaker; segmenting the one or more verified audio input signals into a plurality of windows; generating, at a neural network, a plurality of deepfake scores for the one or more verified audio input signals, including generating a respective deepfake score for each of the plurality of windows; determining a score distribution across the plurality of deepfake scores; generating, at the neural network, a test score for the unverified audio input signal; and determining, based on the test score and the score distribution, that the unverified audio input signal is one of authentic or fake.
2. The apparatus of claim 1, wherein the operations further comprise determining a statistical test value representing a difference between the test score and the score distribution.
3. The apparatus of claim 2, wherein determining that the unverified audio input signal is authentic includes determining that the statistical test value is less than a selected threshold.
4. The apparatus of claim 2, wherein determining that the unverified audio input signal is fake includes determining that the statistical test value is greater than a selected threshold.
5. The apparatus of claim 1, wherein determining the score distribution comprises fitting a parametric probability distribution to the plurality of deepfake scores.
6. The apparatus of claim 1, wherein the plurality of windows is a plurality of first windows, and wherein the operations further comprise augmenting the plurality of first windows to generate a plurality of augmented windows including variations of each of the plurality of first windows.
7. The apparatus of claim 1, wherein the neural network is a pretrained, speaker-independent deepfake detection model.
8. The apparatus of claim 1, wherein segmenting the one or more verified audio input signals into a plurality of windows includes sliding-window segmentation and the plurality of windows include a plurality of overlapping windows.
9. One or more non-transitory computer-readable media storing instructions executable to perform operations, the operations comprising: receiving an unverified audio input signal of speech from a target speaker; receiving one or more verified audio input signals of speech from the target speaker; segmenting the one or more verified audio input signals into a plurality of windows; generating, at a neural network, a plurality of deepfake scores for the one or more verified audio input signals, including generating a respective deepfake score for each of the plurality of windows; determining a score distribution across the plurality of deepfake scores; generating, at the neural network, a test score for the unverified audio input signal; and determining, based on the test score and the score distribution, that the unverified audio input signal is one of authentic or fake.
10. The one or more non-transitory computer-readable media of claim 9, wherein the operations further comprise determining a statistical test value representing a difference between the test score and the score distribution.
11. The one or more non-transitory computer-readable media of claim 10, wherein determining that the unverified audio input signal is authentic includes determining that the statistical test value is less than a selected threshold.
12. The one or more non-transitory computer-readable media of claim 10, wherein determining that the unverified audio input signal is fake includes determining that the statistical test value is greater than a selected threshold.
13. The one or more non-transitory computer-readable media of claim 9, wherein determining the score distribution comprises fitting a parametric probability distribution to the plurality of deepfake scores.
14. The one or more non-transitory computer-readable media of claim 9, wherein the plurality of windows is a plurality of first windows, and wherein the operations further comprise augmenting the plurality of first windows to generate a plurality of augmented windows including variations of each of the plurality of first windows.
15. The one or more non-transitory computer-readable media of claim 9, wherein the neural network is a pretrained, speaker-independent deepfake detection model.
16. The one or more non-transitory computer-readable media of claim 9, wherein segmenting the one or more verified audio input signals into a plurality of windows includes sliding-window segmentation and the plurality of windows include a plurality of overlapping windows.
17. A computer-implemented method for deepfake detection, comprising: receiving an unverified audio input signal of speech from a target speaker; receiving one or more verified audio input signals of speech from the target speaker; segmenting the one or more verified audio input signals into a plurality of windows; generating, at a neural network, a plurality of deepfake scores for the one or more verified audio input signals, including generating a respective deepfake score for each of the plurality of windows; determining a score distribution across the plurality of deepfake scores; generating, at the neural network, a test score for the unverified audio input signal; and determining, based on the test score and the score distribution, that the unverified audio input signal is one of authentic or fake.
18. The computer-implemented method of claim 17, further comprising determining a statistical test value representing a difference between the test score and the score distribution.
19. The computer-implemented method of claim 18, wherein determining that the unverified audio input signal is authentic includes determining that the statistical test value is less than a selected threshold.
20. The computer-implemented method of claim 18, wherein determining that the unverified audio input signal is fake includes determining that the statistical test value is greater than a selected threshold.
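The dependent claims on sliding-window segmentation with overlapping windows and on augmenting the windows can be pictured with a short sketch. The function names, the `hop` parameter, and the use of gain scaling as the "variation" are assumptions made for illustration. The claims do not specify window lengths, overlap amounts, or which augmentations are applied.

```python
from typing import List, Sequence


def sliding_windows(signal: Sequence[float],
                    window_size: int,
                    hop: int) -> List[List[float]]:
    """Cut a signal into fixed-size windows whose starts are `hop` samples apart.

    Windows overlap whenever hop < window_size. A trailing remainder
    shorter than window_size is dropped.
    """
    if window_size <= 0 or hop <= 0:
        raise ValueError("window_size and hop must be positive")
    last_start = len(signal) - window_size
    return [list(signal[s:s + window_size]) for s in range(0, last_start + 1, hop)]


def augment(windows: Sequence[Sequence[float]],
            gains: Sequence[float] = (0.8, 1.25)) -> List[List[float]]:
    """Create one scaled copy of each window per gain value.

    Gain scaling is a hypothetical example of a variation; other
    augmentations (noise, filtering, and so on) would slot in the same way.
    """
    return [[g * x for x in w] for w in windows for g in gains]
```

With `window_size=4` and `hop=2`, each window shares half of its samples with the next one, which yields more reference scores from the same amount of verified audio. Scoring both the first windows and their augmented copies would widen the reference set further.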
Artificial neural networks; Connectionist approaches · CPC title
characterised by the type of analysis window · CPC title
using biometric data, e.g. fingerprints, iris scans or voiceprints · CPC title
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
Decision making techniques; Pattern matching strategies · CPC title