Deepfake detection
US-2024355334-A1 · Oct 24, 2024 · US
US2023015169A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023015169-A1 |
| Application number | US-202217933164-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 19, 2022 |
| Priority date | Oct 15, 2020 |
| Publication date | Jan 19, 2023 |
| Grant date | — |
Abstract
A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
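The abstract's pipeline has four steps: slice each sample, embed each slice, discard some candidates, and aggregate the rest. The sketch below shows those steps in numpy. The abstract does not say how candidates are embedded, which ones are removed, or how the survivors are combined, so the sketch makes three assumptions:

- `toy_embedding` is a hypothetical band-pooled log spectrum standing in for a trained speaker encoder.
- The removal rule discards the fraction `drop_fraction` of candidates that are least similar to their centroid. Both the rule and the parameter are hypothetical.
- The aggregate is a re-normalized mean of the remaining candidates.

```python
import numpy as np


def slice_audio(samples: np.ndarray, slice_len: int) -> list:
    """Split a 1-D waveform into non-overlapping slices of slice_len samples (the tail is dropped)."""
    n_slices = len(samples) // slice_len
    return [samples[i * slice_len:(i + 1) * slice_len] for i in range(n_slices)]


def toy_embedding(audio_slice: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Hypothetical stand-in for a learned speaker encoder: a unit-norm, band-pooled log spectrum."""
    spectrum = np.abs(np.fft.rfft(audio_slice))
    if len(spectrum) < n_bins:
        raise ValueError("slice too short for the requested number of bands")
    pooled = np.array([band.mean() for band in np.array_split(spectrum, n_bins)])
    vec = np.log1p(pooled)
    return vec / (np.linalg.norm(vec) + 1e-12)


def aggregate_embedding(samples, slice_len, drop_fraction=0.25):
    """Slice -> candidate embeddings -> remove a subset -> average the remaining candidates."""
    slices = slice_audio(samples, slice_len)
    if not slices:
        raise ValueError("audio shorter than one slice")
    candidates = np.stack([toy_embedding(s) for s in slices])
    centroid = candidates.mean(axis=0)
    # Cosine similarity of each unit-norm candidate to the centroid.
    sims = candidates @ centroid / (np.linalg.norm(centroid) + 1e-12)
    # Assumed removal rule: discard the least centroid-like fraction of candidates.
    n_drop = int(len(candidates) * drop_fraction)
    kept = candidates[np.argsort(sims)[n_drop:]]
    aggregate = kept.mean(axis=0)
    return aggregate / (np.linalg.norm(aggregate) + 1e-12)


rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # e.g. 1 s of audio at an assumed 16 kHz sample rate
embedding = aggregate_embedding(audio, slice_len=1600)
print(embedding.shape)  # (16,)
```

Discarding the candidates farthest from the centroid is one simple way to drop slices dominated by noise or by another voice. A real system could use a learned quality score instead.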
Claims
What is claimed is:
1. A computer-implemented method when executed by data processing hardware causes the data processing hardware to perform operations comprising: receiving a first audio sample from a first speaker and a second audio sample from a second speaker; for each audio sample of the first audio sample and the second audio sample, generating a first respective sample variation by performing a first spectrogram augmentation technique on a frequency representation of the respective audio sample; generating a first score based on a comparison of the first respective sample variations; and generating, using a model, a prediction indicating whether the first speaker and the second speaker are the same speaker or different speakers based on the first score.
2. The method of claim 1, wherein the operations further comprise: for each audio sample of the first audio sample and the second audio sample, generating a second respective sample variation by performing a second spectrogram augmentation technique on the frequency representation of the respective audio sample; generating a second score based on a comparison of the second respective sample variations; and generating, using the model, a second prediction indicating whether the first speaker and the second speaker are the same speaker or different speakers based on the second score.
3. The method of claim 2, wherein the second prediction is based on the first score and the second score.
4. The method of claim 2, wherein the first spectrogram augmentation technique and the second spectrogram augmentation technique are different.
5. The method of claim 2, wherein the operations further comprise: for each audio sample of the first audio sample and the second audio sample, generating a third respective sample variation by performing a third spectrogram augmentation technique on the frequency representation of the respective audio sample; generating a third score based on a comparison of the third respective sample variations; and generating, using the model, a third prediction indicating whether the first speaker and the second speaker are the same speaker or different speakers based on the third score.
6. The method of claim 5, wherein the third prediction is based on the first score, the second score, and the third score.
7. The method of claim 5, wherein the first spectrogram augmentation technique is different than the second spectrogram augmentation technique and the third spectrogram augmentation technique is different than the first spectrogram augmentation technique and the second spectrogram augmentation technique.
8. The method of claim 1, wherein the first spectrogram augmentation technique comprises one of: a time masking technique; a frequency masking technique; or a time warping technique.
9. The method of claim 1, wherein the model comprises a Long Short-Term Memory (LSTM) neural network.
10. The method of claim 1, wherein the operations further comprise training the model by iteratively updating current values of one or more parameters of the model over a series of training cycles.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: receiving a first audio sample from a first speaker and a second audio sample from a second speaker; for each audio sample of the first audio sample and the second audio sample, generating a first respective sample variation by performing a first spectrogram augmentation technique on a frequency representation of the respective audio sample; generating a first score based on a comparison of the first respective sample variations; and generating, using a model, a prediction indicating whether the first speaker and the second speaker are the same speaker or different speakers based on the first score.
12. The system of claim 11, wherein the operations further comprise: for each audio sample of the first audio sample and the second audio sample, generating a second respective sample variation by performing a second spectrogram augmentation technique on the frequency representation of the respective audio sample; generating a second score based on a comparison of the second respective sample variations; and generating, using the model, a second prediction indicating whether the first speaker and the second speaker are the same speaker or different speakers based on the second score.
13. The system of claim 12, wherein the second prediction is based on the first score and the second score.
14. The system of claim 12, wherein the first spectrogram augmentation technique and the second spectrogram augmentation technique are different.
15. The system of claim 12, wherein the operations further comprise: for each audio sample of the first audio sample and the second audio sample, generating a third respective sample variation by performing a third spectrogram augmentation technique on the frequency representation of the respective audio sample; generating a third score based on a comparison of the third respective sample variations; and generating, using the model, a third prediction indicating whether the first speaker and the second speaker are the same speaker or different speakers based on the third score.
16. The system of claim 15, wherein the third prediction is based on the first score, the second score, and the third score.
17. The system of claim 15, wherein the first spectrogram augmentation technique is different than the second spectrogram augmentation technique and the third spectrogram augmentation technique is different than the first spectrogram augmentation technique and the second spectrogram augmentation technique.
18. The system of claim 11, wherein the first spectrogram augmentation technique comprises one of: a time masking technique; a frequency masking technique; or a time warping technique.
19. The system of claim 11, wherein the model comprises a Long Short-Term Memory (LSTM) neural network.
20. The system of claim 11, wherein the operations further comprise training the model by iteratively updating current values of one or more parameters of the model over a series of training cycles.
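Claims 1 through 8 describe a sequence: augment a frequency representation of each sample, score the pair on the augmented variations, and repeat with further augmentation techniques whose scores feed the prediction. The sketch below shows that flow. The claims leave most implementation details open, so the sketch makes these assumptions:

- The frequency representation is a numpy log-magnitude STFT.
- The three named techniques (time masking, frequency masking, time warping) are simplified SpecAugment-style versions.
- The comparison is a cosine similarity of time-averaged spectra.
- A fixed threshold on the mean score (`threshold`, hypothetical) stands in for the claimed model. Claim 9 says that model may be an LSTM.

All function names are illustrative.

```python
import numpy as np


def spectrogram(wave, frame=400, hop=160):
    """Log-magnitude STFT with a Hann window; returns shape (freq_bins, time_frames)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(wave) - frame) // hop
    frames = np.stack([wave[i * hop:i * hop + frame] * window for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T


def time_mask(spec, rng, max_width=10):
    """Zero out a random block of consecutive time frames."""
    out = spec.copy()
    width = rng.integers(0, max_width + 1)
    start = rng.integers(0, spec.shape[1] - width + 1)
    out[:, start:start + width] = 0.0
    return out


def freq_mask(spec, rng, max_width=10):
    """Zero out a random block of consecutive frequency bins."""
    out = spec.copy()
    width = rng.integers(0, max_width + 1)
    start = rng.integers(0, spec.shape[0] - width + 1)
    out[start:start + width, :] = 0.0
    return out


def time_warp(spec, rng, max_shift=5):
    """Piecewise-linear warp of the time axis around a random anchor frame."""
    n_t = spec.shape[1]
    center = rng.integers(max_shift + 1, n_t - max_shift - 1)
    shift = rng.integers(-max_shift, max_shift + 1)
    # Output frame center+shift reads source frame center; both endpoints stay fixed.
    src = np.interp(np.arange(n_t), [0, center + shift, n_t - 1], [0, center, n_t - 1])
    return np.stack([np.interp(src, np.arange(n_t), row) for row in spec])


# First, second, and third spectrogram augmentation techniques (claims 1, 2, 5, 8).
AUGMENTATIONS = [time_mask, freq_mask, time_warp]


def pair_score(spec_a, spec_b, augment, rng):
    """Cosine similarity between the time-averaged augmented variations of two samples."""
    va = augment(spec_a, rng).mean(axis=1)
    vb = augment(spec_b, rng).mean(axis=1)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12))


def predict_same_speaker(wave_a, wave_b, threshold=0.9, seed=0):
    """One score per technique; a threshold on the mean score stands in for the trained model."""
    rng = np.random.default_rng(seed)
    spec_a, spec_b = spectrogram(wave_a), spectrogram(wave_b)
    scores = [pair_score(spec_a, spec_b, aug, rng) for aug in AUGMENTATIONS]
    return float(np.mean(scores)) >= threshold, scores


noise = np.random.default_rng(1).standard_normal(16000)
# Each side of the pair receives its own random mask or warp, so scores fall slightly below 1.
decision, per_technique_scores = predict_same_speaker(noise, noise)
```

Averaging the per-technique scores loosely mirrors claims 3 and 6, where later predictions depend on all earlier scores. In the claimed method, a trained model (for example the LSTM of claim 9, trained as in claim 10) would replace both the averaging and the fixed threshold.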
CPC classification titles:
- Artificial neural networks; Connectionist approaches
- Use of distortion metrics or a particular distance between probe pattern and reference templates
- Decision making techniques; Pattern matching strategies
- Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction