Voice data transmission method and apparatus
US-2024363120-A1 · Oct 31, 2024 · US
US2020321009A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020321009-A1 |
| Application number | US-202016907951-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 22, 2020 |
| Priority date | Mar 3, 2017 |
| Publication date | Oct 8, 2020 |
| Grant date | — |
Official abstract text for this publication.
An automated speaker verification (ASV) system incorporates a first deep neural network to extract deep acoustic features, such as deep CQCC features, from a received voice sample. The deep acoustic features are processed by a second deep neural network that classifies the deep acoustic features according to a determined likelihood of including a spoofing condition. A binary classifier then classifies the voice sample as being genuine or spoofed.
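The two-stage flow in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the publication. The layer sizes, the random untrained weights, the use of plain dense layers instead of convolutional and pooling layers, the convention that class 0 means "no spoofing condition", and the 0.5 threshold are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input feature dimension, bottleneck width, and the
# number of condition classes (class 0 is assumed to be "non-spoofing").
N_INPUT, N_BOTTLENECK, N_CONDITIONS = 60, 16, 4

# Untrained random weights stand in for the two trained networks.
W1 = rng.normal(scale=0.1, size=(N_INPUT, N_BOTTLENECK))
W2 = rng.normal(scale=0.1, size=(N_BOTTLENECK, N_CONDITIONS))


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def extract_deep_features(frames):
    """Stand-in for the first DNN: map per-frame inputs to deep features."""
    return np.maximum(frames @ W1, 0.0)  # single ReLU layer as a placeholder


def spoofing_score(deep_features):
    """Stand-in for the second DNN: score = 1 - mean P(non-spoofing)."""
    posteriors = softmax(deep_features @ W2)
    return float(1.0 - posteriors[:, 0].mean())


def classify(score, threshold=0.5):
    """Binary decision on the spoofing score (threshold is an assumption)."""
    return "spoofed" if score >= threshold else "genuine"


# Usage: 100 frames of synthetic input features for one voice sample.
frames = rng.normal(size=(100, N_INPUT))
deep = extract_deep_features(frames)
score = spoofing_score(deep)
label = classify(score)
print(label, round(score, 3))
```

Separating feature extraction, scoring, and the final decision mirrors the three stages named in the abstract, so each stand-in can be replaced by a trained model without changing the others.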
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for detecting spoofed voice sources, the method comprising: extracting, by a computer, one or more acoustic features from a voice sample using a first deep neural network (DNN); and calculating, by the computer, via a second DNN a spoofing score indicating a likelihood that the voice sample includes a spoofing condition based in part on the acoustic features extracted from the voice sample.

2. The method according to claim 1, further comprising: classifying, by the computer executing a binary classifier, the voice sample as being either genuine or spoofed based on the spoofing score from the second DNN.

3. The method according to claim 1, wherein at least a portion of one or more of the acoustic features are deep constant Q cepstral coefficients (CQCC).

4. The method according to claim 1, wherein the spoofing conditions include at least one of channel conditions and audio conditions.

5. The method according to claim 4, wherein the channel conditions include channel artifacts associated with at least one of different background environments, different acquisition devices, and different network infrastructures.

6. The method according to claim 1, further comprising: extracting, by the computer, other acoustic features from the voice sample; combining, by the computer, the acoustic features with the other acoustic features to provide tandem features; and classifying, by the computer, the tandem features using the second DNN, the second DNN configured to determine whether the tandem features include a non-spoofing condition or at least one spoofing condition, wherein classifying the acoustic features is part of classifying the tandem features.

7. The method according to claim 6, wherein the other acoustic features are sub-band cepstral coefficient (SBCC) features, the method further comprising: sub-band filtering, by the computer, the voice sample before extracting the other features from the filtered sample, where said extracting the other, SBCC features includes: calculating, by the computer, a short-time Fourier transform (STFT) on a frame from the filtered sample, calculating, by the computer, a power spectrum from the STFT, calculating, by the computer, a log-amplitude from the power spectrum, calculating, by the computer, an inverse discrete cosine transform (IDCT) of the log-amplitude, and calculating, by the computer, dynamic features based on the IDCT.

8. The method according to claim 7, wherein filtering the audio sample includes using a high pass filter, thereby generating a filtered sample being limited to frequencies above a predetermined cutoff frequency.

9. The method according to claim 1, wherein the second DNN is configured to extract one or more multi-class features from the at least deep acoustic features.

10. The method according to claim 1, wherein the first DNN and the second DNN each include at least one of: an input layer, one or more hidden layers, one or more convolutional layers, a pooling layer, one or more fully-connected layers, and an output layer.

11. The method according to claim 10, wherein the pooling layer of the first DNN is configured to extract one or more bottleneck features from the acoustic features, and wherein the one or more bottleneck features are sensitive to the at least one audio artifact or channel artifact.

12. The method according to claim 1, further comprising: applying, by the computer, batch normalization for at least one of the first DNN and the second DNN, to one or more of: an input layer, one or more hidden layers, one or more fully-connected layers, and an output layer.

13. The method according to claim 1, wherein the second DNN is implemented using one or more graphics processors.

14. The method according to claim 1, wherein the configuration of the second DNN results from training the second DNN with a plurality of non-spoofed and known-spoofed voice samples.

15. A system for detecting a spoofed voice source, the system comprising: a receiving circuit configured to receive a voice sample; and one or more processors configured to: extract one or more acoustic features from the voice sample using a first deep neural network (DNN); and calculate using a second DNN a spoofing score indicating a likelihood that the voice sample includes a spoofing condition based in part on the acoustic features extracted from the voice sample.

16. The system according to claim 15, wherein the one or more processors are further configured to classify using a binary classifier the voice sample as being either genuine or spoofed based on the spoofing score.

17. The system according to claim 15, wherein at least a portion of the acoustic features are deep constant Q cepstral coefficients (CQCC).

18. The system according to claim 15, wherein the spoofing conditions include at least one of channel conditions and audio conditions.

19. The system according to claim 18, wherein the channel conditions include channel artifacts specific to at least one of: different background environments, different acquisition devices, and different network infrastructures.

20. The system according to claim 15, further comprising: circuitry configured to extract other acoustic features from the voice sample; wherein at least one of the one or more processors is further configured to combine using feature concatenation the one or more acoustic features with the other acoustic features to provide tandem features; and wherein the second DNN is further configured to: classify the tandem features; and determine whether the tandem features include a non-spoofing condition or at least one spoofing condition.
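The SBCC steps recited in claims 7 and 8, and the feature concatenation of claim 20, can be sketched roughly as follows. This is an illustration under assumptions, not the claimed implementation: the brick-wall FFT filter, the Hann window, the frame length and hop, the number of retained coefficients, the orthonormal DCT convention, the two-frame difference used for dynamic features, and the 5 kHz cutoff in the usage lines are all hypothetical choices. The claims specify only a "predetermined cutoff frequency" and do not fix any of these parameters.

```python
import numpy as np


def high_pass(signal, sample_rate, cutoff_hz):
    """Illustrative brick-wall high-pass: zero FFT bins below cutoff_hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def dct_matrix(n):
    """Orthonormal DCT-II matrix D; its transpose D.T is the inverse DCT."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    d[0] /= np.sqrt(2.0)
    return d


def deltas(features):
    """Dynamic features as central differences between neighboring frames."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


def sbcc(signal, sample_rate, cutoff_hz, frame_len=400, hop=160, n_coeffs=20):
    """Filter, frame, STFT, power, log-amplitude, IDCT, then dynamic features."""
    filtered = high_pass(signal, sample_rate, cutoff_hz)
    n_frames = 1 + (len(filtered) - frame_len) // hop
    window = np.hanning(frame_len)  # windowing is an assumption
    frames = np.stack(
        [filtered[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    stft = np.fft.rfft(frames, axis=1)      # one STFT column per frame
    power = np.abs(stft) ** 2               # power spectrum
    log_amp = np.log(power + 1e-10)         # small floor avoids log(0)
    # Row-wise inverse DCT: (D.T @ x) for each frame x equals x @ D.
    cepstra = (log_amp @ dct_matrix(log_amp.shape[1]))[:, :n_coeffs]
    return np.hstack([cepstra, deltas(cepstra)])


# Usage: one second of synthetic noise at 16 kHz, hypothetical 5 kHz cutoff.
rng = np.random.default_rng(0)
audio = rng.normal(size=16000)
sbcc_features = sbcc(audio, sample_rate=16000, cutoff_hz=5000.0)

# Tandem features: frame-wise concatenation with placeholder deep features
# (random values standing in for the first DNN's output).
deep_features = rng.normal(size=(sbcc_features.shape[0], 16))
tandem = np.hstack([deep_features, sbcc_features])
print(sbcc_features.shape, tandem.shape)
```

The tandem step assumes both feature streams share the same frame rate; if they do not, one stream would need resampling or alignment before concatenation.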
for comparison or discrimination · CPC title
using neural networks · CPC title
using spectral analysis, e.g. transform vocoders or subband vocoders · CPC title
Decision making techniques; Pattern matching strategies · CPC title
Training, enrolment or model building · CPC title