Method and apparatus for detecting spoofing conditions

US2020321009A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2020321009-A1
Application number	US-202016907951-A
Country	US
Kind code	A1
Filing date	Jun 22, 2020
Priority date	Mar 3, 2017
Publication date	Oct 8, 2020
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An automated speaker verification (ASV) system incorporates a first deep neural network to extract deep acoustic features, such as deep CQCC features, from a received voice sample. The deep acoustic features are processed by a second deep neural network that classifies the deep acoustic features according to a determined likelihood of including a spoofing condition. A binary classifier then classifies the voice sample as being genuine or spoofed.
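
The flow the abstract describes (first DNN extracts deep features, second DNN scores them for spoofing, a binary decision follows) can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the patented implementation: the class names FeatureDNN and SpoofDNN, the layer sizes, the random placeholder weights, the use of plain dense layers instead of convolutional and pooling layers, the convention that class 0 means "no spoofing condition", and the 0.5 threshold standing in for the binary classifier.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def dense(n_in, n_out):
    """Random placeholder weights; a real system would load trained parameters."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


class FeatureDNN:
    """Hypothetical stand-in for the first DNN: frames -> deep acoustic features."""

    def __init__(self, n_in, n_hidden, n_deep):
        self.w1, self.b1 = dense(n_in, n_hidden)
        self.w2, self.b2 = dense(n_hidden, n_deep)

    def extract(self, frames):
        hidden = relu(frames @ self.w1 + self.b1)
        return relu(hidden @ self.w2 + self.b2)


class SpoofDNN:
    """Hypothetical stand-in for the second DNN: deep features -> spoofing score.

    Assumption: class 0 is 'no spoofing condition'; other classes are conditions.
    """

    def __init__(self, n_in, n_hidden, n_classes):
        self.w1, self.b1 = dense(n_in, n_hidden)
        self.w2, self.b2 = dense(n_hidden, n_classes)

    def spoofing_score(self, deep_features):
        hidden = relu(deep_features @ self.w1 + self.b1)
        probs = softmax(hidden @ self.w2 + self.b2)
        # Average over frames, then take the probability mass off class 0.
        return float(1.0 - probs[:, 0].mean())


def classify(score, threshold=0.5):
    """Threshold rule used here as a simple stand-in for the binary classifier."""
    return "spoofed" if score >= threshold else "genuine"


# Usage: 50 frames of 30-dimensional acoustic features (random placeholder data).
frames = rng.normal(size=(50, 30))
first_dnn = FeatureDNN(n_in=30, n_hidden=64, n_deep=16)
second_dnn = SpoofDNN(n_in=16, n_hidden=32, n_classes=4)
deep = first_dnn.extract(frames)
score = second_dnn.spoofing_score(deep)
print(f"spoofing score = {score:.3f} -> {classify(score)}")
```

Because the weights are random, the printed decision carries no meaning; the sketch only shows how the three stages hand data to one another.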

First claim

Claim text as published (claims 1 through 20; claims 1 and 15 are independent).

What is claimed is:
1. A computer-implemented method for detecting spoofed voice sources, the method comprising: extracting, by a computer, one or more acoustic features from a voice sample using a first deep neural network (DNN); and calculating, by the computer, via a second DNN a spoofing score indicating a likelihood that the voice sample includes a spoofing condition based in part on the acoustic features extracted from the voice sample.
2. The method according to claim 1, further comprising: classifying, by the computer executing a binary classifier, the voice sample as being either genuine or spoofed based on the spoofing score from the second DNN.
3. The method according to claim 1, wherein at least a portion of one or more of the acoustic features are deep constant Q cepstral coefficients (CQCC).
4. The method according to claim 1, wherein the spoofing conditions include at least one of channel conditions and audio conditions.
5. The method according to claim 4, wherein the channel conditions include channel artifacts associated with at least one of different background environments, different acquisition devices, and different network infrastructures.
6. The method according to claim 1, further comprising: extracting, by the computer, other acoustic features from the voice sample; combining, by the computer, the acoustic features with the other acoustic features to provide tandem features, and classifying, by the computer, the tandem features using the second DNN, the second DNN configured to determine whether the tandem features include a non-spoofing condition or at least one spoofing condition, wherein classifying the acoustic features is part of classifying the tandem features.
7. The method according to claim 6, wherein the other acoustic features are sub-band cepstral coefficient (SBCC) features, the method further comprising: sub-band filtering, by the computer, the voice sample before extracting the other features from the filtered sample, where said extracting the other, SBCC features includes: calculating, by the computer, a short-time Fourier transform (STFT) on a frame from the filtered sample, calculating, by the computer, a power spectrum from the STFT, calculating, by the computer, a log-amplitude from the power spectrum, calculating, by the computer, an inverse discrete cosine transform (IDCT) of the log-amplitude, and calculating, by the computer, dynamic features based on the IDCT.
8. The method according to claim 7, wherein filtering the audio sample includes using a high pass filter, thereby generating a filtered sample being limited to frequencies above a predetermined cutoff frequency.
9. The method according to claim 1, wherein the second DNN is configured to extract one or more multi-class features from the at least deep acoustic features.
10. The method according to claim 1, wherein the first DNN and the second DNN each include at least one of: an input layer, one or more hidden layers, one or more convolutional layers, a pooling layer, one or more fully-connected layers, and an output layer.
11. The method according to claim 10, wherein the pooling layer of the first DNN is configured to extract one or more bottleneck features from the acoustic features, and wherein the one or more bottleneck features are sensitive to the at least one audio artifact or channel artifact.
12. The method according to claim 1, further comprising: applying, by the computer, batch normalization for at least one of the first DNN and the second DNN, to one or more of: an input layer, one or more hidden layers, one or more fully-connected layers, and an output layer.
13. The method according to claim 1, wherein the second DNN is implemented using one or more graphics processors.
14. The method according to claim 1, wherein the configuration of the second DNN results from training the second DNN with a plurality of non-spoofed and known-spoofed voice samples.
15. A system for detecting a spoofed voice source, the system comprising: a receiving circuit configured to receive a voice sample; and one or more processors configured to: extract one or more acoustic features from the voice sample using a first deep neural network (DNN); and calculate using a second DNN a spoofing score indicating a likelihood that the voice sample includes a spoofing condition based in part on the acoustic features extracted from the voice sample.
16. The system according to claim 15, wherein the one or more processors are further configured to classify using a binary classifier the voice sample as being either genuine or spoofed based on the spoofing score.
17. The system according to claim 15, wherein at least a portion of the acoustic features are deep constant Q cepstral coefficients (CQCC).
18. The system according to claim 15, wherein the spoofing conditions include at least one of channel conditions and audio conditions.
19. The system according to claim 18, wherein the channel conditions include channel artifacts specific to at least one of: different background environments, different acquisition devices, and different network infrastructures.
20. The system according to claim 15, further comprising: circuitry configured to extract other acoustic features from the voice sample; wherein at least one of the one or more processors is further configured to combine using feature concatenation the one or more acoustic features with the other acoustic features to provide tandem features; and wherein the second DNN is further configured to: classify the tandem features; and determine whether the tandem features include a non-spoofing condition or at least one spoofing condition.
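
The SBCC steps recited in claims 7 and 8 (high-pass filtering, STFT, power spectrum, log-amplitude, IDCT, dynamic features) can be followed in a short NumPy sketch. The claims do not fix any parameters, so the following are assumptions made only for illustration: the function names, a brick-wall FFT high-pass filter, a 4000 Hz cutoff, a Hann window, the frame and hop sizes, keeping 20 coefficients, and using frame-to-frame gradients as the dynamic features.

```python
import numpy as np


def high_pass(signal, sample_rate, cutoff_hz):
    """Brick-wall high-pass: zero every FFT bin below cutoff_hz (an assumed design)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def idct_rows(x):
    """Inverse of the unnormalised DCT-II, applied to each row of x."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]                    # coefficient index
    m = np.arange(n)[None, :]                    # output sample index
    basis = np.cos(np.pi * k * (m + 0.5) / n)    # shape (n, n)
    weights = np.full(n, 2.0 / n)
    weights[0] = 1.0 / n
    return (x * weights) @ basis


def deltas(features):
    """Dynamic features as frame-to-frame gradients (one common choice)."""
    return np.gradient(features, axis=0)


def sbcc(signal, sample_rate, cutoff_hz=4000.0, frame_len=256, hop=128, n_coeffs=20):
    """Sketch of SBCC extraction; all default values are hypothetical."""
    filtered = high_pass(signal, sample_rate, cutoff_hz)       # sub-band filtering
    frames = frame_signal(filtered, frame_len, hop) * np.hanning(frame_len)
    stft = np.fft.rfft(frames, axis=1)                         # STFT per frame
    power = np.abs(stft) ** 2                                  # power spectrum
    log_amp = np.log(power + 1e-10)                            # epsilon avoids log(0)
    cepstra = idct_rows(log_amp)[:, :n_coeffs]                 # IDCT, keep first coefficients
    return np.hstack([cepstra, deltas(cepstra)])               # static + dynamic features


# Usage: one second of synthetic audio at 16 kHz (a 100 Hz tone plus a 6 kHz tone).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 100 * t)
high_tone = 0.5 * np.sin(2 * np.pi * 6000 * t)
features = sbcc(low_tone + high_tone, sample_rate)
print(features.shape)
```

Under claim 6, features like these would then be concatenated with the deep acoustic features to form the tandem input to the second DNN; that step is omitted here.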

Assignees

Pindrop Security Inc

Classifications

G10L25/51
for comparison or discrimination · CPC title
G10L25/30
using neural networks · CPC title
G10L19/02
using spectral analysis, e.g. transform vocoders or subband vocoders · CPC title
G10L17/06
Decision making techniques; Pattern matching strategies · CPC title
G10L17/04
Training, enrolment or model building · CPC title

Patent family

Related publications grouped by family.

View patent family 63355275

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020321009A1 cover?: An automated speaker verification (ASV) system incorporates a first deep neural network to extract deep acoustic features, such as deep CQCC features, from a received voice sample. The deep acoustic features are processed by a second deep neural network that classifies the deep acoustic features according to a determined likelihood of including a spoofing condition. A binary classifier then cla…
Who is the assignee on this patent?: Pindrop Security Inc
What technology area does this patent fall under?: Primary CPC classification G10L17/02. Mapped technology areas include Physics.
When was this patent published?: Publication date Oct 8, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).