End-to-end speaker recognition using deep neural network

US10381009B2 · US · B2

Patent metadata
Publication number: US-10381009-B2
Application number: US-201715818231-A
Country: US
Kind code: B2
Filing date: Nov 20, 2017
Priority date: Sep 12, 2016
Publication date: Aug 13, 2019
Grant date: Aug 13, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
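
The abstract names a cosine measure of similarity with positive and negative margins but does not give the formula here. The following is a minimal sketch of one loss of that general shape, not the patent's actual loss: the function names, the hinge form, the margin values, and the choice of the most similar cohort sample as the negative are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_margin_loss(anchor, positive, cohort,
                        pos_margin=0.8, neg_margin=0.3):
    """Hypothetical margin loss over one anchor/positive pair and a cohort.

    anchor, positive: embeddings of two samples from the same speaker.
    cohort: non-empty list of embeddings from other speakers (negatives).
    pos_margin, neg_margin: illustrative values, not taken from the patent.
    """
    s_pos = cosine(anchor, positive)
    # Assumption: use the cohort sample most similar to the anchor.
    s_neg = max(cosine(anchor, n) for n in cohort)
    # Penalize same-speaker similarity that falls below the positive margin.
    loss_pos = max(0.0, pos_margin - s_pos)
    # Penalize cohort similarity that rises above the negative margin.
    loss_neg = max(0.0, s_neg - neg_margin)
    return loss_pos + loss_neg


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    positive = [0.9, 0.1]
    cohort = [[0.0, 1.0], [-1.0, 0.2]]
    print(triplet_margin_loss(anchor, positive, cohort))
```

In this sketch the loss is zero once the same-speaker pair is at least `pos_margin` similar and every cohort sample is at most `neg_margin` similar to the anchor, so training pressure comes only from pairs that violate a margin.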

First claim

Opening claim text (preview).

The invention claimed is:

1. A speaker recognition device including a processor-based device having been configured to model a trained deep neural network with a triplet network architecture, the deep neural network having been trained according to a process in which dual sets of speech samples are fed through the deep neural network in combination with a cohort set of speech samples not attributed to the same speaker as the dual sets, comprising: a memory device storing speech samples including a set of speaker models; and the processor-based device feeding a recognition speech sample through the trained deep neural network, and verifying or identifying a user based on an output of the trained deep neural network responsive to the recognition speech sample and at least one of the speaker models.

2. The speaker recognition device of claim 1, wherein the deep neural network includes, a first feed-forward neural network which receives and processes a first input to produce a first network output, a second feed-forward neural network which receives and processes a second input to produce a second network output, and a third feed-forward neural network which receives and processes a third input to produce a third network output.

3. The speaker recognition device of claim 2, wherein each of the first, second, and third feed-forward neural networks includes at least one convolutional layer and a fully connected layer.

4. The speaker recognition device of claim 3, wherein each of the first, second, and third feed-forward neural networks further includes at least one max-pooling layer and a subsequent fully connected layer.

5. The speaker recognition device of claim 3, wherein each speech sample, which is inputted to a respective one of the first, second, and third feedforward neural networks, is preprocessed by: partitioning an underlying speech signal into a plurality of overlapping windows; and extracting a plurality of features from each of the overlapping windows.

6. The speaker recognition device of claim 5, wherein each of the first, second, and third feed-forward neural networks includes a first convolutional layer to receive the preprocessed speech sample, the first convolutional layer comprises a number N_C of convolutional filters, each of the N_C convolutional filters has F×w_f neurons, where F corresponds to the height of the first convolutional layer, and w_f corresponds to the width of the convolutional layer, and F is equivalent to the number of the features extracted from each of the overlapping windows.

7. The speaker recognition device of claim 1, wherein the device is configured to perform a speaker verification task in which the user inputs a self-identification, and the recognition speech sample is used to confirm that an identity of the user is the same as the self-identification.

8. The speaker recognition device of claim 1, wherein the device is configured to perform a speaker identification task in which the recognition speech sample is used to identify the user from a plurality of potential identities stored in the memory device in association with respective speech samples.

9. The speaker recognition device of claim 1, further comprising an input device which receives a speech sample from the user as the recognition speech sample.

10. A method of using a speaker recognition device including a processor-based device having been configured to model a trained deep neural network with a triplet network architecture, the deep neural network having been trained according to a process in which dual sets of speech samples are fed through the deep neural network in combination with a cohort set of speech samples not attributed to the same speaker as the dual sets, the method comprising: storing speech samples including a set of speaker models; and feeding a recognition speech sample through the trained deep neural network, and verifying or identifying a user based on an output of the trained deep neural network responsive to the recognition speech sample and at least one of the speaker models.

11. The method of claim 10, further comprising preprocessing each speech sample by: partitioning an underlying speech signal into a plurality of overlapping windows; and extracting a plurality of features from each of the overlapping windows.

12. The method of claim 10, further comprising: performing a speaker verification task in which the user inputs a self-identification, and the recognition speech sample is used to confirm that an identity of the user is the same as the self-identification.

13. The method of claim 10, further comprising: performing a speaker identification task in which the recognition speech sample is used to identify the user from a plurality of stored potential identities in association with respective speech samples.

14. The method of claim 10, further comprising: receiving a speech sample from the user as the recognition speech sample.
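
The claims distinguish a verification task (confirm a self-identification) from an identification task (pick the user out of stored identities). The sketch below illustrates that difference only. It assumes each stored speaker model is an embedding vector already produced by the trained network, that scoring uses cosine similarity, and that a single acceptance threshold applies; the names `verify`, `identify`, `speaker_models`, and the threshold value are hypothetical and do not come from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def verify(embedding, claimed_id, speaker_models, threshold=0.7):
    """Verification: compare against the one model the user claims to be."""
    return cosine(embedding, speaker_models[claimed_id]) >= threshold


def identify(embedding, speaker_models, threshold=0.7):
    """Identification: compare against every stored model, keep the best.

    Returns the best-matching speaker id, or None if no model scores at
    least `threshold` (an assumed rejection rule for unknown speakers).
    """
    best_id, best_score = max(
        ((sid, cosine(embedding, model)) for sid, model in speaker_models.items()),
        key=lambda pair: pair[1],
    )
    return best_id if best_score >= threshold else None


if __name__ == "__main__":
    # Toy 2-D "voiceprints"; real embeddings would come from the network.
    speaker_models = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
    sample = [0.9, 0.1]
    print(verify(sample, "alice", speaker_models))
    print(verify(sample, "bob", speaker_models))
    print(identify(sample, speaker_models))
```

Verification costs one comparison because the claimed identity selects the model, while identification scans all stored models, which is why the two tasks are claimed separately.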

Assignees

Pindrop Security Inc

Inventors

Classifications

  • Learning methods

  • Architecture, e.g. interconnection topology

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

  • Artificial neural networks; Connectionist approaches

  • Interactive procedures; Man-machine interfaces

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10381009B2 cover?
The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained a…
Who is the assignee on this patent?
Pindrop Security Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 13, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).