Neural networks for speaker verification

US9978374B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9978374-B2
Application number	US-201514846187-A
Country	US
Kind code	B2
Filing date	Sep 4, 2015
Priority date	Sep 4, 2015
Publication date	May 22, 2018
Grant date	May 22, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
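The labeled training samples described in the abstract can be illustrated with a small sketch. This is a minimal illustration, not the patent's implementation: the function name `make_training_samples`, the string utterance IDs, the alternating match/non-match schedule, and the `n_second` parameter are all hypothetical choices made for the example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical sample layout: (first utterance, second utterances, label),
# where label 1 means "matching speakers" and 0 means "non-matching speakers".
Sample = Tuple[str, List[str], int]


def make_training_samples(
    groups: Dict[str, List[str]],
    n_samples: int,
    n_second: int = 2,
    seed: int = 0,
) -> List[Sample]:
    """Build labeled samples from per-speaker groups of utterance IDs.

    Each group is assumed to hold only utterances of one speaker and to
    contain at least n_second + 1 distinct utterance IDs.
    """
    speakers = sorted(groups)
    if len(speakers) < 2:
        raise ValueError("need at least two speakers for non-matching samples")

    rng = random.Random(seed)
    samples: List[Sample] = []
    for i in range(n_samples):
        matching = i % 2 == 0  # assumption: alternate the two sample types
        second_speaker = rng.choice(speakers)
        if matching:
            first_speaker = second_speaker
        else:
            first_speaker = rng.choice(
                [s for s in speakers if s != second_speaker]
            )
        # The "one or more second utterances" all come from one speaker.
        second = rng.sample(groups[second_speaker], n_second)
        # The first utterance must not repeat any of the second utterances.
        candidates = [u for u in groups[first_speaker] if u not in second]
        first = rng.choice(candidates)
        samples.append((first, second, 1 if matching else 0))
    return samples
```

In a real pipeline the utterance IDs would be replaced by acoustic features; strings are used here only to keep the pairing logic visible.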

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, at a computing device, data that characterizes an utterance of a user of the computing device; generating, at the computing device, a speaker representation for the utterance using a neural network on the computing device that was trained based on a plurality of training samples in a training procedure, wherein each training sample of the plurality of training samples includes (i) a first training input component that characterizes a first utterance, (ii) a second training input component that characterizes one or more second utterances, and (iii) a first classification for the training sample that indicates whether a speaker of the first utterance is the same or different from a speaker of the one or more second utterances, wherein the training procedure includes, for each training sample of the plurality of training samples: (i) generating a second classification for the training sample that indicates whether the speaker of the first utterance is the same or different from a speaker of the one or more second utterances, the second classification based on an output that results from processing the first training input component and the second training input component with the neural network, and (ii) adjusting parameters of the neural network based on comparison of the first classification for the training sample and the second classification for the training sample; accessing, at the computing device, a speaker model for an authorized user of the computing device; evaluating, at the computing device, the speaker representation for the utterance with respect to the speaker model to determine whether the utterance was likely spoken by the authorized user of the computing device; and performing, at the computing device, an operation that is selected based on whether the utterance is determined to have been likely spoken by the authorized user of the computing device.

2. The computer-implemented method of claim 1, wherein each of the plurality of training samples was generated by selecting the first utterance and the one or more second utterances from groups of utterances that correspond to different speakers, such that each group of utterances consists only of utterances of the corresponding speaker for the respective group of utterances.

3. The computer-implemented method of claim 1, further comprising: obtaining a set of utterances of the authorized user of the computing device; inputting each utterance from the set of utterances into the neural network to generate a respective speaker representation for the utterance; and generating the speaker model for the authorized user of the computing device based on an average of the respective speaker representations for the utterances in the set of utterances of the authorized user.

4. The computer-implemented method of claim 1, wherein none of the plurality of training samples on which the neural network has been trained includes data that characterizes the utterance of the user of the computing device.

5. The computer-implemented method of claim 1, wherein generating, at the computing device, the speaker representation for the utterance comprises processing data that characterizes an entirety of the utterance with the neural network in a single pass through the neural network.

6. The computer-implemented method of claim 1, further comprising determining that the utterance was likely spoken by the authorized user of the computing device, wherein performing the operation that is selected based on whether the utterance is determined to have been likely spoken by the authorized user of the computing device comprises authenticating an identity of the user that submitted the utterance.

7. The computer-implemented method of claim 1, further comprising determining that the utterance was likely spoken by the authorized user of the computing device, wherein performing the operation that is selected based on whether the utterance is determined to have been likely spoken by the authorized user of the computing device comprises transitioning the computing device from a locked state to an unlocked state.

8. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors of a computing device, cause performance of operations comprising: receiving, at the computing device, data that characterizes an utterance of a user of the computing device; generating, at the computing device, a speaker representation for the utterance using a neural network on the computing device that was trained based on a plurality of training samples in a training procedure, wherein each training sample of the plurality of training samples includes (i) a first training input component that characterizes a first utterance, (ii) a second training input component that characterizes one or more second utterances, and (iii) a first classification for the training sample that indicates whether a speaker of the first utterance is the same or different from a speaker of the one or more second utterances, wherein the training procedure includes, for each training sample of the plurality of training samples: (i) generating a second classification for the training sample that indicates whether the speaker of the first utterance is the same or different from a speaker of the one or more second utterances, the second classification based on an output that results from processing the first training input component and the second training input component with the neural network, and (ii) adjusting parameters of the neural network based on comparison of the first classification for the training sample and the second classification for the training sample; accessing, at the computing device, a speaker model for an authorized user of the computing device; evaluating, at the computing device, the speaker representation for the utterance with respect to the speaker model to determine whether the utterance was likely spoken by the authorized user of the computing device; and performing, at the computing device, an operation that is selected based on whether the utterance is determined to have been likely spoken by the authorized user of the computing device.

9. The non-transitory computer-readable media of claim 8, wherein each of the plurality of training samples was generated by selecting the first utterance and the one or more second utterances from groups of utterances that correspond to different speakers, such that each group of utterances consists only of utterances of the corresponding speaker for the respective group of utterances.

10. The non-transitory computer-readable media of claim 8, wherein the operations further comprise: obtaining a set of utterances of the authorized user of the computing device; inputting each utterance from the set of utterances into the neural network to generate a respective speaker representation for the utterance; and generating the speaker model for the authorized user of the computing device based on an average of the respective speaker representations for the utterances in the set of utterances of the authorized user.

11. The non-transitory computer-readable media of claim 8, wherein none of the plurality of training samples on which the neural network has been trained includes data that characterizes the utterance of the user of the computing device.

12. The non-transitory computer-readable media of claim 8, wherein generating, at the computing device, the speaker representation for the utterance comprises processing data that characterizes an entirety of the utteranc…
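The enrollment and verification steps recited in the claims (averaging speaker representations into a speaker model, evaluating a new representation against that model, then selecting an operation) can be sketched as follows. This is a minimal sketch under stated assumptions: `embed` is a hypothetical stand-in for the trained neural network, and the cosine-similarity score, the 0.8 threshold, and the operation names are illustrative choices that the claim text does not specify.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for the trained network: utterance features -> vector.
Embedder = Callable[[Sequence[float]], Vector]


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(embed: Embedder, utterances: Sequence[Sequence[float]]) -> Vector:
    """Speaker model = average of the per-utterance speaker representations."""
    return average([embed(u) for u in utterances])


def verify(
    embed: Embedder,
    utterance: Sequence[float],
    speaker_model: Vector,
    threshold: float = 0.8,  # assumption: illustrative value, not from the patent
) -> bool:
    """Accept if the new representation is close enough to the speaker model."""
    return cosine_similarity(embed(utterance), speaker_model) >= threshold


def select_operation(accepted: bool) -> str:
    """Pick an operation based on the verification outcome (names are made up)."""
    return "unlock" if accepted else "stay_locked"
```

A usage pattern would be to call `enroll` once with a handful of the authorized user's utterances, store the resulting vector on the device, and call `verify` followed by `select_operation` for each later utterance.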

Assignees

Google LLC

Inventors

Classifications

G10L17/18 · Primary
Artificial neural networks; Connectionist approaches · CPC title
G10L17/02 · Primary
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
G10L17/04
Training, enrolment or model building · CPC title
G07C9/37 · Primary
using biometric data, e.g. fingerprints, iris scans or voice recognition · CPC title

Patent family

Related publications grouped by family.

View patent family 56853791

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9978374B2 cover?: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can inclu…
Who is the assignee on this patent?: Google LLC
What technology area does this patent fall under?: Primary CPC classification G10L17/18. Mapped technology areas include Physics.
When was this patent published?: Publication date May 22, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).