Speaker verification using neural networks

US9401148B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9401148-B2
Application number	US-201414228469-A
Country	US
Kind code	B2
Filing date	Mar 28, 2014
Priority date	Nov 4, 2013
Publication date	Jul 26, 2016
Grant date	Jul 26, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
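The flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy layer sizes, the random placeholder weights, the ReLU activation, and the helper names `last_hidden_activations`, `utterance_vector`, and `cosine_similarity` do not come from the patent. The sketch only shows the general idea of reading activations at a hidden layer, without applying the output layer, and comparing the resulting vector with a reference vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: 40-dim frame features, two hidden layers,
# and an output layer over 10 training speakers. The weights are random
# placeholders; a real system would load trained parameters.
LAYER_SIZES = [40, 64, 32, 10]
WEIGHTS = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
BIASES = [np.zeros(n) for n in LAYER_SIZES[1:]]


def last_hidden_activations(frames):
    """Forward frames (num_frames x 40) through the hidden layers only."""
    h = frames
    # The final (output) layer is skipped: its weights are never applied.
    for w, b in zip(WEIGHTS[:-1], BIASES[:-1]):
        h = np.maximum(h @ w + b, 0.0)  # ReLU is an assumption
    return h  # shape: (num_frames, 32)


def utterance_vector(frames):
    """Average the per-frame hidden activations into a single vector."""
    return last_hidden_activations(frames).mean(axis=0)


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Random stand-ins for feature frames of a past and a new utterance.
reference_frames = rng.standard_normal((50, 40))
evaluation_frames = rng.standard_normal((30, 40))

reference_vector = utterance_vector(reference_frames)
evaluation_vector = utterance_vector(evaluation_frames)
score = cosine_similarity(evaluation_vector, reference_vector)
print(f"similarity score: {score:.3f}")
```

With random weights the score carries no meaning; the point is only the data flow from frames to a fixed-length vector that can be compared against a stored reference.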

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: inputting, by a computing device, speech data that corresponds to a particular utterance of a particular speaker to a neural network having parameters trained based on propagation between an input layer and an output layer through one or more hidden layers located between the input layer and the output layer, wherein the one or more hidden layers were trained using utterances of multiple speakers, and wherein the multiple speakers do not include the particular speaker; generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, a representation of activations occurring at a particular layer of the neural network that was trained as one of the hidden layers located between the input layer and the output layer; comparing, by the computing device, the generated representation of activations occurring at the particular layer of the neural network in response to the speech data that corresponds to the particular utterance with a reference representation of activations occurring at the particular layer of the neural network in response to speech data that corresponds to one or more past utterances of the particular speaker; based on comparing the generated representation and the reference representation, determining, by the computing device, that the particular utterance was likely spoken by the particular speaker; and providing, by the computing device, access to the computing device based on determining that the particular utterance was likely spoken by the particular speaker.

2. The method of claim 1, wherein comparing, by the computing device, the generated representation with the reference representation comprises determining, by the computing device, a distance between the generated representation and the reference representation, and wherein determining, by the computing device, that the particular utterance was spoken by the particular speaker comprises determining, by the computing device, that the distance between the generated representation and the reference representation satisfies a threshold.

3. The method of claim 2, wherein determining, by the computing device, a distance between the generated representation and the reference representation comprises computing, by the computing device, a cosine distance between the generated representation and the reference representation.

4. The method of claim 1, wherein generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, the representation of activations occurring at the particular layer of the neural network that was trained as one of the hidden layers located between the input layer and the output layer comprises generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, a representation of activations occurring at a particular layer of the neural network that was trained as one of the hidden layers located adjacent to the output layer.

5. The method of claim 1, wherein generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, the representation of activations occurring at the particular layer of the neural network that was trained as one of the hidden layers located between the input layer and the output layer comprises generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, the representation of activations occurring at a particular layer of the neural network that was trained as a predetermined one of the hidden layers located between the input layer and the output layer.

6. The method of claim 1, comprising: obtaining, by the computing device access to the neural network; for each of multiple utterances of the particular speaker: inputting, by the computing device, speech data corresponding to the respective utterance to the neural network; and generating, by the computing device, a representation of activations occurring at the particular layer of the neural network in response to the speech data corresponding to the respective utterance; combining, by the computing device, the generated representations of activations occurring at the particular layer of the neural network in response to speech data corresponding to each of the multiple utterances of the particular speaker; and using, by the computing device, the combination of generated representations of activations occurring at the particular layer of the neural network in response to speech data corresponding to each of the multiple utterances of the particular speaker as the reference representation.

7. The method of claim 1, further comprising dividing, by the computing device, the speech data corresponding to the particular utterance into frames; and wherein generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, the representation of activations occurring at the particular layer of the neural network comprises: determining, by the computing device and for each of multiple different frames of the speech data, a corresponding set of activations occurring at the particular layer of the neural network based on the frame; and generating, by the computing device, the representation of the activations occurring at the particular layer by averaging the sets of activations that respectively correspond to the multiple different frames.

8. The method of claim 1, wherein generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, the representation of activations occurring at the particular layer of the neural network comprises: generating, by the computing device, the representation of activations occurring at the particular layer of the neural network (i) in response to inputting the speech data that corresponds to the particular utterance of the neural network, and (ii) irrespective of any activations occurring downstream from the particular layer in response to inputting the speech data that corresponds to the particular utterance of the neural network.

9. The method of claim 8, wherein inputting, by the computing device, speech data that corresponds to the particular utterance to the neural network having parameters trained based on propagation between the input layer and the output layer through one or more hidden layers located between the input layer and the output layer comprises: inputting, by the computing device, speech data that corresponds to the particular utterance to a neural network whose layers have been trained based on activations occurring at the output layer.

10. The method of claim 1, wherein the representation of the activations at the particular layer is a vector that indicates the activations at the particular layer.

11. The method of claim 1, wherein the input layer, the output layer, and the one or more hidden layers are included in a trained neural network; wherein inputting the speech data comprises inputting the speech data to a neural network that includes a subset of the layers of the trained neural network and excludes the output layer of the trained neural network used during training of the trained neural network; and wherein generating the representation comprises generating the representation of activations of a particular layer of the neural network that …
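Claims 2, 3, and 6 describe building a reference representation from several utterances and accepting a speaker when a cosine distance satisfies a threshold. The sketch below is one possible reading, not the claimed implementation: averaging as the way of "combining" representations, the rule that a distance at or below the threshold counts as satisfying it, the threshold value 0.3, and the names `enroll` and `verify` are all assumptions. Input vectors stand in for hidden-layer representations that have already been computed.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def enroll(utterance_vectors):
    """Combine per-utterance vectors into one reference vector.

    Averaging is an assumption; the claim only says "combining".
    """
    return np.mean(np.asarray(utterance_vectors, dtype=float), axis=0)


def verify(evaluation_vector, reference_vector, threshold=0.3):
    """Accept when the distance is at or below a hypothetical threshold."""
    return cosine_distance(evaluation_vector, reference_vector) <= threshold


# Made-up 3-dimensional vectors, for illustration only.
enrollment = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.1, 0.1, 0.1]]
reference = enroll(enrollment)

accepted = verify([1.0, 0.05, 0.05], reference)  # close to the reference
rejected = verify([0.0, 1.0, 0.0], reference)    # nearly orthogonal
print(accepted, rejected)
```

In practice the threshold would be tuned on held-out data to balance false accepts against false rejects; the value used here is arbitrary.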

Assignees

Google Inc

Inventors

Classifications

G10L17/18 (Primary): Artificial neural networks; Connectionist approaches (CPC title)
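A CPC symbol such as G10L17/18 is hierarchical: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. Section G is Physics, which is why this page maps the patent to that technology area. The sketch below splits a symbol into those parts; the `CpcSymbol` type, the `parse_cpc` helper, and the regular expression are illustrative assumptions, not part of any official tool.

```python
import re
from typing import NamedTuple

# Top-level CPC sections (short titles).
CPC_SECTIONS = {
    "A": "Human necessities",
    "B": "Performing operations; transporting",
    "C": "Chemistry; metallurgy",
    "D": "Textiles; paper",
    "E": "Fixed constructions",
    "F": "Mechanical engineering; lighting; heating; weapons; blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new technological developments",
}

# Hypothetical pattern for compact symbols such as "G10L17/18".
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,})$")


class CpcSymbol(NamedTuple):
    section: str
    class_: str
    subclass: str
    main_group: str
    subgroup: str


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a compact CPC symbol into its hierarchical parts."""
    match = CPC_PATTERN.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


parsed = parse_cpc("G10L17/18")
print(parsed, CPC_SECTIONS[parsed.section])
```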

Patent family

Related publications grouped by family.

Patent family 53007663

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9401148B2 cover?: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; …
Who is the assignee on this patent?: Google Inc
What technology area does this patent fall under?: Primary CPC classification G10L17/18. Mapped technology areas include Physics.
When was this patent published?: Publication date Jul 26, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Methods and apparatus for reinforcement learning

Monaural speech filter

Sequence transcription with deep neural networks
