What technology area does this patent fall under?

Primary CPC classification G10L15/16. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 30, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and methods for adapting neural network acoustic models

US10366687B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10366687-B2
Application number	US-201514965637-A
Country	US
Kind code	B2
Filing date	Dec 10, 2015
Priority date	Dec 10, 2015
Publication date	Jul 30, 2019
Grant date	Jul 30, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
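The flow in the abstract (recognize a first utterance with initial speaker information values, update those values from the first speech data, then recognize a second utterance) can be sketched roughly as below. This is only an illustration under stated assumptions: `SpeakerStats`, `acoustic_model`, and `recognize` are hypothetical names, a running mean of feature frames stands in for the speaker information values (the claims mention i-vectors, which are not computed here), and the "trained network" is a placeholder scoring function rather than a real acoustic model.

```python
from dataclasses import dataclass, field
from typing import List

Frame = List[float]


@dataclass
class SpeakerStats:
    """Accumulated statistics, i.e. the information used to generate speaker values."""
    count: int = 0
    total: List[float] = field(default_factory=list)

    def update(self, frames: List[Frame]) -> None:
        # Fold the frames of a newly seen utterance into the running sums.
        for frame in frames:
            if not self.total:
                self.total = [0.0] * len(frame)
            self.total = [t + x for t, x in zip(self.total, frame)]
            self.count += 1

    def speaker_values(self, dim: int) -> List[float]:
        # ASSUMPTION: the mean of all frames seen so far stands in for an i-vector.
        if self.count == 0:
            return [0.0] * dim
        return [t / self.count for t in self.total]


def acoustic_model(frame: Frame, speaker: List[float]) -> float:
    # Placeholder for the trained network: one score per frame, computed from
    # the speech content values and the speaker information values together.
    return sum(x - s for x, s in zip(frame, speaker))


def recognize(frames: List[Frame], speaker: List[float]) -> List[float]:
    return [acoustic_model(frame, speaker) for frame in frames]


stats = SpeakerStats()
first_utterance = [[1.0, 2.0], [3.0, 4.0]]
second_utterance = [[2.0, 2.0]]

initial_values = stats.speaker_values(dim=2)             # no speech seen yet
first_scores = recognize(first_utterance, initial_values)

stats.update(first_utterance)                            # reuse the first speech data
updated_values = stats.speaker_values(dim=2)
second_scores = recognize(second_utterance, updated_values)
```

Keeping the accumulated statistics rather than only the current values means each update can combine old and new speech data, which corresponds to the abstract's "information used to generate the initial speaker information values".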

First claim

Opening claim text (preview).

What is claimed is:

1. A method for adapting a trained neural network acoustic model, the method comprising: using at least one computer hardware processor to perform: generating speaker information values for a speaker; generating speech content values from speech data corresponding to an utterance spoken by the speaker; processing the speech content values and the speaker information values using the trained neural network acoustic model, the trained neural network acoustic model comprising a neural network and the processing comprising inputting the speaker information values to a partial layer of nodes of the neural network that is positioned in the neural network before a hidden layer of nodes of the neural network, the partial layer of nodes being configured to apply a transformation to the speaker information values based on parameters with which the partial layer of nodes are configured; and generating updated parameters for the partial layer of nodes based on the processing.

2. The method of claim 1, further comprising using the at least one computer hardware processor to perform: generating second speech content values from second speech data corresponding to a second utterance spoken by the speaker; and processing the second speech content values and the speaker information values using the trained neural network acoustic model and the updated parameters for the partial layer of nodes.

3. The method of claim 1, wherein generating the speaker information values comprises generating an i-vector for the speaker.

4. The method of claim 3, wherein generating the i-vector for the speaker comprises generating a normalized and quantized i-vector for the speaker.

5. The method of claim 1, wherein the neural network comprises at least three layers.

6. At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for adapting a trained neural network acoustic model, the method comprising: generating speaker information values for a speaker based, at least in part, on statistics from speech data; generating speech content values from speech data corresponding to an utterance spoken by the speaker; processing the speech content values and the speaker information values using the trained neural network acoustic model, the trained neural network acoustic model comprising a neural network and the processing comprising inputting the speaker information values to a partial layer of nodes of the neural network that is positioned in the neural network before a hidden layer of nodes of the neural network, the partial layer of nodes being configured to apply a transformation to the speaker information values based on parameters with which the partial layer of nodes are configured; and generating updated parameters for the partial layer of nodes based on the processing.

7. The at least one non-transitory computer-readable storage medium of claim 6, where the method further comprises: generating second speech content values from second speech data corresponding to a second utterance spoken by the speaker; and processing the second speech content values and the speaker information values using the trained neural network acoustic model and the updated parameters for the partial layer of nodes.

8. The method of claim 6, wherein generating the speaker information values comprises generating an i-vector for the speaker.

9. The method of claim 8, wherein generating the i-vector for the speaker comprises generating a normalized and quantized i-vector for the speaker.

10. A system for adapting a trained neural network acoustic model, comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform: generating speaker information values for a speaker based, at least in part, on statistics from speech data; generating speech content values from speech data corresponding to an utterance spoken by the speaker; processing the speech content values and the speaker information values using the trained neural network acoustic model, the trained neural network acoustic model comprising a neural network and the processing comprising inputting the speaker information values to a partial layer of nodes of the neural network that is positioned in the neural network before a hidden layer of nodes of the neural network, the partial layer of nodes being configured to apply a transformation to the speaker information values based on parameters with which the partial layer of nodes are configured; and generating updated parameters for the partial layer of nodes based on the processing.

11. The system of claim 10, further comprising using the at least one computer hardware processor to perform: generating second speech content values from second speech data corresponding to a second utterance spoken by the speaker; and processing the second speech content values and the speaker information values using the trained neural network acoustic model and the updated parameters for the partial layer of nodes.

12. The system of claim 10, wherein generating the speaker information values comprises generating an i-vector for the speaker.

13. The system of claim 12, wherein generating the i-vector for the speaker comprises generating a normalized and quantized i-vector for the speaker.

14. The system of claim 10, wherein the neural network comprises at least three layers.
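One way to picture the "partial layer of nodes" in claim 1 is a small transform applied to the speaker information values before they reach a hidden layer, where only that transform's parameters are updated during adaptation. The sketch below is an illustration, not the patented implementation. It assumes an affine partial layer, a single tanh hidden layer, a scalar output, a squared-error loss, and plain gradient descent; none of these are specified in the claim text, and `AdaptableModel`, `adapt_step`, and all numeric values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class AdaptableModel:
    def __init__(self, W: Matrix, b: Vector, U: Matrix, v: Vector) -> None:
        self.W, self.b = W, b  # partial-layer parameters (adapted per speaker)
        self.U, self.v = U, v  # hidden and output weights (kept frozen)

    def forward(self, x: Vector, s: Vector) -> Tuple[Vector, Vector, float]:
        # Partial layer: transform the speaker values s (assumed affine here).
        z = [wi + bi for wi, bi in zip(matvec(self.W, s), self.b)]
        # Hidden layer sees the speech content values x joined with z.
        h = [math.tanh(a) for a in matvec(self.U, x + z)]
        y = sum(vj * hj for vj, hj in zip(self.v, h))
        return z, h, y

    def adapt_step(self, x: Vector, s: Vector, target: float, lr: float = 0.1) -> float:
        """One gradient step on W and b only; returns the loss before the step."""
        z, h, y = self.forward(x, s)
        err = y - target
        # Backpropagate through the frozen layers to the partial layer's output.
        grad_a = [err * vj * (1.0 - hj * hj) for vj, hj in zip(self.v, h)]
        n_x = len(x)
        grad_z = [
            sum(g * row[n_x + i] for g, row in zip(grad_a, self.U))
            for i in range(len(z))
        ]
        # Generate updated parameters for the partial layer; U and v are untouched.
        self.W = [
            [w - lr * gz * sk for w, sk in zip(row, s)]
            for row, gz in zip(self.W, grad_z)
        ]
        self.b = [bi - lr * gz for bi, gz in zip(self.b, grad_z)]
        return 0.5 * err * err


model = AdaptableModel(
    W=[[0.1, -0.2], [0.0, 0.3]],
    b=[0.0, 0.0],
    U=[[0.5, -0.4, 0.3, 0.2], [-0.3, 0.1, 0.4, -0.5]],
    v=[0.7, -0.6],
)
x = [0.2, -0.1]  # speech content values for one frame (made-up numbers)
s = [1.0, 0.5]   # speaker information values (made-up numbers)
losses = [model.adapt_step(x, s, target=0.5) for _ in range(20)]
```

Restricting the update to `W` and `b` keeps the number of speaker-specific parameters small compared with retraining the whole network, which is the usual motivation for adapting a dedicated layer; whether the patent's description uses this exact update rule is not stated on this page.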

Assignees

Nuance Communications Inc

Inventors

Classifications

G10L15/14
using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title
G10L15/16 (Primary)
using artificial neural networks · CPC title
G10L17/02
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
G10L15/075 (Primary)
supervised, i.e. under machine guidance · CPC title
G10L15/07
to the speaker · CPC title

Patent family

Related publications grouped by family.

View patent family 57349170

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10366687B2 cover?: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information va…
Who is the assignee on this patent?: Nuance Communications Inc

Related patents

Statistical acoustic model adaptation method, acoustic model learning method suitable for statistical acoustic model adaptation, storage medium storing parameters for building deep neural network, and computer program for adapting statistical acoustic model

Low-footprint adaptation and personalization for a deep neural network

Terminal and server of speaker-adaptation speech-recognition system and method for operating the system

Learning front-end speech recognition parameters within neural network training

Context-based speech recognition

Speech recognition using neural networks
