System and methods for adapting neural network acoustic models

US10366687B2 · US · B2

Patent metadata
Publication number: US-10366687-B2
Application number: US-201514965637-A
Country: US
Kind code: B2
Filing date: Dec 10, 2015
Priority date: Dec 10, 2015
Publication date: Jul 30, 2019
Grant date: Jul 30, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part, on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
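The flow in the abstract (initial speaker values, first utterance, updated speaker values, second utterance) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the "information used to generate the initial speaker information values" is a set of running statistics, and it swaps in a plain running mean of feature frames for a real speaker representation such as an i-vector. The names `SpeakerStats`, `accumulate`, `speaker_values`, and `model_inputs` are hypothetical, and the recognizer itself is left out.

```python
from dataclasses import dataclass, field
from typing import List

Frame = List[float]  # one frame of speech content values (e.g. acoustic features)


@dataclass
class SpeakerStats:
    """Running statistics for one speaker (assumed stand-in for i-vector statistics)."""
    dim: int
    frame_count: int = 0
    feature_sum: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.feature_sum:
            self.feature_sum = [0.0] * self.dim

    def accumulate(self, frames: List[Frame]) -> None:
        """Fold the frames of a newly heard utterance into the statistics."""
        for frame in frames:
            for i, value in enumerate(frame):
                self.feature_sum[i] += value
            self.frame_count += 1

    def speaker_values(self) -> List[float]:
        """Speaker information values: here simply the mean frame (an assumption)."""
        if self.frame_count == 0:
            return [0.0] * self.dim  # no speech seen yet: neutral initial values
        return [s / self.frame_count for s in self.feature_sum]


def model_inputs(frames: List[Frame], speaker_values: List[float]) -> List[Frame]:
    """Pair every frame with the current speaker values, as the acoustic model would see them."""
    return [frame + speaker_values for frame in frames]


stats = SpeakerStats(dim=2)

# First utterance: processed with the initial speaker information values.
initial_values = stats.speaker_values()
first_utterance = [[1.0, 2.0], [3.0, 4.0]]
first_inputs = model_inputs(first_utterance, initial_values)

# Update the speaker information from the first speech data and the stored statistics.
stats.accumulate(first_utterance)
updated_values = stats.speaker_values()

# Second utterance: processed with the updated speaker information values.
second_utterance = [[5.0, 6.0]]
second_inputs = model_inputs(second_utterance, updated_values)
```

Keeping the statistics rather than only the last values means each new utterance refines the estimate instead of replacing it, which is one plausible reading of the "and/or information used to generate" wording.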

First claim

Opening claim text (preview).

What is claimed is:

1. A method for adapting a trained neural network acoustic model, the method comprising: using at least one computer hardware processor to perform: generating speaker information values for a speaker; generating speech content values from speech data corresponding to an utterance spoken by the speaker; processing the speech content values and the speaker information values using the trained neural network acoustic model, the trained neural network acoustic model comprising a neural network and the processing comprising inputting the speaker information values to a partial layer of nodes of the neural network that is positioned in the neural network before a hidden layer of nodes of the neural network, the partial layer of nodes being configured to apply a transformation to the speaker information values based on parameters with which the partial layer of nodes are configured; and generating updated parameters for the partial layer of nodes based on the processing.

2. The method of claim 1, further comprising using the at least one computer hardware processor to perform: generating second speech content values from second speech data corresponding to a second utterance spoken by the speaker; and processing the second speech content values and the speaker information values using the trained neural network acoustic model and the updated parameters for the partial layer of nodes.

3. The method of claim 1, wherein generating the speaker information values comprises generating an i-vector for the speaker.

4. The method of claim 3, wherein generating the i-vector for the speaker comprises generating a normalized and quantized i-vector for the speaker.

5. The method of claim 1, wherein the neural network comprises at least three layers.

6. At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for adapting a trained neural network acoustic model, the method comprising: generating speaker information values for a speaker based, at least in part, on statistics from speech data; generating speech content values from speech data corresponding to an utterance spoken by the speaker; processing the speech content values and the speaker information values using the trained neural network acoustic model, the trained neural network acoustic model comprising a neural network and the processing comprising inputting the speaker information values to a partial layer of nodes of the neural network that is positioned in the neural network before a hidden layer of nodes of the neural network, the partial layer of nodes being configured to apply a transformation to the speaker information values based on parameters with which the partial layer of nodes are configured; and generating updated parameters for the partial layer of nodes based on the processing.

7. The at least one non-transitory computer-readable storage medium of claim 6, where the method further comprises: generating second speech content values from second speech data corresponding to a second utterance spoken by the speaker; and processing the second speech content values and the speaker information values using the trained neural network acoustic model and the updated parameters for the partial layer of nodes.

8. The method of claim 6, wherein generating the speaker information values comprises generating an i-vector for the speaker.

9. The method of claim 8, wherein generating the i-vector for the speaker comprises generating a normalized and quantized i-vector for the speaker.

10. A system for adapting a trained neural network acoustic model, comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform: generating speaker information values for a speaker based, at least in part, on statistics from speech data; generating speech content values from speech data corresponding to an utterance spoken by the speaker; processing the speech content values and the speaker information values using the trained neural network acoustic model, the trained neural network acoustic model comprising a neural network and the processing comprising inputting the speaker information values to a partial layer of nodes of the neural network that is positioned in the neural network before a hidden layer of nodes of the neural network, the partial layer of nodes being configured to apply a transformation to the speaker information values based on parameters with which the partial layer of nodes are configured; and generating updated parameters for the partial layer of nodes based on the processing.

11. The system of claim 10, further comprising using the at least one computer hardware processor to perform: generating second speech content values from second speech data corresponding to a second utterance spoken by the speaker; and processing the second speech content values and the speaker information values using the trained neural network acoustic model and the updated parameters for the partial layer of nodes.

12. The system of claim 10, wherein generating the speaker information values comprises generating an i-vector for the speaker.

13. The system of claim 12, wherein generating the i-vector for the speaker comprises generating a normalized and quantized i-vector for the speaker.

14. The system of claim 10, wherein the neural network comprises at least three layers.
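The partial-layer arrangement in the independent claims can be sketched as follows. This is an illustration under stated assumptions, not the method the patent discloses in its full description. It assumes the partial layer is a linear transform of the speaker values, that its output is concatenated with the speech content values before one tanh hidden layer, and that "generating updated parameters" means a gradient step on cross-entropy against a target class (for instance one taken from a recognition hypothesis). The class `AdaptableAcousticModel`, the helper `normalize_and_quantize`, the weights, and the learning rate are all hypothetical. L2 normalization and rounding to a fixed step are likewise assumptions about what "normalized and quantized" could mean.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    peak = max(z)
    exps = [math.exp(x - peak) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_and_quantize(values: Vector, step: float = 0.25) -> Vector:
    """Assumed scheme: scale to unit L2 norm, then round each entry to a multiple of `step`."""
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [round(v / norm / step) * step for v in values]


class AdaptableAcousticModel:
    def __init__(self, w_partial: Matrix, b_partial: Vector, w_hidden: Matrix, w_out: Matrix):
        self.w_partial = w_partial  # partial layer: the only parameters adapted
        self.b_partial = b_partial
        self.w_hidden = w_hidden    # hidden layer: kept frozen
        self.w_out = w_out          # output layer: kept frozen

    def forward(self, speech: Vector, speaker: Vector):
        # The partial layer transforms the speaker values before the hidden layer sees them.
        transformed = [a + b for a, b in zip(matvec(self.w_partial, speaker), self.b_partial)]
        hidden = [math.tanh(a) for a in matvec(self.w_hidden, speech + transformed)]
        return hidden, softmax(matvec(self.w_out, hidden))

    def adapt_step(self, speech: Vector, speaker: Vector, target: int, lr: float = 0.1) -> float:
        """One gradient step on the partial layer only; returns the loss before the update."""
        hidden, probs = self.forward(speech, speaker)
        d_logits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
        d_hidden = [sum(self.w_out[k][j] * d_logits[k] for k in range(len(d_logits)))
                    for j in range(len(hidden))]
        d_pre = [g * (1.0 - h * h) for g, h in zip(d_hidden, hidden)]  # through tanh
        offset = len(speech)  # hidden-layer columns after the speech inputs belong to the partial layer
        for i in range(len(self.b_partial)):
            grad = sum(self.w_hidden[j][offset + i] * d_pre[j] for j in range(len(d_pre)))
            self.b_partial[i] -= lr * grad
            for c, s in enumerate(speaker):
                self.w_partial[i][c] -= lr * grad * s
        return -math.log(probs[target])


model = AdaptableAcousticModel(
    w_partial=[[1.0, 0.0], [0.0, 1.0]],
    b_partial=[0.0, 0.0],
    w_hidden=[[0.5, -0.3, 0.8, 0.1], [-0.2, 0.4, -0.6, 0.7], [0.3, 0.3, 0.2, -0.5]],
    w_out=[[0.6, -0.4, 0.2], [-0.5, 0.9, 0.1]],
)
speaker = normalize_and_quantize([3.0, -4.0])  # toy speaker vector, not a real i-vector
speech = [0.2, -0.1]
losses = [model.adapt_step(speech, speaker, target=1) for _ in range(20)]
```

Restricting the update to the partial layer leaves the bulk of the trained network untouched, so only a small set of per-speaker parameters needs to be stored and reused for later utterances, as in claims 2, 7, and 11.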

Assignees

Nuance Communications Inc

Inventors

Classifications

  • using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title

  • G10L15/16 (Primary)

    using artificial neural networks · CPC title

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title

  • G10L15/075 (Primary)

    supervised, i.e. under machine guidance · CPC title

  • to the speaker · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10366687B2 cover?
Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information va…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 30, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).