Methods and apparatus for training a transformation component

US2016019884A1 · US · A1

Patent metadata
Publication number: US-2016019884-A1
Application number: US-201414335044-A
Country: US
Kind code: A1
Filing date: Jul 18, 2014
Priority date: Jul 18, 2014
Publication date: Jan 21, 2016
Grant date: (none listed)

Abstract

Official abstract text for this publication.

According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
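
The training described in the abstract can be written as an optimization over the second parameters only. This is a reading aid, not notation from the patent: the symbols $f$ (acoustic model), $g$ (transformation component), $\theta_1$ and $\theta_2$ (first and second parameters), $\mathcal{D}_1$ and $\mathcal{D}_2$ (first and second training data), the loss $\ell$, and the targets $y$ are all assumptions introduced here. The abstract says only that the second values are determined from second training data passed through the transformation component and then the acoustic model.

```latex
\theta_2^{*} \;=\; \arg\min_{\theta_2} \sum_{(x,\,y) \in \mathcal{D}_2}
  \ell\bigl( f\bigl( g(x;\theta_2);\, \theta_1 \bigr),\; y \bigr),
\qquad \theta_1 \text{ held fixed at the values learned from } \mathcal{D}_1 .
```

Because $\theta_1$ never appears under the $\arg\min$, the acoustic model contributes gradients to $g$ but is not itself updated.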

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data, the method comprising: using at least one computer processor to perform: coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters; and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.

2. The method of claim 1, wherein the acoustic model comprises a multi-layer neural network and the transformation component comprises at least one network layer, and wherein coupling comprises coupling the at least one network layer to a first layer of the multi-layer neural network.

3. The method of claim 1, wherein the acoustic model comprises a deep neural network and the transformation component comprises a linear input network, and wherein coupling comprises coupling the linear input network to an input of the deep neural network.

4. The method of claim 1, wherein the first training data is characteristic of a first acoustic environment and the second training data is characteristic of a second acoustic environment.

5. The method of claim 4, wherein the second training data is characteristic of a particular second acoustic environment representing acoustic characteristics of at least one of a specific channel type, a specific language, a specific dialect, a specific speaker and a specific background noise environment.

6. The method of claim 5, wherein the first training data is comprised predominantly of speech data obtained via a first channel type and the second training data is comprised of speech data obtained via a second channel type.

7. The method of claim 7, wherein the first channel type is a near-field channel type and the second channel type is a far-field channel type.

8. The method of claim 1, wherein the second training data does not include stereo data corresponding to the first training data.

9. At least one computer readable storage medium for storing instructions that, when executed by at least one hardware processor, perform a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data, the method comprising: coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters; and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.

10. The at least one computer readable medium of claim 9, wherein the acoustic model comprises a multi-layer neural network and the transformation component comprises at least one network layer, and wherein coupling comprises coupling the at least one network layer to a first layer of the multi-layer neural network.

11. The at least one computer readable medium of claim 9, wherein the acoustic model comprises a deep neural network and the transformation component comprises a linear input network, and wherein coupling comprises coupling the linear input network to an input of the deep neural network.

12. The at least one computer readable medium of claim 9, wherein the second training data is characteristic of a particular second acoustic environment characterized by acoustic characteristics of at least one of a specific channel type, a specific language, a specific dialect, a specific speaker and a specific background noise environment.

13. The at least one computer readable medium of claim 12, wherein the second training data comprises speech data obtained from speakers speaking at a distance from a microphone such that the second training data is characteristic of far-field speech.

14. The at least one computer readable medium of claim 9, wherein the second training data does not include stereo data corresponding to the first training data.

15. A system configured to train a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data, the system comprising: at least one hardware processor configured to: couple the transformation component to a portion of the acoustic model, the transformation component comprising second parameters; and train the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.

16. The system of claim 15, wherein the acoustic model comprises a multi-layer neural network and the transformation component comprises at least one network layer, and wherein the at least one processor is configured to couple the at least one network layer to a first layer of the multi-layer neural network.

17. The system of claim 15, wherein the acoustic model comprises a deep neural network and the transformation component comprises a linear input network, and wherein the at least one processor is configured to augment the acoustic model at least in part by coupling the linear input network to an input of the deep neural network.

18. The system of claim 15, wherein the second training data is characteristic of a particular second acoustic environment characterized by acoustic characteristics of at least one of a specific channel type, a specific language, a specific dialect, a specific speaker and a specific background noise environment.

19. The system of claim 18, wherein the second training data comprises speech data obtained from speakers speaking at a distance from a microphone such that the second training data is characteristic of far-field speech.

20. The system of claim 15, wherein the second training data does not include stereo data corresponding to the first training data.
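
Claims 1 to 3 can be illustrated with a minimal sketch, under several stated assumptions. A tiny two-layer network with hand-picked weights stands in for the trained deep neural network, and a linear input network (LIN) is coupled to its input and trained by full-batch gradient descent while the network's own weights are left untouched. The class names (`FrozenAcousticModel`, `LinearInputNetwork`), the identity initialization, the cross-entropy loss, the learning rate, and the four "far-field" frames are all hypothetical choices made for illustration; none of them come from the patent.

```python
import copy
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    top = max(z)
    exps = [math.exp(t - top) for t in z]
    total = sum(exps)
    return [e / total for e in exps]

class FrozenAcousticModel:
    """Toy stand-in for a trained acoustic model: tanh hidden layer + softmax.

    W1 and W2 play the role of the "first parameters"; no method updates them.
    """
    def __init__(self, W1, W2):
        self.W1, self.W2 = W1, W2

    def loss_and_input_grad(self, x, label):
        """Return the cross-entropy loss and its gradient with respect to x."""
        h = [math.tanh(a) for a in matvec(self.W1, x)]
        p = softmax(matvec(self.W2, h))
        dz = [p_k - (1.0 if k == label else 0.0) for k, p_k in enumerate(p)]
        dh = [sum(self.W2[k][j] * dz[k] for k in range(len(dz)))
              for j in range(len(h))]
        da = [dh_j * (1.0 - h_j * h_j) for dh_j, h_j in zip(dh, h)]
        dx = [sum(self.W1[j][i] * da[j] for j in range(len(da)))
              for i in range(len(x))]
        return -math.log(p[label]), dx

class LinearInputNetwork:
    """Transformation component x' = A x + c; A and c are the "second parameters"."""
    def __init__(self, dim):
        # Identity start (an assumption): the coupled system initially behaves
        # exactly like the unmodified acoustic model.
        self.A = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.c = [0.0] * dim

    def forward(self, x):
        return [a + c_i for a, c_i in zip(matvec(self.A, x), self.c)]

def train_lin(lin, model, data, lr=0.05, steps=500):
    """Full-batch gradient descent on the LIN only; returns the loss per step."""
    history, n, dim = [], len(data), len(lin.c)
    for _ in range(steps):
        gA = [[0.0] * dim for _ in range(dim)]
        gc = [0.0] * dim
        total = 0.0
        for x, label in data:
            # The gradient flows through the frozen model back to the LIN output.
            loss, dx = model.loss_and_input_grad(lin.forward(x), label)
            total += loss
            for i in range(dim):
                gc[i] += dx[i] / n
                for j in range(dim):
                    gA[i][j] += dx[i] * x[j] / n
        history.append(total / n)
        # Only the second parameters (A, c) are updated.
        for i in range(dim):
            lin.c[i] -= lr * gc[i]
            for j in range(dim):
                lin.A[i][j] -= lr * gA[i][j]
    return history

# Made-up frozen weights and made-up second-environment frames (features, label).
model = FrozenAcousticModel(W1=[[1.0, 0.0], [0.0, 1.0]],
                            W2=[[1.0, -1.0], [-1.0, 1.0]])
far_field = [([-1.2, 1.2], 0), ([-1.1, 1.3], 0),
             ([-1.8, 1.8], 1), ([-1.9, 1.7], 1)]

before = copy.deepcopy((model.W1, model.W2))
lin = LinearInputNetwork(dim=2)
losses = train_lin(lin, model, far_field)
print(f"mean loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
print("acoustic model unchanged:", before == (model.W1, model.W2))
```

The training loop only ever sees second-environment frames and their labels, which is in the spirit of claims 8, 14 and 20: no paired ("stereo") recordings from the first environment are needed. A real system would use a far larger network and an automatic-differentiation framework rather than hand-written gradients.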

Assignees

Nuance Communications Inc

Inventors

Classifications

  • Adaptation · CPC title

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • using artificial neural networks · CPC title

  • G10L15/063 · Primary

    Training · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016019884A1 cover?
According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model,…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 21, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).