Multilingual deep neural network

US9842585B2 · US · B2

Patent metadata
Publication number: US-9842585-B2
Application number: US-201313792241-A
Country: US
Kind code: B2
Filing date: Mar 11, 2013
Priority date: Mar 11, 2013
Publication date: Dec 12, 2017
Grant date: Dec 12, 2017

Abstract

Official abstract text for this publication.

Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in terms of acoustic raw features for multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layer values trained jointly with multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
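The abstract describes hidden layers whose weights are learned jointly from several languages, a separate softmax layer per target language, and adaptation to a new language by adding a softmax layer on top of the existing hidden layers. The following is a minimal pure-Python sketch of that structure under stated assumptions. The class and method names (`MultilingualDNN`, `add_language`, `posteriors`), the sigmoid hidden units, the language codes, and all layer sizes are hypothetical illustrations, not the patented implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    # Subtract the max logit for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    # Each row holds the incoming weights of one output node (small Gaussian init is an assumption).
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


class MultilingualDNN:
    """Hypothetical MDNN sketch: shared hidden layers plus one softmax layer per language."""

    def __init__(self, n_features, hidden_sizes, seed=0):
        self.rng = random.Random(seed)
        self.hidden = []
        n_in = n_features
        for n_out in hidden_sizes:
            self.hidden.append(init_layer(n_in, n_out, self.rng))
            n_in = n_out
        self.top_size = n_in  # width of the uppermost hidden layer
        self.heads = {}  # language code -> softmax layer

    def add_language(self, language, n_senones):
        # A new target language only adds a softmax layer; the hidden layers are reused unchanged.
        self.heads[language] = init_layer(self.top_size, n_senones, self.rng)

    def shared_features(self, x):
        h = x
        for layer in self.hidden:
            h = [sigmoid(v) for v in affine(layer, h)]
        return h

    def posteriors(self, x, language):
        # Probability distribution over the chosen language's senones.
        return softmax(affine(self.heads[language], self.shared_features(x)))


model = MultilingualDNN(n_features=13, hidden_sizes=[32, 32])
model.add_language("en", n_senones=50)
model.add_language("fr", n_senones=40)  # added later, on top of the same hidden layers
frame = [0.1] * 13  # placeholder feature vector for one acoustic frame
p_fr = model.posteriors(frame, "fr")
best_senone = max(range(len(p_fr)), key=p_fr.__getitem__)
```

Keeping each language's output layer in a dictionary keyed by language makes the adaptation step described in the abstract a single insertion that never touches the shared hidden-layer weights.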

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: at a computing device that comprises at least one computer processor: receiving an acoustic signal at an automatic speech recognition (ASR) system, the ASR system configured to identify words in multiple different languages, the ASR system comprises a deep neural network (DNN), the DNN comprises an output layer that includes at least one softmax layer, the at least one softmax layer comprises output nodes, the output nodes correspond to the multiple different languages, wherein the DNN is trained based at least in part upon training data, the training data comprising spoken utterances in a source language, the acoustic signal comprising a spoken utterance that includes a word in a target language, the target language being different from the source language; extracting a plurality of features from the acoustic signal to form a feature vector; providing the feature vector to an input layer of the DNN, the DNN producing an output at the output layer responsive to being provided with the feature vector; identifying the word in the target language in the spoken utterance based upon the output of the DNN at the output layer; and performing at least one computing operation based upon the word in the target language in the spoken utterance being identified.

2. The method of claim 1, wherein the training data comprise spoken utterances in the target language.

3. The method of claim 2, wherein the spoken utterance comprises a second word in a second target language, and further comprising identifying the second word in the second target language in the spoken utterance based upon the output of the DNN at the output layer.

4. The method of claim 1, wherein the DNN comprises: a plurality of hidden layers, wherein each hidden layer in the plurality of hidden layers comprises a respective plurality of nodes, each node configured to perform a linear or nonlinear transformation on its respective input, and wherein the at least one softmax layer comprises a first softmax layer that receives outputs of respective nodes in an uppermost layer of the plurality of hidden layers, the first softmax layer comprises a plurality of modeling units that are representative of respective senones used in the target language, wherein the first softmax layer is trained based solely upon training data in the target language.

5. The method of claim 4, wherein the at least one softmax layer further comprises a second softmax layer that receives outputs of respective nodes in the uppermost layer of the plurality of hidden layers, the second softmax layer comprising a plurality of modeling units that are representative of senones used in speech in a second target language, wherein the second softmax layer is trained based solely upon training data in the second target language.

6. The method of claim 1, wherein the DNN comprises: a plurality of hidden layers, wherein each hidden layer in the plurality of hidden layers comprises a respective plurality of nodes, each node configured to perform a linear or nonlinear transformation on its respective input, and wherein the at least one softmax layer is a single softmax layer, the single softmax layer receives outputs of respective nodes in the uppermost layer of the plurality of hidden layers, the single softmax layer comprising a plurality of modeling units that are representative of senones used in speech in the source language and the target language, the training data comprising spoken utterances in the target language, the method further comprising: at the computing device that comprises the at least one processor: identifying that the spoken utterance comprises the word in the target language; and selectively activating input synapses to the single softmax layer corresponding to senones used in the target language while failing to activate input synapses to the single softmax layer corresponding to senones not used in the target language.

7. The method of claim 1 executed in a mobile computing or a gaming device.

8. The method of claim 1, wherein the DNN comprises a plurality of hidden layers and the at least one softmax layer comprises a plurality of softmax layers, and further wherein the DNN is trained in a parallel fashion using training data for different source languages, with values of parameters of the plurality of hidden layers and the plurality of softmax layers for each source language being adjusted simultaneously, and wherein the DNN is updated to comprise a new softmax layer, where the new softmax layer corresponds to a new target language and is trained by acoustic signals comprising spoken utterances in the new target language.

9. The method of claim 1, wherein the DNN comprises a plurality of hidden layers and the at least one softmax layer includes a single softmax layer, wherein supervised learning is employed to train the DNN to learn values of parameters of the hidden layers and the single softmax layers based upon the training data.

10. The method of claim 1, wherein the DNN is trained utilizing a plurality of sets of training data, each set of training data in the plurality of sets of training data corresponding to a different respective language.

11. A computing device comprising: at least one processor; and memory that comprises: a recognition system that is configured to detect words in multiple languages, the recognition system comprising: a deep neural network (DNN) that comprises: an input layer; a plurality of hidden layers, each hidden layer comprising a respective plurality of nodes, each node in a hidden layer being configured to perform a linear or nonlinear transformation on output of at least one node from an adjacent layer in the DNN, the plurality of hidden layers having parameters corresponding thereto, wherein values of the parameters are based upon training data that comprises acoustic signals that include spoken utterances in a plurality of different source languages; and at least one softmax layer that comprises modeling units that are representative of phonetic elements used in the multiple languages, the multiple languages include a target language, the at least one softmax layer having parameters corresponding thereto, wherein values of the parameters of the at least one softmax layer are based upon training data that comprises acoustic signals that include spoken utterances in the target language, the at least one softmax layer receiving outputs of nodes from an uppermost hidden layer in the DNN, wherein output of the at least one softmax layer is a probability distribution over the modeling units; and instructions that, when executed by the at least one processor, cause the at least one processor to perform acts comprising: receiving an acoustic signal that comprises a word in the target language; extracting features from the acoustic signal to generate a feature vector; providing the feature vector to the DNN; and identifying the word in the target language based upon the probability distribution over at least a subset of the modeling units.

12. The computing device of claim 11 being a mobile telephone or a gaming device.

13. The computing device of claim 12 being a server that is accessible by way of a telephone.

14. The computing device of claim 11, wherein the at least one softmax layer comprises a plurality of softmax layers, each softmax layer in the plurality of softmax layers corresponding to a respective language in the multiple languages.

15. The computing device of claim 11, wherein the modeling units represent senones, wherein the at least one softmax layer compris
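Claim 6 covers a single softmax layer whose modeling units span the senones of both the source and target languages, with only the inputs for senones used in the identified target language activated. One plausible reading, sketched here purely as an assumption, is a softmax normalized over the target language's senone subset only, so every other senone receives zero probability. The helper `masked_softmax`, the senone inventory `SENONES`, the per-language subsets, and the logit values are all hypothetical.

```python
import math


def masked_softmax(logits, active):
    """Softmax over a shared senone inventory, restricted to the indices in `active`.

    Senones outside `active` get probability 0.0, standing in (as an assumption)
    for the claim's non-activated inputs for senones not used in the target language.
    """
    m = max(logits[i] for i in active)  # stabilize using only the active logits
    exps = {i: math.exp(logits[i] - m) for i in active}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


# Hypothetical shared senone inventory covering a source and a target language.
SENONES = ["s0", "s1", "s2", "s3", "s4"]
SENONES_BY_LANGUAGE = {
    "source": {0, 1, 2, 3},
    "target": {1, 3, 4},
}

logits = [2.0, 0.5, 1.0, 1.5, -1.0]  # made-up outputs of the single softmax layer's inputs
p_target = masked_softmax(logits, SENONES_BY_LANGUAGE["target"])
best = max(range(len(p_target)), key=p_target.__getitem__)
print(SENONES[best])  # prints "s3"
```

With these logits an unrestricted softmax would favor `s0`, but `s0` is not in the target subset, so the restricted distribution selects `s3`.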
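Claim 8 recites training on data for several source languages in parallel, with the hidden-layer parameters and each language's softmax parameters adjusted simultaneously. The sketch below shows one way to read that, as full-batch gradient descent on summed cross-entropy: the shared hidden layer accumulates gradients from every language's examples, while each softmax head receives gradients only from its own language. The single bias-free hidden layer (a constant 1.0 input plays the bias role), the language names, the toy data, and the learning rate are hypothetical choices for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W, heads, lang, x):
    # Shared hidden layer, then the softmax head for `lang`.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]
    p = softmax([sum(v * hj for v, hj in zip(row, h)) for row in heads[lang]])
    return h, p


def total_loss(W, heads, data):
    # Summed cross-entropy over every language's examples.
    return -sum(math.log(forward(W, heads, lang, x)[1][y])
                for lang, examples in data.items() for x, y in examples)


def joint_step(W, heads, batch, lr):
    """One simultaneous update from a batch mixing several languages."""
    gW = [[0.0] * len(row) for row in W]
    gheads = {lang: [[0.0] * len(row) for row in V] for lang, V in heads.items()}
    for lang, x, y in batch:
        h, p = forward(W, heads, lang, x)
        V = heads[lang]
        d_logits = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        for k, dk in enumerate(d_logits):  # only this language's head gets this gradient
            for j, hj in enumerate(h):
                gheads[lang][k][j] += dk * hj
        for j, hj in enumerate(h):  # the shared layer accumulates gradients from all languages
            dh = sum(d_logits[k] * V[k][j] for k in range(len(V))) * hj * (1.0 - hj)
            for i, xi in enumerate(x):
                gW[j][i] += dh * xi
    # Apply every update only after all gradients are computed.
    for j, row in enumerate(W):
        for i in range(len(row)):
            row[i] -= lr * gW[j][i]
    for lang, V in heads.items():
        for k, row in enumerate(V):
            for j in range(len(row)):
                row[j] -= lr * gheads[lang][k][j]


rng = random.Random(0)
n_in, n_hidden = 3, 4  # last input component is a constant 1.0 acting as a bias
W = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
heads = {
    "lang_a": [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(2)],
    "lang_b": [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(3)],
}
# Hypothetical toy feature vectors and senone labels for two source languages.
data = {
    "lang_a": [([1.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1)],
    "lang_b": [([1.0, 1.0, 1.0], 2), ([0.0, 0.0, 1.0], 0)],
}
batch = [(lang, x, y) for lang, examples in data.items() for x, y in examples]

loss_before = total_loss(W, heads, data)
for _ in range(500):
    joint_step(W, heads, batch, lr=0.1)
loss_after = total_loss(W, heads, data)
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Adding a new target language afterwards, as the claim also recites, would amount to inserting another entry into `heads` and training it on that language's data.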

Assignees

Inventors

Classifications (CPC titles)

  • Combinations of networks

  • Supervised learning

  • Transfer learning

  • Feedforward networks

  • Backpropagation, e.g. using gradient descent

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Published on Dec 12, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).