Machine learning model with depth processing units

US12086704B2 · US · B2

Patent metadata
Publication number: US-12086704-B2
Application number: US-202117518535-A
Country: US
Kind code: B2
Filing date: Nov 3, 2021
Priority date: Nov 30, 2018
Publication date: Sep 10, 2024
Grant date: Sep 10, 2024

Abstract

Representative embodiments disclose machine learning classifiers used in scenarios such as speech recognition, image captioning, machine translation, or other sequence-to-sequence embodiments. The machine learning classifiers have a plurality of time layers, each layer having a time processing block and a depth processing block. The time processing block is a recurrent neural network such as a Long Short Term Memory (LSTM) network. Each depth processing block can be an LSTM network, a gated Deep Neural Network (DNN), or a maxout DNN. The depth processing blocks account for the hidden states of each time layer and use summarized layer information for final input signal feature classification. An attention layer can also be used between the top depth processing block and the output layer.
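The arrangement in the abstract can be illustrated with a minimal pure-Python sketch. It assumes standard LSTM cells (no peepholes) for both the time and depth blocks, random untrained weights, and a softmax output layer; the names `LSTMCell` and `TimeDepthModel` are hypothetical and not taken from the patent. Each time block runs across frames and feeds the time block above it; at every frame, one depth block per layer reads that layer's time-block output and carries its own state upward from the depth block below, and the top depth output feeds the output layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.3):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class LSTMCell:
    """Standard LSTM cell (assumed form): input, forget, candidate, output gates."""

    def __init__(self, rng, in_dim, hid_dim):
        # One weight matrix per gate acting on [x ; h_prev]; weights are random, untrained.
        self.W = {g: rand_matrix(rng, hid_dim, in_dim + hid_dim) for g in "ifco"}
        self.b = {g: [0.0] * hid_dim for g in "ifco"}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation [x ; h_prev]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])] for g in "ifco"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fi * cp + ii * ca for fi, cp, ii, ca in zip(f, c_prev, i, cand)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c


class TimeDepthModel:
    """Hypothetical sketch: stacked time LSTMs plus one depth LSTM per layer."""

    def __init__(self, in_dim, hid_dim, num_layers, num_classes, seed=0):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        # Time blocks: layer 0 reads features, higher layers read the time block below.
        self.time = [LSTMCell(rng, in_dim if k == 0 else hid_dim, hid_dim) for k in range(num_layers)]
        # Depth blocks: input is this layer's time output; state comes from the depth block below.
        self.depth = [LSTMCell(rng, hid_dim, hid_dim) for _ in range(num_layers)]
        self.W_out = rand_matrix(rng, num_classes, hid_dim)

    def forward(self, frames):
        num_layers, H = len(self.time), self.hid_dim
        h = [[0.0] * H for _ in range(num_layers)]  # time states carried across frames
        c = [[0.0] * H for _ in range(num_layers)]
        posteriors = []
        for x in frames:
            g, m = [0.0] * H, [0.0] * H  # depth state restarts at the bottom layer each frame
            inp = x
            for layer in range(num_layers):
                h[layer], c[layer] = self.time[layer].step(inp, h[layer], c[layer])
                g, m = self.depth[layer].step(h[layer], g, m)
                inp = h[layer]  # the next time block never sees the depth output g
            posteriors.append(softmax(matvec(self.W_out, g)))  # output layer on top depth output
        return posteriors


frames = [[0.1 * t, -0.2, 0.05 * t, 0.3] for t in range(5)]
model = TimeDepthModel(in_dim=4, hid_dim=8, num_layers=3, num_classes=6, seed=0)
posteriors = model.forward(frames)
labels = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
print(labels)
```

Because the time blocks never read the depth outputs, the time recurrence is the same as an ordinary stacked LSTM; the depth blocks only add a per-frame scan across layers whose final state is used for classification.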

Claims

Claim text (preview).

What is claimed is:

1. A computer-implemented method for performing speech recognition, the method comprising: providing computer-readable speech data to a computer-implemented model that has been trained to recognize words in speech, wherein the computer-readable speech data encodes a spoken utterance that includes a word, and further wherein the computer-implemented model comprises: a first hidden layer, wherein the first hidden layer includes a first time processing block and a first layer processing block; a second hidden layer, wherein the second hidden layer includes a second time processing block and a second layer processing block, wherein the second time processing block is configured to receive output of the first time processing block, wherein the first layer processing block is configured to receive output of the first time processing block, and further wherein the second layer processing block is configured to receive output of the first layer processing block and output of the second time processing block; and an output layer that includes an output node that is configured to generate an output based upon output of the second layer processing block; and assigning a label to the computer-readable speech data based upon the output associated with the output node of the computer-implemented model, wherein the label is indicative of the word in the spoken utterance.

2. The computer-implemented method of claim 1, wherein the first hidden layer additionally comprises a third time processing block and a third layer processing block, wherein the third time processing block is configured to receive output of the first time processing block, and further wherein the third layer processing block is configured to receive output of the third time processing block.

3. The computer-implemented method of claim 2, wherein first time processing block, the first layer processing block, the second time processing block, and the second layer processing block are associated with a first time step, and further wherein the third time processing block and the third layer processing block are associated with a second time step that is subsequent the first time step.

4. The computer-implemented method of claim 3, wherein the output layer includes a second output node that is configured to generate an output based upon output of the third layer processing block, wherein the label assigned to the computer-readable speech data is based upon the output generated by the second output node of the computer-implemented model.

5. The computer-implemented method of claim 1, wherein the first layer processing block and the second layer processing block are recurrent neural networks.

6. The computer-implemented method of claim 5, wherein the recurrent neural networks are long short-term memory (LSTM) networks.

7. The computer-implemented method of claim 1, wherein the computer-implemented model additionally includes an attention layer that is positioned between the second hidden layer and the output layer.

8. The computer-implemented method of claim 1, wherein the label identifies a senone from amongst numerous potential senones.

9. The computer-implemented method of claim 1, wherein the first time processing block and the second time processing block are recurrent neural networks.

10. The computer-implemented method of claim 1, wherein the output of the first layer processing block is not provided as input to the first time processing block.

11. The computer-implemented method of claim 1, wherein the output is a senone, further comprising: sending the label to a word hypothesis decoder having a language model; and receiving from the decoder the word included in the spoken utterance, wherein the decoder identifies the word based upon the label.

12. A computing system comprising: a processor; and memory storing instructions that, when executed by the processor, cause the processor to perform acts comprising: providing computer-readable speech data to a computer-implemented model that has been trained to recognize words in speech, wherein the computer-readable speech data encodes a spoken utterance that includes a word, and further wherein the computer-implemented model comprises: a hidden layer that comprises a first layer processing block, a second layer processing block, a first time processing block, and a second time processing block, wherein the first layer processing block and the first time processing block correspond to a first time step, wherein the second layer processing block and the second time processing block correspond to a second time step that is subsequent the first time step, wherein output of the first time processing block is received as input to the first layer processing block and the second time processing block, and further wherein output of the second time processing block is received as input to the second layer processing block; and an output layer that includes a first output node and a second output node, wherein the first output node is configured to receive output of the first layer processing block and the second output node is configured to receive output of the second layer processing block, and further wherein the first output node is associated with a first output and the second output node is associated with a second output; assigning a label to the computer-readable speech data based upon the first output and the second output, wherein the label is indicative of the word in the spoken utterance.

13. The computing system of claim 12, wherein the computer-implemented model further comprises a second hidden layer that is beneath the hidden layer in the computer-implemented model, wherein the hidden layer is configured to receive outputs of the second hidden layer.

14. The computing system of claim 13, wherein the second hidden layer comprises a third layer processing block and a third time processing block, wherein the third layer processing block and the third time processing block are associated with the first time step, wherein the first layer processing block is configured to receive output of the third layer processing block, and further wherein the first time processing block is configured to receive output of the third time processing block.

15. The computing system of claim 14, wherein the third layer processing block is configured to receive the output of the third time processing block, wherein the output of the third layer processing block is based upon the output of the third time processing block.

16. The computing system of claim 12, wherein the first layer processing block and the second layer processing block are recurrent neural networks.

17. The computing system of claim 16, wherein the recurrent neural networks are long short-term memory (LSTM) networks.

18. The computing system of claim 12, wherein the computer-implemented model additionally includes an attention layer.

19. The computing system of claim 12, wherein the label identifies a senone from amongst numerous potential senones.

20. A computer storage medium that stores instructions that, when executed by a processor, cause the processor to perform acts comprising: providing computer-readable speech data to a computer-implemented model that has been trained to recognize words in speech, wherein the computer-readable speech data encodes a spoken utterance that includes a word, and further wherein the computer-implemented model comprises: a first hidden layer, wherein the first hidden layer includes a first time
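Claims 7 and 18 add an attention layer between the hidden layers and the output layer, and claims 8 and 19 say the assigned label identifies a senone. The claims do not specify the attention form, so the sketch below assumes a simple windowed dot-product attention with no learned parameters; the function names `windowed_attention` and `assign_labels` are hypothetical, and the input vectors merely stand in for top-layer depth-block outputs.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def windowed_attention(states, window=1):
    """Assumed attention: for each frame t, a softmax-weighted sum of the
    states in [t - window, t + window], scored by dot product with states[t]."""
    contexts = []
    T = len(states)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        neighbours = states[lo:hi]
        weights = softmax([dot(states[t], s) for s in neighbours])
        dim = len(states[t])
        contexts.append([sum(w * s[d] for w, s in zip(weights, neighbours)) for d in range(dim)])
    return contexts


def assign_labels(contexts, W_out):
    """Output layer with one node per senone; each frame's label is the
    index of the largest posterior."""
    labels = []
    for ctx in contexts:
        posterior = softmax([dot(row, ctx) for row in W_out])
        labels.append(max(range(len(posterior)), key=posterior.__getitem__))
    return labels


# Toy stand-ins for top depth-block outputs (3 frames) and a 3-senone output layer.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 2.0]]
W_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
print(assign_labels(windowed_attention(states, window=1), W_out))  # → [0, 1, 1]
```

With `window=0` the attention layer passes each state through unchanged, which makes it easy to compare per-frame labels with and without attention.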

Assignees

Microsoft Technology Licensing Llc

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Learning methods · CPC title

  • Supervised learning · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Validation; Performance evaluation; Active pattern learning techniques · CPC title

Frequently asked questions

What does patent US12086704B2 cover?
It covers machine learning classifiers for sequence-to-sequence tasks such as speech recognition, image captioning, and machine translation. Each layer of the classifier pairs a time processing block (a recurrent network such as an LSTM) with a depth processing block (an LSTM, gated DNN, or maxout DNN) that summarizes hidden states across layers for final classification. The granted claims are directed to speech recognition with this model.
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06N3/048. Mapped technology areas include Physics.
When was this patent published?
Published Sep 10, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).