Who is the assignee on this patent?

Microsoft Technology Licensing Llc

What technology area does this patent fall under?

Primary CPC classification G10L15/16. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 1, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus, or other publications sharing the same primary CPC).

Efficiency adjustable speech recognition system

US11715462B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11715462-B2
Application number	US-202117244891-A
Country	US
Kind code	B2
Filing date	Apr 29, 2021
Priority date	Apr 29, 2021
Publication date	Aug 1, 2023
Grant date	Aug 1, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing system is configured to generate a transformer-transducer-based deep neural network. The transformer-transducer-based deep neural network comprises a transformer encoder network and a transducer predictor network. The transformer encoder network has a plurality of layers, each of which includes a multi-head attention network sublayer and a feed-forward network sublayer. The computing system trains an end-to-end (E2E) automatic speech recognition (ASR) model, using the transformer-transducer-based deep neural network. The E2E ASR model has one or more adjustable hyperparameters that are configured to dynamically adjust an efficiency or a performance of E2E ASR model when the E2E ASR model is deployed onto a device or executed by the device.
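The abstract describes an encoder whose layers each pair a multi-head attention sublayer with a feed-forward sublayer, in a model whose hyperparameters can be adjusted on the device. One hyperparameter named in claim 4 is the number of encoder layers to run. The Python sketch below shows that idea in miniature. It is not the patented implementation. It assumes a post-norm residual layout (claim 3 mentions a residual connection and layer normalization but not their order), uses single-head attention with no learned projections as a stand-in for multi-head attention, and uses a toy feed-forward sublayer. The names `make_encoder_layer` and `encode` are hypothetical.

```python
import math
from typing import Callable, List

Frame = List[float]
Seq = List[Frame]


def layer_norm(x: Frame, eps: float = 1e-5) -> Frame:
    # Zero-mean, unit-variance normalization of one frame (no learned gain/bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def self_attention(seq: Seq) -> Seq:
    # Stand-in for multi-head attention: one head, scaled dot-product,
    # queries = keys = values = the input frames (no learned projections).
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) / total for c in range(d)])
    return out


def residual(seq: Seq, sublayer: Callable[[Seq], Seq]) -> Seq:
    # Post-norm residual connection per frame: LayerNorm(x + sublayer(x)).
    return [layer_norm([a + b for a, b in zip(x, y)]) for x, y in zip(seq, sublayer(seq))]


def make_encoder_layer(ff_scale: float) -> Callable[[Seq], Seq]:
    # Hypothetical layer: attention sublayer, then a toy feed-forward sublayer
    # (per-element ReLU(ff_scale * v)); ff_scale only makes layers differ.
    def feed_forward(seq: Seq) -> Seq:
        return [[max(0.0, ff_scale * v) for v in frame] for frame in seq]

    def layer(seq: Seq) -> Seq:
        return residual(residual(seq, self_attention), feed_forward)

    return layer


def encode(seq: Seq, layers: List[Callable[[Seq], Seq]], num_active_layers: int) -> Seq:
    # The adjustable hyperparameter: run only the first `num_active_layers` layers.
    if not 1 <= num_active_layers <= len(layers):
        raise ValueError("num_active_layers must be between 1 and len(layers)")
    for layer in layers[:num_active_layers]:
        seq = layer(seq)
    return seq


frames = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.0], [0.0, 1.0, 2.0]]
layers = [make_encoder_layer(s) for s in (1.0, 2.0, 3.0)]
fast = encode(frames, layers, num_active_layers=1)  # e.g. a low-power setting
full = encode(frames, layers, num_active_layers=3)  # full depth
```

Because every layer maps a sequence of frames to a sequence of the same shape, the depth can be cut at run time without retraining a separate model. Whether accuracy holds up at reduced depth depends on how the model was trained, which this sketch does not model.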

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system comprising: one or more processors; and one or more computer-readable media having stored thereon computer-executable instructions that are structured such that, when executed by the one or more processors, cause the computing system to: generate a transformer-transducer-based deep neural network, the transformer-transducer-based deep neural network comprising a transformer encoder network and a transducer predictor network, and the transformer encoder network having a plurality of layers, each of the plurality of layers including a multi-head attention network sublayer and a feed-forward network sublayer; train an end-to-end (E2E) automatic speech recognition (ASR) model, using the transformer-transducer-based deep neural network, the E2E ASR model being trained to have one or more adjustable hyperparameters that are configured to dynamically adjust an efficiency or a performance of the E2E ASR model when the E2E ASR model is deployed onto a device or executed by the device; and provide the E2E ASR model to the device to be used to perform ASR in response to receiving a stream of speech.

2. The computing system of claim 1, the transducer predictor network comprising one or more long-short-term memory (LSTV) networks.

3. The computing system of claim 1, each of the multi-head attention network sublayer and the feed-forward network sublayer comprising a layer configured to perform a residual connection and a layer normalization.

4. The computing system of claim 1, the one or more adjustable hyperparameters includes at least one of: (1) a number of layers that are to be implemented at the transformer encoder network, (2) a history window size indicating a number of history frames that are to be considered by a frame of each layer, (3) a look-ahead window size indicating a number of look ahead frames that are to be considered by a frame of each layer, (4) a chunk size indicating a total number of frames that are to be considered by a frame of each layer, (5) an attention mask indicating particular items in a frame index matrix that are to be set as “0”, the frame index representing a particular configuration of the transformer encoder network, or (6) a transducer path that is to be executed by the transducer predictor network.

5. The computing system of claim 4, the E2E ASR model configured to perform at least the following at the device: identify one or more conditions of the device associated with computational power of the device, and set at least one of the one or more adjustable hyperparameters based on one or more conditions of the device.

6. The computing system of claim 4, the E2E ASR model configured to perform at least the following at the device: enumerate a plurality of paths in the transducer predictor network; determine performance of each of the plurality of paths; select a particular path among the plurality of paths that has a best performance; and set the particular path as the transducer path.

7. The computing system of claim 5, setting the one or more adjustable hyperparameters comprising: set a maximum chunk size, a maximum history window size, or a maximum look-ahead window size based on the one or more conditions of the device; and generate the attention mask based on the maximum chunk size, the maximum history window size, or the maximum look-ahead window size.

8. The computing system of claim 7, wherein the one or more conditions of the device comprises at least one of following hardware conditions of the device (1) a type of processor that is installed on the device, (2) a number of processors that is installed on the device, (3) a type of memory that is installed on the device, or (4) a total amount of memory installed on the device.

9. The computing system of claim 8, wherein the one or more conditions of the device comprises at least one of following runtime conditions (1) a function of a particular application of the E2E ASR model that is executed by the device, or (2) a current status of the device.

10. The computing system of claim 9, the function of the particular application being at least one of (1) a streaming application configured to process a stream of speech in substantially real time, or (2) a post-processing application configured to process a file of a recorded speech.

11. The computing system of claim 10, wherein the current status of the device comprises at least one of (1) a thermal status of the device, (2) a throttling status of the device, (3) a status of other applications that are currently executing at the device, (4) a battery level of the device, or (5) a battery-saving status of the device.

12. A method for dynamically adjusting an automatic speech recognition (ASR) system based on one or more conditions of a device, comprising: deploying an end-to-end (E2E) ASR model onto a device, the E2E ASR model being trained using a transformer-transducer-based deep neural network and having one or more adjustable hyperparameters; determining one or more conditions of the device associated with computational power of the device; setting at least one of the one or more adjustable hyperparameters based on the determined one or more conditions of the device associated with the computational power of the device; and performing ASR using the ASR model in response to receiving a stream of speech.

13. The method of claim 12, wherein: the E2E ASR model comprises a transformer encoder network and a transducer predictor network; the transformer encoder network has a plurality of layers, each of which includes a multi-head attention network sublayer and a feed-forward network sublayer, and each of the multi-head attention network sublayer and the feed-forward network sublayer comprises a layer configured to perform a residual connection and a layer normalization.

14. The method of claim 13, the transducer predictor network comprising one or more long-short-term memory (LSTM) networks.

15. The method of claim 13, wherein the one or more adjustable hyperparameters comprising at least one of: (1) a number of layers that are to be implemented at the transformer encoder network, (2) a history window size indicating a number of history frames that are to be considered by a frame of each layer, (3) a look-ahead window size indicating a number of look ahead frames that are to be considered by a frame of each layer, (4) a chunk size indicating a total number of frames that are to be considered by a frame of each layer, (5) an attention mask indicating particular items in a frame index matrix that are to be set as “0”, the frame index representing a particular configuration of the transformer encoder network, or (6) a transducer path that is to be executed by the transducer predictor network.

16. The method of claim 15, the setting of the transducer path comprising: enumerating a plurality of paths in the transducer predictor network; determining a performance of each of the plurality of paths; selecting a particular path among the plurality of paths that has a best performance; and setting the particular path as the transducer path.

17. The method of claim 12, wherein one or more conditions of the device comprises at least one of following hardware conditions: (1) a type of processor that is installed on the device, (2) a number of processors that is installed on the device, (3) a type of memory that is installed on the device, and/or (4) a total amount of memory that is installed on the device.

18. The method of claim 12, wher…
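Claims 5 and 7 to 11 describe a device-side flow: read hardware and runtime conditions, cap the chunk size and window sizes accordingly, and build an attention mask whose disallowed entries are "0". A minimal Python sketch of that flow follows. Everything concrete in it is an assumption. `DeviceConditions`, `choose_limits`, and `chunk_attention_mask` are hypothetical names, the thresholds are invented for illustration, and the mask uses one common chunk-based streaming layout (each frame sees its own chunk plus a fixed number of earlier history frames, so look-ahead is bounded by the chunk end) rather than a layout taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeviceConditions:
    # Hypothetical snapshot of the hardware/runtime conditions listed in claims 8-11.
    num_processors: int
    memory_gb: float
    battery_level: float  # 0.0 to 1.0
    battery_saver: bool
    throttled: bool       # thermal or OS throttling active
    streaming: bool       # True: real-time stream; False: recorded file


def choose_limits(c: DeviceConditions) -> Tuple[int, int]:
    # Returns (max_chunk_size, max_history_frames). All numbers are illustrative.
    chunk = 4 if c.streaming else 32  # short chunks keep streaming latency low
    history = 64
    weak_hw = c.num_processors < 4 or c.memory_gb < 4
    constrained = c.throttled or c.battery_saver or c.battery_level < 0.2
    if weak_hw or constrained:
        history //= 4                 # less left context, fewer attention scores
        chunk = max(1, chunk // 2)
    return chunk, history


def chunk_attention_mask(num_frames: int, chunk: int, history: int) -> List[List[int]]:
    # mask[i][j] == 1 if frame i may attend to frame j, else 0.
    # Frame i sees all frames in its own chunk plus `history` frames before it.
    if chunk < 1 or history < 0:
        raise ValueError("chunk must be >= 1 and history >= 0")
    mask = []
    for i in range(num_frames):
        start = (i // chunk) * chunk
        end = min(start + chunk, num_frames) - 1
        lo = max(0, start - history)
        mask.append([1 if lo <= j <= end else 0 for j in range(num_frames)])
    return mask


phone = DeviceConditions(num_processors=2, memory_gb=3.0, battery_level=0.15,
                         battery_saver=True, throttled=False, streaming=True)
limits = choose_limits(phone)  # (2, 16)
mask = chunk_attention_mask(6, chunk=2, history=1)
# rows: 110000, 110000, 011100, 011100, 000111, 000111
```

In this sketch the chunk cap bounds how far ahead a frame can look (and so the streaming latency), while the history cap bounds how many attention scores each frame computes. A production policy would presumably be tuned per device class rather than hard-coded.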

Assignees

Microsoft Technology Licensing Llc

Inventors

Classifications

G06N3/0985
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G10L15/16 (primary)
using artificial neural networks · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 81388874

Related patents

Soft-forgetting for connectionist temporal classification based automatic speech recognition

Latency constraints for acoustic modeling

System and method for performing automatic speech recognition system parameter adjustment via machine learning

Speech recognition with sequence-to-sequence models

Dynamically selecting speech functionality on client devices
