What technology area does this patent fall under?

Primary CPC classification: G06N3/08 (Learning methods). Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 3, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Fast deep neural network feature transformation via optimized memory bandwidth utilization

US10013652B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10013652-B2
Application number	US-201514699778-A
Country	US
Kind code	B2
Filing date	Apr 29, 2015
Priority date	Apr 29, 2015
Publication date	Jul 3, 2018
Grant date	Jul 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Deep Neural Networks (DNNs) with many hidden layers and many units per layer are very flexible models with a very large number of parameters. As such, DNNs are challenging to optimize. To achieve real-time computation, embodiments disclosed herein enable fast DNN feature transformation via optimized memory bandwidth utilization. To optimize memory bandwidth utilization, a rate of accessing memory may be reduced based on a batch setting. A memory, corresponding to a selected given output neuron of a current layer of the DNN, may be updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, wherein a number of the selected few corresponds to the batch setting.
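
The batching idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`layer_forward_batched`, `output_updates`), the `weights[i][j]` layout, and the default batch size are assumptions made for illustration. Zero-valued inputs are skipped, the remaining inputs are taken a few at a time, and each stored output value is updated once per group rather than once per input.

```python
from typing import List, Sequence


def layer_forward_batched(inputs: Sequence[float],
                          weights: Sequence[Sequence[float]],
                          batch: int = 4) -> List[float]:
    """Accumulate one layer's outputs, skipping zero inputs.

    weights[i][j] is the (assumed) weight from input neuron i of the
    previous layer to output neuron j of the current layer.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    n_out = len(weights[0]) if weights else 0
    outputs = [0.0] * n_out  # stands in for the memory holding output values

    # Zero-skipping: keep only the indices of non-zero input neurons.
    nonzero = [i for i, x in enumerate(inputs) if x != 0.0]

    # Take the non-zero inputs `batch` at a time; the last group may be smaller.
    for start in range(0, len(nonzero), batch):
        group = nonzero[start:start + batch]
        for j in range(n_out):
            # Incremental output value for output neuron j from this group.
            increment = sum(inputs[i] * weights[i][j] for i in group)
            outputs[j] += increment  # one update of the stored value per group
    return outputs


def output_updates(num_nonzero: int, n_out: int, batch: int) -> int:
    """Count stored-output updates: one per output neuron per group."""
    groups = -(-num_nonzero // batch)  # ceiling division
    return groups * n_out


if __name__ == "__main__":
    x = [0.0, 2.0, 0.0, 1.0, 3.0]
    w = [[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [3.0, 4.0], [0.5, -1.0]]
    print(layer_forward_batched(x, w, batch=2))  # -> [6.5, 5.0]
```

In this sketch, three non-zero inputs and two outputs need six output updates with a batch of one and four with a batch of two, which is the sense in which a larger batch setting lowers the rate of memory access.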

First claim

Opening claim text (preview).

What is claimed is:
1. A method for improving computation time of speech recognition processing in an electronic device, the method comprising: by a processor: updating a current output value stored in a memory, the current output value corresponding to a selected given output neuron of a current layer of a Deep Neural Network (DNN), the current output value being updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, the selected few non-zero input neurons selected by skipping zero-skip neurons of the previous layer that have null input values for combining with respective weights for the updating, wherein a number of neurons for the selected few corresponds to a batch setting; iterating the updating for each output neuron of the current layer to update respective current output values, stored in the memory, with respective incremental output values computed for the respective output neurons; and repeating the updating and the iterating for each next selected few non-zero input neurons of the previous layer to reduce a rate of accessing the memory based on the batch setting to improve the computation time of the speech recognition processing.
2. The method of claim 1, wherein the batch setting is a value of at least two neurons.
3. The method of claim 1, further comprising: selecting the few non-zero input neurons from a plurality of input neurons of the previous layer of the DNN, wherein the few non-zero input neurons have non-zero input for the updating; selecting the given output neuron; fetching the weights between the selected few non-zero input neurons and the given output neuron; and computing the incremental output value.
4. The method of claim 3, wherein the computing further includes employing Single Instruction Multiple Data (SIMD) instructions.
5. The method of claim 1, further comprising selecting the few non-zero input neurons and terminating the repeating, iterating, and updating in an event each non-zero input neuron has been selected.
6. The method of claim 1, further comprising: selecting the few non-zero input neurons; and in an event a remaining number of un-selected non-zero input neurons is fewer than the batch setting, the number of the selected few corresponds to the remaining number.
7. The method of claim 1, wherein the method further comprises: receiving at least one speech signal over a speech interface; producing at least one feature vector from the at least one speech signal received; and applying the DNN to the at least one feature vector to compute at least one output feature vector for producing at least one speech recognition result.
8. The method of claim 1, further comprising fetching the weights from a plurality of weight data structures stored in at least one memory of the speech recognition system and wherein a portion of the plurality of the weight data structures are stored in different memories of the at least one memory.
9. The method of claim 1, further comprising: compressing a first portion of the weights; maintaining a second portion of the weights un-compressed, the second portion having weight values exceeding a range of the first portion, the second portion stored separately from the first portion; and in an event all output values of all output neurons of the current layer have been computed based on all non-zero input values of all non-zero input neurons of the previous layer in combination with all compressed weights, performing a subsequent pass to update each output value of each output neuron of the current layer based on input values of input neurons in combination with un-compressed weights.
10. The method of claim 9, wherein the second portion is stored in a sparse matrix.
11. An apparatus for improving computation time of speech recognition processing in an electronic device, the apparatus comprising: a processor, the processor configured to: update a current output value stored in a memory, the current output value corresponding to a selected given output neuron of a current layer of a Deep Neural Network (DNN), the current output value being updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, the selected few non-zero input neurons selected by skipping zero-skip neurons of the previous layer that have null input values for combining with respective weights for the update operation, wherein a number of neurons for the selected few corresponds to a batch setting; iterate the update operation for each output neuron of the current layer to update respective current output values, stored in the memory, with respective incremental output values computed for the respective output neurons; and repeat the update and iterate operations for each next selected few non-zero input neurons of the previous layer to reduce a rate of accessing the memory based on the batch setting to improve the computation time of the speech recognition processing.
12. The apparatus of claim 11, wherein the batch setting is a value of at least two neurons.
13. The apparatus of claim 11, wherein the processor is further configured to: select the few non-zero input neurons from a plurality of input neurons of the previous layer of the DNN, wherein the few non-zero input neurons have non-zero input values for the update operation; select the given output neuron; fetch the weights between the selected few non-zero input neurons and the given output neuron; and compute the incremental output value.
14. The apparatus of claim 11, wherein the processor is further configured to employ Single Instruction Multiple Data (SIMD) instructions to compute the incremental output value.
15. The apparatus of claim 11, wherein the processor is further configured to select the few non-zero input neurons, terminate the repeat operation, terminate the iterate operation, and terminate the update operation in an event each non-zero input neuron has been selected.
16. The apparatus of claim 11, wherein the processor is further configured to: select the few non-zero input neurons; and in an event a remaining number of un-selected non-zero input neurons is fewer than the batch setting, the number of the selected few corresponds to the remaining number.
17. The apparatus of claim 11, wherein the apparatus further comprises: an audio interface configured to receive at least one speech signal over a speech interface; a speech recognition front-end configured to produce at least one feature vector from the at least one speech signal received; and wherein the processor is further configured to apply the DNN to the at least one feature vector to compute at least one output feature vector for producing at least one speech recognition result.
18. The apparatus of claim 11, further wherein the processor is further configured to fetch the weights from a plurality of weight data structures stored in at least one memory of the speech recognition system and wherein a portion of the plurality of the weight data structures are stored in different memories of the at least one memory.
19. The apparatus of clai
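
Claims 9 and 10 describe splitting the weights into a compressed portion and a separately stored un-compressed portion that is applied in a subsequent pass. The sketch below is one possible reading, not the patent's scheme: the page does not say how the weights are compressed, so uniform quantization with a hypothetical `scale` and `qmax` is assumed, a Python dict keyed by `(input, output)` stands in for the sparse matrix, and batching is omitted to keep the example short.

```python
from typing import Dict, List, Sequence, Tuple

Sparse = Dict[Tuple[int, int], float]


def split_weights(weights: Sequence[Sequence[float]],
                  scale: float,
                  qmax: int = 127) -> Tuple[List[List[int]], Sparse]:
    """Quantize in-range weights; keep out-of-range ones un-compressed.

    Assumed scheme: w is stored as round(w / scale) when that integer fits
    in [-qmax, qmax]; otherwise w goes into a sparse side table.
    """
    compressed: List[List[int]] = []
    outliers: Sparse = {}
    for i, row in enumerate(weights):
        q_row: List[int] = []
        for j, w in enumerate(row):
            q = round(w / scale)
            if -qmax <= q <= qmax:
                q_row.append(q)
            else:
                q_row.append(0)       # contributes nothing in the first pass
                outliers[(i, j)] = w  # exact value kept separately
        compressed.append(q_row)
    return compressed, outliers


def forward_two_pass(inputs: Sequence[float],
                     compressed: Sequence[Sequence[int]],
                     outliers: Sparse,
                     scale: float) -> List[float]:
    """First pass over compressed weights, then a pass over the outliers."""
    n_out = len(compressed[0]) if compressed else 0
    outputs = [0.0] * n_out

    # First pass: compressed weights, skipping zero-valued inputs.
    for i, x in enumerate(inputs):
        if x == 0.0:
            continue
        for j in range(n_out):
            outputs[j] += x * (compressed[i][j] * scale)

    # Subsequent pass: add the contribution of the un-compressed weights.
    for (i, j), w in outliers.items():
        if inputs[i] != 0.0:
            outputs[j] += inputs[i] * w
    return outputs


if __name__ == "__main__":
    w = [[1.0, 100.0], [-0.5, 2.0]]
    q, sparse = split_weights(w, scale=0.5)
    print(sparse)                                            # -> {(0, 1): 100.0}
    print(forward_two_pass([2.0, 4.0], q, sparse, scale=0.5))  # -> [0.0, 208.0]
```

Keeping the few large weights outside the compressed table lets the quantization range stay narrow for the rest, at the cost of a second, sparse pass over the outputs.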

Assignees

Nuance Communications Inc

Inventors

Classifications

G10L2015/0635
updating or merging of old and new templates; Mean values; Weighting · CPC title
G06N3/08 (Primary)
Learning methods · CPC title
G10L15/16
using artificial neural networks · CPC title
G10L15/02
Feature extraction for speech recognition; Selection of recognition unit · CPC title
G06N3/045
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 57205174

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10013652B2 cover?: Deep Neural Networks (DNNs) with many hidden layers and many units per layer are very flexible models with a very large number of parameters. As such, DNNs are challenging to optimize. To achieve real-time computation, embodiments disclosed herein enable fast DNN feature transformation via optimized memory bandwidth utilization. To optimize memory bandwidth utilization, a rate of accessing memo…
Who is the assignee on this patent?: Nuance Communications Inc

Related patents

Neural network training method and apparatus, and recognition method and apparatus

Sub-matrix input for neural network layers

Systems and methods for accelerating hessian-free optimization for deep neural networks by implicit preconditioning and sampling
