Fast deep neural network feature transformation via optimized memory bandwidth utilization

US10013652B2 · US · B2

Patent metadata
Field                Value
Publication number   US-10013652-B2
Application number   US-201514699778-A
Country              US
Kind code            B2
Filing date          Apr 29, 2015
Priority date        Apr 29, 2015
Publication date     Jul 3, 2018
Grant date           Jul 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Deep Neural Networks (DNNs) with many hidden layers and many units per layer are very flexible models with a very large number of parameters. As such, DNNs are challenging to optimize. To achieve real-time computation, embodiments disclosed herein enable fast DNN feature transformation via optimized memory bandwidth utilization. To optimize memory bandwidth utilization, a rate of accessing memory may be reduced based on a batch setting. A memory, corresponding to a selected given output neuron of a current layer of the DNN, may be updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, wherein a number of the selected few corresponds to the batch setting.
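The loop order described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `batched_zero_skip_layer`, the list-of-lists weight layout, and the `updates` counter are assumptions made only for illustration. The sketch skips input neurons whose value is zero, groups the remaining inputs by the batch setting, and touches each output accumulator once per group rather than once per input.

```python
def batched_zero_skip_layer(inputs, weights, batch=4):
    """Compute outputs[j] = sum_i inputs[i] * weights[i][j], skipping zero inputs.

    inputs  -- activations of the previous layer (list of floats)
    weights -- weights[i][j] connects input neuron i to output neuron j
               (hypothetical layout, chosen for readability)
    batch   -- the batch setting: how many non-zero inputs are folded into
               each read-modify-write of an output accumulator
    Returns (outputs, updates), where updates counts accumulator writes.
    """
    n_out = len(weights[0]) if weights else 0
    outputs = [0.0] * n_out
    updates = 0

    # Zero-skip: keep only the indices of input neurons with non-zero values.
    nonzero = [i for i, x in enumerate(inputs) if x != 0.0]

    # Walk the non-zero inputs in groups of `batch`; the last group may be
    # shorter when fewer than `batch` inputs remain.
    for start in range(0, len(nonzero), batch):
        group = nonzero[start:start + batch]
        for j in range(n_out):
            # Incremental output value contributed by this group to neuron j.
            increment = 0.0
            for i in group:
                increment += inputs[i] * weights[i][j]
            outputs[j] += increment  # one accumulator update per group
            updates += 1
    return outputs, updates


if __name__ == "__main__":
    x = [0.0, 2.0, 0.0, 1.0, 3.0]
    w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(batched_zero_skip_layer(x, w, batch=2))
    print(batched_zero_skip_layer(x, w, batch=1))
```

In this sketch, a layer with M output neurons and N non-zero inputs performs M × ceil(N / batch) accumulator updates, so a larger batch setting lowers the number of times each stored output value is read and written. The result is the same for every batch setting; only the access pattern changes.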

First claim

Opening claim text (preview).

What is claimed is:

1. A method for improving computation time of speech recognition processing in an electronic device, the method comprising: by a processor: updating a current output value stored in a memory, the current output value corresponding to a selected given output neuron of a current layer of a Deep Neural Network (DNN), the current output value being updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, the selected few non-zero input neurons selected by skipping zero-skip neurons of the previous layer that have null input values for combining with respective weights for the updating, wherein a number of neurons for the selected few corresponds to a batch setting; iterating the updating for each output neuron of the current layer to update respective current output values, stored in the memory, with respective incremental output values computed for the respective output neurons; and repeating the updating and the iterating for each next selected few non-zero input neurons of the previous layer to reduce a rate of accessing the memory based on the batch setting to improve the computation time of the speech recognition processing.

2. The method of claim 1, wherein the batch setting is a value of at least two neurons.

3. The method of claim 1, further comprising: selecting the few non-zero input neurons from a plurality of input neurons of the previous layer of the DNN, wherein the few non-zero input neurons have non-zero input for the updating; selecting the given output neuron; fetching the weights between the selected few non-zero input neurons and the given output neuron; and computing the incremental output value.

4. The method of claim 3, wherein the computing further includes employing Single Instruction Multiple Data (SIMD) instructions.

5. The method of claim 1, further comprising selecting the few non-zero input neurons and terminating the repeating, iterating, and updating in an event each non-zero input neuron has been selected.

6. The method of claim 1, further comprising: selecting the few non-zero input neurons; and in an event a remaining number of un-selected non-zero input neurons is fewer than the batch setting, the number of the selected few corresponds to the remaining number.

7. The method of claim 1, wherein the method further comprises: receiving at least one speech signal over a speech interface; producing at least one feature vector from the at least one speech signal received; and applying the DNN to the at least one feature vector to compute at least one output feature vector for producing at least one speech recognition result.

8. The method of claim 1, further comprising fetching the weights from a plurality of weight data structures stored in at least one memory of the speech recognition system and wherein a portion of the plurality of the weight data structures are stored in different memories of the at least one memory.

9. The method of claim 1, further comprising: compressing a first portion of the weights; maintaining a second portion of the weights un-compressed, the second portion having weight values exceeding a range of the first portion, the second portion stored separately from the first portion; and in an event all output values of all output neurons of the current layer have been computed based on all non-zero input values of all non-zero input neurons of the previous layer in combination with all compressed weights, performing a subsequent pass to update each output value of each output neuron of the current layer based on input values of input neurons in combination with un-compressed weights.

10. The method of claim 9, wherein the second portion is stored in a sparse matrix.

11. An apparatus for improving computation time of speech recognition processing in an electronic device, the apparatus comprising: a processor, the processor configured to: update a current output value stored in a memory, the current output value corresponding to a selected given output neuron of a current layer of a Deep Neural Network (DNN), the current output value being updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, the selected few non-zero input neurons selected by skipping zero-skip neurons of the previous layer that have null input values for combining with respective weights for the update operation, wherein a number of neurons for the selected few corresponds to a batch setting; iterate the update operation for each output neuron of the current layer to update respective current output values, stored in the memory, with respective incremental output values computed for the respective output neurons; and repeat the update and iterate operations for each next selected few non-zero input neurons of the previous layer to reduce a rate of accessing the memory based on the batch setting to improve the computation time of the speech recognition processing.

12. The apparatus of claim 11, wherein the batch setting is a value of at least two neurons.

13. The apparatus of claim 11, wherein the processor is further configured to: select the few non-zero input neurons from a plurality of input neurons of the previous layer of the DNN, wherein the few non-zero input neurons have non-zero input values for the update operation; select the given output neuron; fetch the weights between the selected few non-zero input neurons and the given output neuron; and compute the incremental output value.

14. The apparatus of claim 11, wherein the processor is further configured to employ Single Instruction Multiple Data (SIMD) instructions to compute the incremental output value.

15. The apparatus of claim 11, wherein the processor is further configured to select the few non-zero input neurons, terminate the repeat operation, terminate the iterate operation, and terminate the update operation in an event each non-zero input neuron has been selected.

16. The apparatus of claim 11, wherein the processor is further configured to: select the few non-zero input neurons; and in an event a remaining number of un-selected non-zero input neurons is fewer than the batch setting, the number of the selected few corresponds to the remaining number.

17. The apparatus of claim 11, wherein the apparatus further comprises: an audio interface configured to receive at least one speech signal over a speech interface; a speech recognition front-end configured to produce at least one feature vector from the at least one speech signal received; and wherein the processor is further configured to apply the DNN to the at least one feature vector to compute at least one output feature vector for producing at least one speech recognition result.

18. The apparatus of claim 11, further wherein the processor is further configured to fetch the weights from a plurality of weight data structures stored in at least one memory of the speech recognition system and wherein a portion of the plurality of the weight data structures are stored in different memories of the at least one memory.

19. The apparatus of clai
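Claims 9 and 10 describe keeping most weights in compressed form, storing out-of-range weights separately in a sparse structure, and applying those in a subsequent pass. The sketch below shows one way that split could look. The claims do not specify a compression scheme, so the uniform quantization, the `limit` and `levels` parameters, the dictionary used as the sparse store, and the function names `split_weights` and `two_pass_layer` are all assumptions for illustration only.

```python
def split_weights(weights, limit=1.0, levels=127):
    """Split weights into integer codes plus a sparse store of outliers.

    Hypothetical scheme: weights within [-limit, limit] are rounded to integer
    codes in [-levels, levels]; weights outside that range are kept exact in a
    dict keyed by (input_index, output_index) and get code 0 in the dense part.
    """
    scale = limit / levels
    codes, outliers = [], {}
    for i, row in enumerate(weights):
        code_row = []
        for j, w in enumerate(row):
            if abs(w) <= limit:
                code_row.append(round(w / scale))
            else:
                code_row.append(0)
                outliers[(i, j)] = w
        codes.append(code_row)
    return codes, scale, outliers


def two_pass_layer(inputs, codes, scale, outliers):
    """Apply compressed weights first, then add the un-compressed outliers."""
    n_out = len(codes[0]) if codes else 0
    outputs = [0.0] * n_out

    # First pass: every non-zero input against the compressed weights.
    for i, x in enumerate(inputs):
        if x == 0.0:
            continue
        for j in range(n_out):
            outputs[j] += x * codes[i][j] * scale

    # Subsequent pass: add contributions of the sparse, un-compressed weights.
    for (i, j), w in outliers.items():
        outputs[j] += inputs[i] * w
    return outputs


if __name__ == "__main__":
    w = [[0.5, 5.0], [-1.0, 0.25]]
    codes, scale, outliers = split_weights(w, limit=1.0, levels=4)
    print(codes, outliers)
    print(two_pass_layer([2.0, 1.0], codes, scale, outliers))
```

Separating the outliers keeps the dense codes within a narrow range, so in-range weights lose less precision than they would if the range had to stretch to cover a few large values. The sparse pass only visits the entries that fell outside that range.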

Assignees

Nuance Communications Inc

Classifications

  • updating or merging of old and new templates; Mean values; Weighting · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • using artificial neural networks · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • Combinations of networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10013652B2 cover?
Deep Neural Networks (DNNs) with many hidden layers and many units per layer are very flexible models with a very large number of parameters. As such, DNNs are challenging to optimize. To achieve real-time computation, embodiments disclosed herein enable fast DNN feature transformation via optimized memory bandwidth utilization. To optimize memory bandwidth utilization, a rate of accessing memo…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 3, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).