Neural network training method and apparatus, and recognition method and apparatus
US-2016247064-A1 · Aug 25, 2016 · US
US10013652B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10013652-B2 |
| Application number | US-201514699778-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 29, 2015 |
| Priority date | Apr 29, 2015 |
| Publication date | Jul 3, 2018 |
| Grant date | Jul 3, 2018 |
Official abstract text for this publication.
Deep Neural Networks (DNNs) with many hidden layers and many units per layer are very flexible models with a very large number of parameters. As such, DNNs are challenging to optimize. To achieve real-time computation, embodiments disclosed herein enable fast DNN feature transformation via optimized memory bandwidth utilization. To optimize memory bandwidth utilization, a rate of accessing memory may be reduced based on a batch setting. A memory, corresponding to a selected given output neuron of a current layer of the DNN, may be updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, wherein a number of the selected few corresponds to the batch setting.
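The batched update described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `batched_zero_skip_layer`, the list-of-lists weight layout, and the `memory_updates` counter are all hypothetical and chosen only to make the access pattern visible.

```python
def batched_zero_skip_layer(inputs, weights, batch=4):
    """Accumulate one layer's outputs, skipping zero inputs in batches.

    inputs  : previous-layer activations (list of floats).
    weights : weights[i][j] is the weight from input neuron i to output
              neuron j (hypothetical layout, assumed for illustration).
    batch   : the "batch setting", i.e. how many non-zero input neurons
              are combined before each stored output value is touched.
    Returns (outputs, memory_updates).
    """
    n_out = len(weights[0]) if weights else 0
    outputs = [0.0] * n_out        # stands in for the per-output memory
    memory_updates = 0             # counts read-modify-write operations

    # Skip input neurons whose value is zero.
    nonzero = [i for i, x in enumerate(inputs) if x != 0.0]

    # Repeat for each next group of non-zero input neurons.
    for start in range(0, len(nonzero), batch):
        chunk = nonzero[start:start + batch]  # final chunk may be smaller
        # Iterate over every output neuron of the current layer.
        for j in range(n_out):
            # Incremental output value from this chunk only.
            increment = sum(inputs[i] * weights[i][j] for i in chunk)
            outputs[j] += increment           # one update per chunk
            memory_updates += 1
    return outputs, memory_updates


if __name__ == "__main__":
    x = [0.0, 2.0, 0.0, 1.0, 3.0]
    w = [[1, 1], [1, 2], [5, 5], [3, 4], [0, 1]]
    print(batched_zero_skip_layer(x, w, batch=1))
    print(batched_zero_skip_layer(x, w, batch=2))
```

In this sketch a larger `batch` yields the same output values with fewer updates of each stored output, which mirrors the abstract's point that the rate of accessing memory is reduced based on the batch setting.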
Opening claim text (preview).
What is claimed is:
1. A method for improving computation time of speech recognition processing in an electronic device, the method comprising: by a processor: updating a current output value stored in a memory, the current output value corresponding to a selected given output neuron of a current layer of a Deep Neural Network (DNN), the current output value being updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, the selected few non-zero input neurons selected by skipping zero-skip neurons of the previous layer that have null input values for combining with respective weights for the updating, wherein a number of neurons for the selected few corresponds to a batch setting; iterating the updating for each output neuron of the current layer to update respective current output values, stored in the memory, with respective incremental output values computed for the respective output neurons; and repeating the updating and the iterating for each next selected few non-zero input neurons of the previous layer to reduce a rate of accessing the memory based on the batch setting to improve the computation time of the speech recognition processing.
2. The method of claim 1, wherein the batch setting is a value of at least two neurons.
3. The method of claim 1, further comprising: selecting the few non-zero input neurons from a plurality of input neurons of the previous layer of the DNN, wherein the few non-zero input neurons have non-zero input for the updating; selecting the given output neuron; fetching the weights between the selected few non-zero input neurons and the given output neuron; and computing the incremental output value.
4. The method of claim 3, wherein the computing further includes employing Single Instruction Multiple Data (SIMD) instructions.
5. The method of claim 1, further comprising selecting the few non-zero input neurons and terminating the repeating, iterating, and updating in an event each non-zero input neuron has been selected.
6. The method of claim 1, further comprising: selecting the few non-zero input neurons; and in an event a remaining number of un-selected non-zero input neurons is fewer than the batch setting, the number of the selected few corresponds to the remaining number.
7. The method of claim 1, wherein the method further comprises: receiving at least one speech signal over a speech interface; producing at least one feature vector from the at least one speech signal received; and applying the DNN to the at least one feature vector to compute at least one output feature vector for producing at least one speech recognition result.
8. The method of claim 1, further comprising fetching the weights from a plurality of weight data structures stored in at least one memory of the speech recognition system and wherein a portion of the plurality of the weight data structures are stored in different memories of the at least one memory.
9. The method of claim 1, further comprising: compressing a first portion of the weights; maintaining a second portion of the weights un-compressed, the second portion having weight values exceeding a range of the first portion, the second portion stored separately from the first portion; and in an event all output values of all output neurons of the current layer have been computed based on all non-zero input values of all non-zero input neurons of the previous layer in combination with all compressed weights, performing a subsequent pass to update each output value of each output neuron of the current layer based on input values of input neurons in combination with un-compressed weights.
10. The method of claim 9, wherein the second portion is stored in a sparse matrix.
11. An apparatus for improving computation time of speech recognition processing in an electronic device, the apparatus comprising: a processor, the processor configured to: update a current output value stored in a memory, the current output value corresponding to a selected given output neuron of a current layer of a Deep Neural Network (DNN), the current output value being updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, the selected few non-zero input neurons selected by skipping zero-skip neurons of the previous layer that have null input values for combining with respective weights for the update operation, wherein a number of neurons for the selected few corresponds to a batch setting; iterate the update operation for each output neuron of the current layer to update respective current output values, stored in the memory, with respective incremental output values computed for the respective output neurons; and repeat the update and iterate operations for each next selected few non-zero input neurons of the previous layer to reduce a rate of accessing the memory based on the batch setting to improve the computation time of the speech recognition processing.
12. The apparatus of claim 11, wherein the batch setting is a value of at least two neurons.
13. The apparatus of claim 11, wherein the processor is further configured to: select the few non-zero input neurons from a plurality of input neurons of the previous layer of the DNN, wherein the few non-zero input neurons have non-zero input values for the update operation; select the given output neuron; fetch the weights between the selected few non-zero input neurons and the given output neuron; and compute the incremental output value.
14. The apparatus of claim 11, wherein the processor is further configured to employ Single Instruction Multiple Data (SIMD) instructions to compute the incremental output value.
15. The apparatus of claim 11, wherein the processor is further configured to select the few non-zero input neurons, terminate the repeat operation, terminate the iterate operation, and terminate the update operation in an event each non-zero input neuron has been selected.
16. The apparatus of claim 11, wherein the processor is further configured to: select the few non-zero input neurons; and in an event a remaining number of un-selected non-zero input neurons is fewer than the batch setting, the number of the selected few corresponds to the remaining number.
17. The apparatus of claim 11, wherein the apparatus further comprises: an audio interface configured to receive at least one speech signal over a speech interface; a speech recognition front-end configured to produce at least one feature vector from the at least one speech signal received; and wherein the processor is further configured to apply the DNN to the at least one feature vector to compute at least one output feature vector for producing at least one speech recognition result.
18. The apparatus of claim 11, further wherein the processor is further configured to fetch the weights from a plurality of weight data structures stored in at least one memory of the speech recognition system and wherein a portion of the plurality of the weight data structures are stored in different memories of the at least one memory.
19. The apparatus of clai
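The two-pass scheme recited in claims 9 and 10 (compressed weights first, then a subsequent pass over separately stored un-compressed weights) might look like the sketch below. The claims do not specify a compression method, so the uniform integer quantization here is an assumption made purely for illustration, as are the names `split_weights`, `two_pass_layer`, `limit`, and `levels`, and the use of a dictionary as the sparse store.

```python
def split_weights(weights, limit=1.0, levels=127):
    """Split weights into a compressed part and a sparse un-compressed part.

    Assumption: weights with |w| <= limit are quantized to integers in
    [-levels, levels]; anything outside that range is kept as a float in
    a sparse dict keyed by (input index, output index).
    """
    scale = limit / levels
    quantized = []
    outliers = {}
    for i, row in enumerate(weights):
        q_row = []
        for j, w in enumerate(row):
            if abs(w) <= limit:
                q_row.append(round(w / scale))
            else:
                q_row.append(0)          # contributes nothing in pass 1
                outliers[(i, j)] = w     # handled in pass 2
        quantized.append(q_row)
    return quantized, scale, outliers


def two_pass_layer(inputs, quantized, scale, outliers):
    """Compute layer outputs with a compressed pass, then a sparse pass."""
    n_out = len(quantized[0]) if quantized else 0
    outputs = [0.0] * n_out
    # Pass 1: all non-zero inputs combined with the compressed weights.
    for i, x in enumerate(inputs):
        if x == 0.0:
            continue
        for j in range(n_out):
            outputs[j] += x * quantized[i][j] * scale
    # Pass 2: add contributions from the un-compressed (out-of-range) weights.
    for (i, j), w in outliers.items():
        outputs[j] += inputs[i] * w
    return outputs


if __name__ == "__main__":
    w = [[0.5, 3.0], [-1.0, 0.25]]
    q, s, sparse = split_weights(w)
    print(sparse)
    print(two_pass_layer([2.0, 0.0], q, s, sparse))
```

Keeping the out-of-range weights in a separate sparse structure lets the dense part use a narrow value range, at the cost of a second, much smaller pass over the outliers.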
updating or merging of old and new templates; Mean values; Weighting · CPC title
Learning methods · CPC title
using artificial neural networks · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
Combinations of networks · CPC title