Neural network training performance optimization framework
US-2017193361-A1 · Jul 6, 2017 · US
US11244225B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11244225-B2 |
| Application number | US-201615193741-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 27, 2016 |
| Priority date | Jul 10, 2015 |
| Publication date | Feb 8, 2022 |
| Grant date | Feb 8, 2022 |
Official abstract text for this publication.
Implementing a neural network can include receiving a macro instruction for implementing the neural network within a control unit of a neural network processor. The macro instruction can indicate a first data set, a second data set, a macro operation for the neural network, and a mode of operation for performing the macro operation. The macro operation can be automatically initiated using a processing unit of the neural network processor by applying the second data set to the first data set based on the mode of operation.
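The abstract's idea of a single macro instruction that names two data sets, a macro operation, and a mode can be sketched roughly as follows. This is only an illustrative software model, not the patented hardware: the names `MacroInstruction`, `MacroOp`, `Mode`, and `execute` are hypothetical, and the reading of "gather" (each output pulls in its inputs) versus "scatter" (each input pushes its contributions to the outputs) is an assumption based on the mode names used in the claims.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

Matrix = List[List[float]]


class MacroOp(Enum):          # hypothetical names for the macro operations
    CONVOLUTION = "convolution"
    VECTOR_PRODUCT = "vector_product"


class Mode(Enum):             # hypothetical names for the modes of operation
    GATHER = "gather"
    SCATTER = "scatter"


@dataclass(frozen=True)
class MacroInstruction:
    first: Matrix             # first data set, e.g. a region of a feature map
    second: Matrix            # second data set, e.g. the weights of a kernel
    op: MacroOp
    mode: Mode


def _zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def conv_gather(region: Matrix, kernel: Matrix) -> Matrix:
    """Assumed gather mode: each output gathers the inputs it depends on."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(region) - kh + 1, len(region[0]) - kw + 1
    out = _zeros(oh, ow)
    for y in range(oh):
        for x in range(ow):
            out[y][x] = sum(kernel[i][j] * region[y + i][x + j]
                            for i in range(kh) for j in range(kw))
    return out


def conv_scatter(region: Matrix, kernel: Matrix) -> Matrix:
    """Assumed scatter mode: each input scatters into every output it touches."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(region) - kh + 1, len(region[0]) - kw + 1
    out = _zeros(oh, ow)
    for r in range(len(region)):
        for c in range(len(region[0])):
            for i in range(kh):
                for j in range(kw):
                    y, x = r - i, c - j
                    if 0 <= y < oh and 0 <= x < ow:
                        out[y][x] += kernel[i][j] * region[r][c]
    return out


def execute(instr: MacroInstruction) -> Matrix:
    """Apply the second data set to the first according to op and mode."""
    if instr.op is MacroOp.CONVOLUTION:
        if instr.mode is Mode.GATHER:
            return conv_gather(instr.first, instr.second)
        return conv_scatter(instr.first, instr.second)
    # Vector products: one dot product per weight row (the mode is ignored here).
    values = instr.first[0]
    return [[sum(w * v for w, v in zip(row, values)) for row in instr.second]]
```

Under these assumptions both modes yield the same result and differ only in loop order, which is the kind of choice a mode field could expose to the processing unit.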
Opening claim text (preview).
What is claimed is:
1. A method, comprising: receiving a macro instruction for implementing a neural network within a control unit of a neural network processor, wherein the macro instruction indicates a first data set, a second data set, a macro operation for the neural network, and a mode of operation for performing the macro operation, the second data set comprising a plurality of first elements, and each first element comprising a value; and automatically initiating the macro operation using a processing unit of the neural network processor by applying the second data set to the first data set based on the mode of operation and on an order of execution sequence of applying first elements of the second data set to the first data set so that a value of a first element that is a same value as another first element is applied a single time to the first data set for the macro operation, and further comprising receiving a second macro instruction that indicates a second macro operation of applying an activation function to accumulated data and a second mode of operation indicating that the activation function is selected from a plurality of activation functions.
2. The method of claim 1, wherein the macro operation comprises convolution and the mode of operation is selected from the group consisting of a scatter mode of operation and a gather mode of operation.
3. The method of claim 1, wherein the macro operation is selected from the group consisting of convolution and vector products.
4. The method of claim 3, wherein the macro operation comprises convolution, the first data set is a selected region of a selected feature map, and the second data set is a plurality of weights of a selected kernel.
5. The method of claim 3, wherein the macro operation comprises vector products, the first data set is a plurality of feature classification values, and the second data set is a plurality of weights for a feature classification layer of the neural network.
6. The method of claim 1, wherein the processing unit performs the macro operation in log domain.
7. The method of claim 1, wherein the mode of operation indicates a selected numerical precision from a plurality of different numerical precisions used by the processing unit in performing the macro operation.
8. The method of claim 7, wherein the macro operation applies a selected portion of each item in the second data set to the first data set based on the selected numerical precision.
9. An apparatus, comprising: a control unit that receives a macro instruction; wherein the macro instruction indicates a first data set, a second data set, a macro operation for a neural network, and a mode of operation for performing the macro operation, the second data set comprising a plurality of first items, and each first item comprising a value; and a memory unit coupled to the control unit, the memory unit storing the first data set and the second data set; and a processing unit coupled to the memory unit, the processing unit automatically initiating the macro operation by applying the second data set to the first data set based on the mode of operation and based on an order of execution sequence that includes applying first elements of the second data set to the first data set so that a value of a first element that is a same value as another first element is applied a single time to the first data set for the macro operation, wherein the control unit receives a second macro instruction indicating a second macro operation of applying an activation function to accumulated data and a second mode of operation indicating that the activation function is selected from a plurality of activation functions, the processing unit further comprising: an activation function unit that applies the activation function to the accumulated data.
10. The apparatus of claim 9, wherein the processing unit comprises: an arithmetic accumulate array that performs convolution, wherein the mode of operation is selected from the group consisting of a scatter mode of operation and a gather mode of operation.
11. The apparatus of claim 9, wherein the processing unit comprises: an arithmetic accumulate array that performs the macro operation, wherein the macro operation is selected from the group consisting of convolution and vector products.
12. The apparatus of claim 9, wherein the processing unit performs the macro operation in log domain.
13. The apparatus of claim 9, wherein the mode of operation indicates a selected numerical precision from a plurality of different numerical precisions for the processing unit used in performing the macro operation.
14. The apparatus of claim 13, wherein the macro operation applies a selected portion of each first item in the second data set to the first data set based on the selected numerical precision.
15. A computer-program product comprising a computer-readable storage medium having program code stored thereon, the program code executable by a processor to: receive a macro instruction for implementing a neural network, wherein the macro instruction indicates a first data set, a second data set, a macro operation for the neural network, and a mode of operation for performing the macro operation, the second data set comprising a plurality of first elements, and each first element comprising a value; and automatically initiate the macro operation by applying the second data set to the first data set based on the mode of operation and on an order of execution sequence that includes applying first elements of the second data set to the first data set so that a value of a first element that is a same value as another first element is applied a single time to the first data set for the macro operation, and further comprising receiving a second macro instruction that indicates a second macro operation of applying an activation function to accumulated data and a second mode of operation indicating that the activation function is selected from a plurality of activation functions.
16. The computer-program product of claim 15, wherein the macro operation comprises convolution and the mode of operation is selected from the group consisting of a scatter mode of operation and a gather mode of operation.
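The independent claims recite applying a repeated weight value "a single time" and then applying an activation function chosen by a second mode. One possible reading, offered here only as a sketch, is that inputs sharing the same weight value are summed first so that each distinct value is multiplied once. The function names `dot_with_value_reuse` and `apply_activation`, and the particular set of activation functions, are assumptions for illustration and are not taken from the patent.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def dot_with_value_reuse(inputs: List[float],
                         weights: List[float]) -> Tuple[float, int]:
    """Dot product that multiplies by each distinct weight value only once.

    Returns the accumulated result and the number of multiplications used.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Order of execution: first sum the inputs that share a weight value...
    partial_sums: Dict[float, float] = defaultdict(float)
    for x, w in zip(inputs, weights):
        partial_sums[w] += x
    # ...then apply each distinct value a single time.
    total = 0.0
    for w, s in partial_sums.items():
        total += w * s
    return total, len(partial_sums)


# Hypothetical table of selectable activation functions (second mode).
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda v: max(0.0, v),
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "tanh": math.tanh,
}


def apply_activation(accumulated: List[float], mode: str) -> List[float]:
    """Second macro operation: apply the activation selected by `mode`."""
    fn = ACTIVATIONS[mode]
    return [fn(v) for v in accumulated]
```

For example, `dot_with_value_reuse([1.0, 2.0, 3.0, 4.0], [0.5, 2.0, 0.5, 2.0])` needs two multiplications instead of four, because the four weights contain only two distinct values.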
Multidimensional correlation or convolution · CPC title
Activation functions · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Convolutional networks [CNN, ConvNet] · CPC title