Image processing apparatus, image processing method, and program
US-2020258194-A1 · Aug 13, 2020 · US
US11429855B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11429855-B2 |
| Application number | US-201815889275-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 6, 2018 |
| Priority date | Feb 6, 2018 |
| Publication date | Aug 30, 2022 |
| Grant date | Aug 30, 2022 |
Abstract
A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is used to perform the depth-first processing on the neural network based on the generated code.
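Depth-first processing can be illustrated with a minimal sketch. It assumes a 1-D signal and a hypothetical three-layer stack: a 3-tap convolution, a ReLU and a 2-wide max pool. Breadth-first execution computes each layer over the whole input before moving to the next. The depth-first variant instead walks the whole stack once per patch of final outputs, mapping that patch back to the input slice it depends on. The `Layer` class, the layer weights and all function names are illustrative assumptions, not code from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Layer:
    """One sliding-window layer: each output reads `window` inputs, and
    consecutive outputs start `stride` inputs apart (hypothetical model)."""
    name: str
    window: int
    stride: int
    fn: Callable[[Sequence[int]], int]


def apply_layer(layer: Layer, xs: Sequence[int]) -> List[int]:
    # "Valid" sliding window: no padding, so the output is shorter than the input.
    n_out = (len(xs) - layer.window) // layer.stride + 1
    return [layer.fn(xs[i * layer.stride : i * layer.stride + layer.window]) for i in range(n_out)]


def breadth_first(stack: List[Layer], xs: Sequence[int]) -> List[int]:
    # Layer by layer: every intermediate result is materialized in full.
    out = list(xs)
    for layer in stack:
        out = apply_layer(layer, out)
    return out


def output_length(stack: List[Layer], n: int) -> int:
    for layer in stack:
        n = (n - layer.window) // layer.stride + 1
    return n


def input_range(stack: List[Layer], start: int, stop: int) -> Tuple[int, int]:
    # Walk the stack backwards: outputs [start, stop) of a layer need
    # inputs [start * stride, (stop - 1) * stride + window).
    for layer in reversed(stack):
        start, stop = start * layer.stride, (stop - 1) * layer.stride + layer.window
    return start, stop


def depth_first(stack: List[Layer], xs: Sequence[int], patch: int) -> List[int]:
    # Patch by patch: run the whole stack on only the input slice that one
    # patch of final outputs depends on. Neighbouring slices overlap (the
    # convolution "halo"), so a few elements are recomputed.
    result: List[int] = []
    n_out = output_length(stack, len(xs))
    for start in range(0, n_out, patch):
        stop = min(start + patch, n_out)
        lo, hi = input_range(stack, start, stop)
        result.extend(breadth_first(stack, xs[lo:hi]))
    return result


# Hypothetical stack: 3-tap convolution -> ReLU -> 2-wide max pool.
STACK = [
    Layer("conv3", 3, 1, lambda w: w[0] - 2 * w[1] + w[2]),
    Layer("relu", 1, 1, lambda w: max(w[0], 0)),
    Layer("pool2", 2, 2, max),
]

if __name__ == "__main__":
    xs = [(7 * i) % 11 - 5 for i in range(40)]
    reference = breadth_first(STACK, xs)
    for patch in (1, 3, 8):
        assert depth_first(STACK, xs, patch) == reference
    print(len(reference), "outputs match")
```

Both paths return the same values; the design trade-off is that a smaller `patch` keeps each intermediate slice small (so it can stay in cache) at the cost of recomputing the overlapping halo elements.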
Claims (excerpt)
What is claimed is:

1. A method for accelerating a neural network, the method comprising: identifying neural network layers that meet a locality constraint; generating code for different hardware based on the identified neural network layers, wherein the generated code is used to implement depth first processing on the neural network, and wherein the different hardware comprises one or more central processing units (CPUs) or graphical processing units (GPUs); and performing the depth-first processing on the neural network based on the generated code, wherein identifying the neural network layers comprises: identifying parts of the neural network that perform a serial processing of functions that are mergeable to obtain a stack; and mapping each function in the stack to obtain at least one operation, wherein each operation has a loop type property designating whether the operation is based on a convolution layer, a pooling layer, or an element-wise operational layer, and wherein the loop type property designates whether the operation has a requirement for data evaluation.

2. The method according to claim 1, wherein generating the code comprises: determining one or more pre-defined building blocks from the identified neural network layers; and combining the one or more pre-defined building blocks to obtain the code.

3. The method according to claim 1, wherein the CPUs include fewer single instruction multiple data (SIMD) units compared to the GPUs.

4. The method according to claim 1, wherein the at least one operation comprises an accumulation operation and a normalization operation, and wherein the accumulation operation has a loop type property requiring data to be processed in a certain area and the normalization operation has a loop type property with no data requirements.

5. The method according to claim 1, wherein identifying the neural network layers further comprises: merging the at least one operation into one or more steps, wherein a step includes only one operation with a loop type property designating that the one operation has a requirement for data evaluation.

6. The method according to claim 5, wherein identifying the neural network layers further comprises: grouping the one or more steps into one or more sequences, wherein a sequence includes steps with compatible loop types.

7. The method according to claim 6, wherein sequences in the one or more sequences intended for CPUs have more steps than sequences in the one or more sequences intended for GPUs.

8. The method according to claim 7, wherein a patch size is reduced based on available memory exceeding the memory threshold, the patch size being related to an amount of data input to the sequence.

9. The method according to claim 8, wherein reduction of the patch size is limited by an underutilization of the different hardware.

10. The method according to claim 6, wherein grouping the one or more steps into the one or more sequences includes determining how each step grouped in a sequence influences data requirements of the sequence so as to reduce an amount of available memory below a memory threshold.

11. The method according to claim 5, wherein the step further includes a second operation with a loop type property designating that the second operation does not have a requirement for data evaluation.

12. The method according to claim 1, wherein the stack comprises a first subset of neural network layers from the neural network layers, and wherein generating the code for the different hardware based on the identified neural network layers comprises: generating code to loop back and re-process the first subset of neural network layers after completing an iteration of processing the first subset of neural network layers.

13. The method according to claim 1, wherein the stack comprises a first subset of neural network layers from the neural network layers, wherein the neural network layers further comprise at least one other neural network layer that is immediately subsequent to the first subset of neural network layers, and wherein generating the code for the different hardware based on the identified neural network layers comprises: generating code to: process the first subset of neural network layers; store an output from processing the first subset of neural network layers in main memory; and process the at least one other neural network layer based on retrieving the output from the main memory.

14. A system for accelerating a neural network, the system comprising one or more processors which, alone or in combination, are configured to provide for execution of the following steps: identifying neural network layers that meet a locality constraint; generating code for different hardware based on the identified neural network layers, wherein the generated code is used to implement depth first processing on the neural network, and wherein the different hardware comprises one or more central processing units (CPUs) or graphical processing units (GPUs); and performing the depth-first processing on the neural network based on the generated code, wherein identifying the neural network layers comprises: identifying parts of the neural network that perform a serial processing of functions that are mergeable to obtain a stack; and mapping each function in the stack to obtain at least one operation, wherein each operation has a loop type property designating whether the operation is based on a convolution layer, a pooling layer, or an element-wise operational layer, and wherein the loop type property designates whether the operation has a requirement for data evaluation.
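Claims 1 and 4 to 10 outline a pipeline. Operations tagged with a loop type and a data-evaluation flag are merged into steps. Steps are grouped into sequences, and a patch size is reduced to fit a memory threshold. The sketch below is one possible reading of that pipeline, not the patented implementation. The claims leave several details unspecified, so the following are all hypothetical assumptions: the compatibility rule (`compatible`), the per-target step limits (`MAX_STEPS`), the memory estimate (`patch * bytes_per_element`), and every name used here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class LoopType(Enum):
    # The three layer kinds named in claim 1.
    CONVOLUTION = "convolution"
    POOLING = "pooling"
    ELEMENT_WISE = "element-wise"


@dataclass(frozen=True)
class Operation:
    name: str
    loop_type: LoopType
    needs_data: bool  # True if the op must read a neighbourhood of inputs


@dataclass
class Step:
    ops: List[Operation] = field(default_factory=list)

    def loop_type(self) -> LoopType:
        # A step takes the loop type of its data-requiring op, if it has one.
        for op in self.ops:
            if op.needs_data:
                return op.loop_type
        return LoopType.ELEMENT_WISE


def merge_into_steps(ops: List[Operation]) -> List[Step]:
    """Claims 5 and 11: at most one data-requiring op per step; ops with
    no data requirement are folded into the current step."""
    steps: List[Step] = []
    current = Step()
    for op in ops:
        if op.needs_data and any(o.needs_data for o in current.ops):
            steps.append(current)
            current = Step()
        current.ops.append(op)
    if current.ops:
        steps.append(current)
    return steps


def compatible(a: Step, b: Step) -> bool:
    # HYPOTHETICAL rule: the claims do not define "compatible loop types".
    types = (a.loop_type(), b.loop_type())
    return types[0] == types[1] or LoopType.ELEMENT_WISE in types


# HYPOTHETICAL limits; claim 7 only says CPU sequences have more steps.
MAX_STEPS = {"cpu": 4, "gpu": 2}


def group_into_sequences(steps: List[Step], target: str) -> List[List[Step]]:
    """Claim 6: greedily extend the current sequence while the next step is
    compatible with its last step and the per-target limit allows."""
    sequences: List[List[Step]] = []
    for step in steps:
        seq = sequences[-1] if sequences else None
        if seq and len(seq) < MAX_STEPS[target] and compatible(seq[-1], step):
            seq.append(step)
        else:
            sequences.append([step])
    return sequences


def choose_patch_size(initial: int, bytes_per_element: int,
                      threshold: int, min_patch: int) -> int:
    """Claims 8 and 9 as interpreted here: halve the patch while its estimated
    footprint exceeds the threshold, but not below min_patch, the assumed
    point at which the hardware would be underutilized."""
    patch = initial
    while patch * bytes_per_element > threshold and patch // 2 >= min_patch:
        patch //= 2
    return patch
```

For example, the operation list `conv_a, relu, conv_b, relu, conv_c, relu, pool, norm` (with only the convolutions and the pool requiring data) merges into four steps: three convolution steps and one pooling step. Under the hypothetical limits, they form sequences of lengths `[3, 1]` for `"cpu"` and `[2, 1, 1]` for `"gpu"`, mirroring claim 7. With `bytes_per_element=64` and `threshold=16384`, `choose_patch_size(1024, 64, 16384, 32)` settles on 256.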