Acceleration of neural networks using depth-first processing

US11429855B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11429855-B2
Application number: US-201815889275-A
Country: US
Kind code: B2
Filing date: Feb 6, 2018
Priority date: Feb 6, 2018
Publication date: Aug 30, 2022
Grant date: Aug 30, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is used to perform the depth-first processing on the neural network based on the generated code.
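
The abstract does not spell out what depth-first processing looks like in code. The following is a minimal sketch of one common reading, assuming a stack of purely element-wise layers so that every input patch can be pushed through all layers independently. The function names, the `patch_size` parameter, and the three example layers are illustrative assumptions, not taken from the patent.

```python
from typing import Callable, List, Sequence

# Assumption: every layer is element-wise (one value in, one value out),
# so a patch never needs data from outside itself.
Layer = Callable[[float], float]


def run_layer_by_layer(layers: Sequence[Layer], data: List[float]) -> List[float]:
    """Conventional order: each layer sweeps the whole input before the next starts."""
    for layer in layers:
        data = [layer(x) for x in data]  # one full-size intermediate per layer
    return data


def run_depth_first(layers: Sequence[Layer], data: List[float], patch_size: int) -> List[float]:
    """Depth-first order: each patch passes through every layer before the next patch is read."""
    out: List[float] = []
    for start in range(0, len(data), patch_size):
        patch = data[start:start + patch_size]
        for layer in layers:
            patch = [layer(x) for x in patch]  # intermediates stay patch-sized
        out.extend(patch)
    return out


# Hypothetical three-layer stack: scale, shift, ReLU.
layers = [lambda x: 2.0 * x, lambda x: x - 3.0, lambda x: max(x, 0.0)]
data = [float(i) for i in range(10)]

assert run_depth_first(layers, data, patch_size=4) == run_layer_by_layer(layers, data)
```

Both orders compute the same result. The difference is that the depth-first variant only ever holds patch-sized intermediates, which is the property that lets them stay in fast local memory on real hardware.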

First claim

Opening claim text (preview).

What is claimed is:

1. A method for accelerating a neural network, the method comprising: identifying neural network layers that meet a locality constraint; generating code for different hardware based on the identified neural network layers, wherein the generated code is used to implement depth first processing on the neural network, and wherein the different hardware comprises one or more central processing units (CPUs) or graphical processing units (GPUs); and performing the depth-first processing on the neural network based on the generated code, wherein identifying the neural network layers comprises: identifying parts of the neural network that perform a serial processing of functions that are mergeable to obtain a stack; and mapping each function in the stack to obtain at least one operation, wherein each operation has a loop type property designating whether the operation is based on a convolution layer, a pooling layer, or an element-wise operational layer, and wherein the loop type property designates whether the operation has a requirement for data evaluation.

2. The method according to claim 1, wherein generating the code comprises: determining one or more pre-defined building blocks from the identified neural network layers; and combining the one or more pre-defined building blocks to obtain the code.

3. The method according to claim 1, wherein the CPUs include fewer single instruction multiple data (SIMD) units compared to the GPUs.

4. The method according to claim 1, wherein the at least one operation comprises an accumulation operation and a normalization operation, and wherein the accumulation operation has a loop type property requiring data to be processed in a certain area and the normalization operation has a loop type property with no data requirements.

5. The method according to claim 1, wherein identifying the neural network layers further comprises: merging the at least one operation into one or more steps, wherein a step includes only one operation with a loop type property designating that the one operation has a requirement for data evaluation.

6. The method according to claim 5, wherein identifying the neural network layers further comprises: grouping the one or more steps into one or more sequences, wherein a sequence includes steps with compatible loop types.

7. The method according to claim 6, wherein sequences in the one or more sequences intended for CPUs have more steps than sequences in the one or more sequences intended for GPUs.

8. The method according to claim 7, wherein a patch size is reduced based on available memory exceeding the memory threshold, the patch size being related to an amount of data input to the sequence.

9. The method according to claim 8, wherein reduction of the patch size is limited by an underutilization of the different hardware.

10. The method according to claim 6, wherein grouping the one or more steps into the one or more sequences includes determining how each step grouped in a sequence influences data requirements of the sequence so as to reduce an amount of available memory below a memory threshold.

11. The method according to claim 5, wherein the step further includes a second operation with a loop type property designating that the second operation does not have a requirement for data evaluation.

12. The method according to claim 1, wherein the stack comprises a first subset of neural network layers from the neural network layers, and wherein generating the code for the different hardware based on the identified neural network layers comprises: generating code to loop back and re-process the first subset of neural network layers after completing an iteration of processing the first subset of neural network layers.

13. The method according to claim 1, wherein the stack comprises a first subset of neural network layers from the neural network layers, wherein the neural network layers further comprise at least one other neural network layer that is immediately subsequent to the first subset of neural network layers, and wherein generating the code for the different hardware based on the identified neural network layers comprises: generating code to: process the first subset of neural network layers; store an output from processing the first subset of neural network layers in main memory; and process the at least one other neural network layer based on retrieving the output from the main memory.

14. A system for accelerating a neural network, the system comprising one or more processors which, alone or in combination, are configured to provide for execution of the following steps: identifying neural network layers that meet a locality constraint; generating code for different hardware based on the identified neural network layers, wherein the generated code is used to implement depth first processing on the neural network, and wherein the different hardware comprises one or more central processing units (CPUs) or graphical processing units (GPUs); and performing the depth-first processing on the neural network based on the generated code, wherein identifying the neural network layers comprises: identifying parts of the neural network that perform a serial processing of functions that are mergeable to obtain a stack; and mapping each function in the stack to obtain at least one operation, wherein each operation has a loop type property designating whether the operation is based on a convolution layer, a pooling layer, or an element-wise operational layer, and wherein the loop type property designates whether the operation has a requirement for data evaluation.
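
The layer-identification pipeline in claims 1, 4, 5, 6 and 11 (functions mapped to operations, operations merged into steps, steps grouped into sequences) can be sketched roughly as follows. This is an illustrative reading rather than the patented implementation: the function-to-operation table, every class and function name, and in particular the `same_or_elementwise` compatibility rule are assumptions, since the claims do not define when two loop types are compatible.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class LoopType(Enum):
    CONVOLUTION = "convolution"
    POOLING = "pooling"
    ELEMENTWISE = "elementwise"


@dataclass(frozen=True)
class Operation:
    name: str
    loop_type: LoopType

    @property
    def needs_area(self) -> bool:
        # Assumption: convolution and pooling must evaluate a region of data,
        # element-wise operations have no such requirement.
        return self.loop_type is not LoopType.ELEMENTWISE


# Hypothetical mapping table. Average pooling is split into an accumulation
# and a normalization operation, following claim 4.
FUNCTION_TO_OPS: Dict[str, List[Operation]] = {
    "conv": [Operation("conv", LoopType.CONVOLUTION)],
    "relu": [Operation("relu", LoopType.ELEMENTWISE)],
    "max_pool": [Operation("max", LoopType.POOLING)],
    "avg_pool": [Operation("accumulate", LoopType.POOLING),
                 Operation("normalize", LoopType.ELEMENTWISE)],
}

Step = List[Operation]


def map_functions(stack: List[str]) -> List[Operation]:
    """Map each function in the stack to one or more operations (claim 1)."""
    return [op for fn in stack for op in FUNCTION_TO_OPS[fn]]


def merge_into_steps(ops: List[Operation]) -> List[Step]:
    """Merge operations so each step holds at most one area-requiring op (claims 5, 11)."""
    steps: List[Step] = []
    for op in ops:
        current_has_area = bool(steps) and any(o.needs_area for o in steps[-1])
        if not steps or (op.needs_area and current_has_area):
            steps.append([op])
        else:
            steps[-1].append(op)
    return steps


def step_loop_type(step: Step) -> LoopType:
    """Loop type of a step: that of its area-requiring op, else element-wise."""
    for op in step:
        if op.needs_area:
            return op.loop_type
    return LoopType.ELEMENTWISE


def same_or_elementwise(a: LoopType, b: LoopType) -> bool:
    # Placeholder rule (assumption): equal loop types, or either side element-wise.
    return a is b or LoopType.ELEMENTWISE in (a, b)


def group_into_sequences(
    steps: List[Step],
    compatible: Callable[[LoopType, LoopType], bool] = same_or_elementwise,
) -> List[List[Step]]:
    """Group consecutive steps with compatible loop types into sequences (claim 6)."""
    sequences: List[List[Step]] = []
    for step in steps:
        if sequences and compatible(step_loop_type(sequences[-1][-1]), step_loop_type(step)):
            sequences[-1].append(step)
        else:
            sequences.append([step])
    return sequences


ops = map_functions(["conv", "relu", "avg_pool", "relu"])
steps = merge_into_steps(ops)
sequences = group_into_sequences(steps)
```

For the example stack, `merge_into_steps` yields two steps, one built around the convolution and one around the pooling accumulation. Passing a different `compatible` predicate changes how many steps end up in each sequence, which is where a hardware-specific policy such as the CPU/GPU difference in claim 7 would plug in.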

Assignees

NEC Laboratories Europe GmbH, NEC Corp

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • G06N3/04 · Primary

    Architecture, e.g. interconnection topology · CPC title

  • using electronic means · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
NEC Laboratories Europe GmbH, NEC Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 30, 2022 (B2). Legal status and post-grant events are not shown on this page.