Who is the assignee on this patent?

NEC Laboratories Europe GmbH, NEC Corp

What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 30, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Acceleration of neural networks using depth-first processing

US11429855B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11429855-B2
Application number	US-201815889275-A
Country	US
Kind code	B2
Filing date	Feb 6, 2018
Priority date	Feb 6, 2018
Publication date	Aug 30, 2022
Grant date	Aug 30, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is used to perform the depth-first processing on the neural network based on the generated code.
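
As an illustration of the idea in the abstract, the following minimal Python sketch contrasts layer-by-layer execution with depth-first execution on a stack of element-wise functions. It is not taken from the patent: the names `layer_wise`, `depth_first`, `STACK`, and the `patch_size` parameter are hypothetical, and element-wise functions are used only because they trivially satisfy a locality constraint (each output depends on a single input position).

```python
from typing import Callable, List, Sequence

# Hypothetical layer type: an element-wise function of one value.
Layer = Callable[[float], float]


def layer_wise(data: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Baseline: every layer processes the whole input before the next layer starts."""
    current = list(data)
    for layer in layers:
        current = [layer(x) for x in current]  # full-size intermediate per layer
    return current


def depth_first(data: Sequence[float], layers: Sequence[Layer], patch_size: int) -> List[float]:
    """Depth-first: each small patch runs through all layers before the next patch."""
    if patch_size < 1:
        raise ValueError("patch_size must be at least 1")
    out: List[float] = []
    for start in range(0, len(data), patch_size):
        patch = list(data[start:start + patch_size])
        for layer in layers:
            patch = [layer(x) for x in patch]  # intermediate is only patch-sized
        out.extend(patch)
    return out


def scale(x: float) -> float:
    return 2.0 * x


def shift(x: float) -> float:
    return x + 1.0


def relu(x: float) -> float:
    return max(0.0, x)


# Hypothetical stack of mergeable, serially applied functions.
STACK = [scale, shift, relu]

if __name__ == "__main__":
    sample = [-3.0, 0.5, 2.0, -0.25, 4.0]
    print(layer_wise(sample, STACK))
    print(depth_first(sample, STACK, patch_size=2))
```

Both orderings compute the same values. The depth-first variant only changes the order of work, so that intermediate results stay patch-sized instead of layer-sized.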

First claim

Opening claim text (preview).

What is claimed is:
1. A method for accelerating a neural network, the method comprising: identifying neural network layers that meet a locality constraint; generating code for different hardware based on the identified neural network layers, wherein the generated code is used to implement depth first processing on the neural network, and wherein the different hardware comprises one or more central processing units (CPUs) or graphical processing units (GPUs); and performing the depth-first processing on the neural network based on the generated code, wherein identifying the neural network layers comprises: identifying parts of the neural network that perform a serial processing of functions that are mergeable to obtain a stack; and mapping each function in the stack to obtain at least one operation, wherein each operation has a loop type property designating whether the operation is based on a convolution layer, a pooling layer, or an element-wise operational layer, and wherein the loop type property designates whether the operation has a requirement for data evaluation.
2. The method according to claim 1, wherein generating the code comprises: determining one or more pre-defined building blocks from the identified neural network layers; and combining the one or more pre-defined building blocks to obtain the code.
3. The method according to claim 1, wherein the CPUs include fewer single instruction multiple data (SIMD) units compared to the GPUs.
4. The method according to claim 1, wherein the at least one operation comprises an accumulation operation and a normalization operation, and wherein the accumulation operation has a loop type property requiring data to be processed in a certain area and the normalization operation has a loop type property with no data requirements.
5. The method according to claim 1, wherein identifying the neural network layers further comprises: merging the at least one operation into one or more steps, wherein a step includes only one operation with a loop type property designating that the one operation has a requirement for data evaluation.
6. The method according to claim 5, wherein identifying the neural network layers further comprises: grouping the one or more steps into one or more sequences, wherein a sequence includes steps with compatible loop types.
7. The method according to claim 6, wherein sequences in the one or more sequences intended for CPUs have more steps than sequences in the one or more sequences intended for GPUs.
8. The method according to claim 7, wherein a patch size is reduced based on available memory exceeding the memory threshold, the patch size being related to an amount of data input to the sequence.
9. The method according to claim 8, wherein reduction of the patch size is limited by an underutilization of the different hardware.
10. The method according to claim 6, wherein grouping the one or more steps into the one or more sequences includes determining how each step grouped in a sequence influences data requirements of the sequence so as to reduce an amount of available memory below a memory threshold.
11. The method according to claim 5, wherein the step further includes a second operation with a loop type property designating that the second operation does not have a requirement for data evaluation.
12. The method according to claim 1, wherein the stack comprises a first subset of neural network layers from the neural network layers, and wherein generating the code for the different hardware based on the identified neural network layers comprises: generating code to loop back and re-process the first subset of neural network layers after completing an iteration of processing the first subset of neural network layers.
13. The method according to claim 1, wherein the stack comprises a first subset of neural network layers from the neural network layers, wherein the neural network layers further comprise at least one other neural network layer that is immediately subsequent to the first subset of neural network layers, and wherein generating the code for the different hardware based on the identified neural network layers comprises: generating code to: process the first subset of neural network layers; store an output from processing the first subset of neural network layers in main memory; and process the at least one other neural network layer based on retrieving the output from the main memory.
14. A system for accelerating a neural network, the system comprising one or more processors which, alone or in combination, are configured to provide for execution of the following steps: identifying neural network layers that meet a locality constraint; generating code for different hardware based on the identified neural network layers, wherein the generated code is used to implement depth first processing on the neural network, and wherein the different hardware comprises one or more central processing units (CPUs) or graphical processing units (GPUs); and performing the depth-first processing on the neural network based on the generated code, wherein identifying the neural network layers comprises: identifying parts of the neural network that perform a serial processing of functions that are mergeable to obtain a stack; and mapping each function in the stack to obtain at least one operation, wherein each operation has a loop type property designating whether the operation is based on a convolution layer, a pooling layer, or an element-wise operational layer, and wherein the loop type property designates whether the operation has a requirement for data evaluation.
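
The structure described in claims 1, 5, 6, 7, and 11 (operations with a loop type property, merged into steps, grouped into sequences) can be pictured with a small Python sketch. This is an illustrative reading only, not the patented implementation: the class and function names, the `needs_area` flag, the `max_steps` cap, and the default compatibility rule are all assumptions. In particular, the page does not define what makes loop types "compatible", so that check is left as a caller-supplied placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Operation:
    name: str
    loop_type: str    # "convolution", "pooling", or "element-wise" (per claim 1)
    needs_area: bool  # assumed encoding of "has a requirement for data evaluation"


@dataclass
class Step:
    operations: List[Operation] = field(default_factory=list)


def merge_into_steps(ops: Sequence[Operation]) -> List[Step]:
    """Open a new step at each area-requiring operation, so that no step
    holds more than one such operation; other operations join the current step."""
    steps: List[Step] = []
    for op in ops:
        if op.needs_area or not steps:
            steps.append(Step([op]))
        else:
            steps[-1].operations.append(op)
    return steps


def group_into_sequences(
    steps: Sequence[Step],
    max_steps: int,
    # Placeholder rule (assumption): treat all neighbouring steps as compatible.
    compatible: Callable[[Step, Step], bool] = lambda a, b: True,
) -> List[List[Step]]:
    """Greedily group consecutive compatible steps, at most max_steps per sequence."""
    sequences: List[List[Step]] = []
    for step in steps:
        last = sequences[-1] if sequences else None
        if last and len(last) < max_steps and compatible(last[-1], step):
            last.append(step)
        else:
            sequences.append([step])
    return sequences


# Hypothetical stack of operations for illustration.
OPS = [
    Operation("conv", "convolution", True),
    Operation("bias", "element-wise", False),
    Operation("relu", "element-wise", False),
    Operation("pool", "pooling", True),
    Operation("norm", "element-wise", False),
]

if __name__ == "__main__":
    steps = merge_into_steps(OPS)
    for max_steps in (2, 1):  # e.g. a larger cap for CPUs than for GPUs (assumed values)
        seqs = group_into_sequences(steps, max_steps)
        print(max_steps, [[[op.name for op in s.operations] for s in seq] for seq in seqs])
```

The `max_steps` cap is a stand-in for hardware-specific grouping: claim 7 states only that sequences intended for CPUs have more steps than those intended for GPUs, without giving numbers. The memory-threshold and patch-size considerations of claims 8 to 10 are not modeled here.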

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/08 (Primary)
Learning methods · CPC title
G06N3/04 (Primary)
Architecture, e.g. interconnection topology · CPC title
G06N3/063
using electronic means · CPC title

Patent family

Related publications grouped by family.

View patent family 65009700

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11429855B2 cover?

A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is used to perform the depth-first processing on the neural network based on the generated code.

Related patents

Image processing apparatus, image processing method, and program

Acceleration techniques for graph analysis programs

System, Method, and Accelerator to Process Convolutional Neural Network Layers

Techniques for fast io and low memory consumption while using erasure codes

Scale-space label fusion using two-stage deep neural net

Spiking neural network with reduced memory access and reduced in-network bandwidth consumption
