What technology area does this patent fall under?

Primary CPC classification G06N3/082. Mapped technology areas include Physics.

When was this patent published?

Publication date Oct 21, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Depth-first deep convolutional neural network inference

US12450486B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12450486-B2
Application number	US-202017121499-A
Country	US
Kind code	B2
Filing date	Dec 14, 2020
Priority date	Dec 13, 2019
Publication date	Oct 21, 2025
Grant date	Oct 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method performed by a computing device includes determining a partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device. The computing device comprising a processor, on-chip memory, and off-chip memory. The first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of a write back to the off-chip memory. The method also includes processing, at the device via the multi-layer ANN, an input, using the depth-first processing in accordance with the partition.
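The abstract names three inputs to the partitioning decision: the on-chip memory a candidate partition uses, the on-chip memory available, and the size of the write back to off-chip memory. The sketch below is one illustrative way to combine them and is not taken from the patent. The `Layer` fields, the function names, and the byte counts are all hypothetical. It also assumes that a partition's write back equals the full output of its last layer, and it uses a simple recursive search that abandons any branch whose write-back total already matches or exceeds the best found so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    weight_bytes: int  # on-chip bytes for this layer's weights (hypothetical)
    tile_bytes: int    # on-chip bytes for one partial-output tile (hypothetical)
    output_bytes: int  # full output size, written off-chip if a partition ends here


def on_chip_bytes(layers):
    """On-chip use of a partition: per-layer tile memory plus per-layer weights."""
    return sum(l.tile_bytes for l in layers) + sum(l.weight_bytes for l in layers)


def best_partitions(layers, available_bytes):
    """Return (total_write_back_bytes, partition_end_indices).

    Partitions are runs of consecutive layers that fit in available_bytes.
    The search keeps the split with the smallest total write back.
    """
    n = len(layers)
    best = {"cost": float("inf"), "cuts": None}

    def search(start, cost, cuts):
        if cost >= best["cost"]:
            return  # prune: this branch cannot beat the best split so far
        if start == n:
            best["cost"], best["cuts"] = cost, cuts
            return
        for end in range(n, start, -1):  # try the longest partition first
            if on_chip_bytes(layers[start:end]) <= available_bytes:
                search(end, cost + layers[end - 1].output_bytes, cuts + (end,))

    search(0, 0, ())
    if best["cuts"] is None:
        raise ValueError("at least one layer does not fit in on-chip memory")
    return best["cost"], best["cuts"]


# Made-up numbers for a four-layer network and a 50-byte on-chip budget.
example_layers = [
    Layer(weight_bytes=10, tile_bytes=4, output_bytes=100),
    Layer(weight_bytes=20, tile_bytes=4, output_bytes=50),
    Layer(weight_bytes=30, tile_bytes=4, output_bytes=200),
    Layer(weight_bytes=10, tile_bytes=4, output_bytes=10),
]
print(best_partitions(example_layers, available_bytes=50))
```

With these numbers the search groups layers 0 and 1 into one partition and layers 2 and 3 into another, so the large 100-byte and 200-byte intermediate outputs never leave the chip.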

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by a computing device comprising a processor, on-chip memory, and off-chip memory, the method comprising: determining a first partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device, the first partition comprising a set of consecutive layers of the ANN, the first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of data corresponding to a write back of intermediate activations to the off-chip memory, the amount of on-chip memory used by the first partition corresponding to a sum of a first amount of on-chip memory used for respective partial output of each layer of the first partition and a second amount of on-chip memory used for respective weights of each layer of the first partition, each partial output comprising a tile of one or more output activations generated in response to a corresponding portion of input activations received at a respective layer of the first partition, the tile being a spatial or channel-wise subset of total output activations associated with the respective layer; and processing, at the computing device via the multi-layer ANN, an input, using the depth-first processing in accordance with the first partition, the depth-first processing comprising processing each tile associated with a respective portion of input activations through the set of consecutive layers of the first partition before processing a subsequent portion of input activations.

2. The method of claim 1, in which the amount of on-chip memory used by the first partition is less than a total amount of on-chip memory.

3. The method of claim 1, further comprising: recursively searching for new partition locations after determining the first partition; and pruning a potential partition location based on a size of a write back to the off-chip memory by the potential partition location.

4. The method of claim 3, further comprising determining a second partition for the depth-first processing by the multi-layer ANN, in which layers of the second partition are different from layers of the first partition.

5. The method of claim 1, further comprising generating a plurality of processing cones for the first partition, each processing cone processing a different portion of the input.

6. The method of claim 5, in which processing the input comprises loading a portion of input activations of an initial layer of the first partition to the on-chip memory, the portion corresponding to a processing cone of the plurality of processing cones.

7. The method of claim 6, in which processing the input further comprises: processing the portion of the input activations with activations of portions of subsequent layers of the first partition; storing partial results of the processing to the on-chip memory; and writing an output of the processing cone to the off-chip memory.

8. An apparatus, comprising: at least one processor comprising on-chip memory; off-chip memory coupled with the at least one processor; and instructions stored in the off-chip memory and the on-chip memory, the instructions operable, when executed by the at least one processor, to cause the apparatus: to determine a first partition for depth-first processing by a multi-layer artificial neural network (ANN) of the apparatus, the first partition comprising a set of consecutive layers of the ANN, the first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of data corresponding to a write back of intermediate activations to the off-chip memory, the amount of on-chip memory used by the first partition corresponding to a sum of a first amount of on-chip memory used for respective partial output of each layer of the first partition and a second amount of on-chip memory used for respective weights of each layer of the first partition, each partial output comprising a tile of one or more output activations generated in response to a corresponding portion of an input activations received at a respective layer of the first partition, the tile being a spatial or channel-wise subset of total output activations associated with the respective layer; and to process, via the multi-layer ANN, an input, using the depth-first processing in accordance with the first partition, the depth-first processing comprising processing each tile associated with a respective portion of input activations through the set of consecutive layers of the first partition before processing a subsequent portion of input activations.

9. The apparatus of claim 8, in which the amount of on-chip memory used by the first partition is less than a total amount of on-chip memory.

10. The apparatus of claim 8, in which the instructions are further operable to cause the apparatus: to recursively search for new partition locations after determining the first partition; and to prune a potential partition location based on a size of a write back to the off-chip memory by the potential partition location.

11. The apparatus of claim 10, in which the instructions are further operable to cause the apparatus to determine a second partition for the depth-first processing by the multi-layer ANN, in which layers of the second partition are different from layers of the first partition.

12. The apparatus of claim 8, in which the instructions are further operable to cause the apparatus to generate a plurality of processing cones for the first partition, each processing cone processing a different portion of the input.

13. The apparatus of claim 12, in which the instructions are further operable to cause the apparatus to process the input by loading a portion of input activations of an initial layer of the first partition to the on-chip memory, the portion corresponding to a processing cone of the plurality of processing cones.

14. The apparatus of claim 13, in which the instructions are further operable to cause the apparatus to process the input by: processing the portion of the input activations with activations of portions of subsequent layers of the first partition; storing partial results of the processing to the on-chip memory; and writing an output of the processing cone to the off-chip memory.

15. A non-transitory computer-readable medium having program code recorded thereon for a computing device comprising at least one processor, on-chip memory, and off-chip memory, the program code executed by the at least one processor and comprising: program code to determine a first partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device, the first partition comprising a set of consecutive layers of the ANN, the first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of data corresponding to a write back of intermediate activations to the off-chip memory, the amount of on-chip memory used by the first partition corresponding to a sum of a first amount of on-chip memory used for respective partial output of each layer of the first partition and a second amount of on-chip memory used for respective weights of each layer of the first partition, each partial output comprising a tile of one or more output activations generated in response to a corresponding portion of an input activations received at a respective layer of the first partition, the tile being a spatial or channel-wise subset of total output activations a
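The claims describe depth-first processing as pushing each tile through all consecutive layers of a partition before moving on to the next portion of the input, with each "processing cone" covering a different portion. A toy illustration of that ordering follows. It is a sketch under simplifying assumptions, not the claimed implementation: layers are stride-1, unpadded 1D convolutions, tiles are spatial slices of the final output, and overlapping input regions between neighbouring cones are simply recomputed. The names `conv1d`, `layer_by_layer`, and `depth_first` are hypothetical.

```python
def conv1d(x, kernel):
    """Stride-1, unpadded 1D convolution (cross-correlation) on Python lists."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def layer_by_layer(x, kernels):
    """Baseline: each layer's full output is materialised before the next runs."""
    for kernel in kernels:
        x = conv1d(x, kernel)
    return x


def depth_first(x, kernels, tile):
    """Compute the same result one output tile at a time.

    For each tile, only the input slice it depends on (its cone) is taken
    through every layer, so intermediates stay roughly tile-sized.
    """
    halo = sum(len(k) - 1 for k in kernels)  # extra input samples a cone needs
    out_len = len(x) - halo
    out = []
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        cone = x[start:stop + halo]      # input portion feeding this tile
        for kernel in kernels:
            cone = conv1d(cone, kernel)  # partial output of each layer
        out.extend(cone)                 # stand-in for the write back
    return out


signal = list(range(10))
kernels = [[1, 1], [1, 0, -1]]
print(depth_first(signal, kernels, tile=3) == layer_by_layer(signal, kernels))
```

Both orderings produce identical outputs. The difference is in peak intermediate storage: the baseline holds a whole layer output at once, while the depth-first loop holds only one cone's worth.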

Assignees

Qualcomm Inc

Inventors

Classifications

G06F2209/485
Resource constraint · CPC title
G06F9/5066
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
G06F9/4881
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
G06N3/04
Architecture, e.g. interconnection topology · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

View patent family 76318185

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12450486B2 cover?

A method performed by a computing device includes determining a partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device. The computing device comprising a processor, on-chip memory, and off-chip memory. The first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and …

Who is the assignee on this patent?

Qualcomm Inc

Related patents

Computational efficiency improvements for artificial neural networks

Acceleration of neural networks using depth-first processing

System, Method, and Accelerator to Process Convolutional Neural Network Layers

Architecture for sparse neural network acceleration

Low-power architecture for sparse neural network

Method of controlling storage device and random access memory and method of controlling nonvolatile memory device and buffer memory
