Depth-first deep convolutional neural network inference

US12450486B2 · US · B2

Patent metadata
Publication number: US-12450486-B2
Application number: US-202017121499-A
Country: US
Kind code: B2
Filing date: Dec 14, 2020
Priority date: Dec 13, 2019
Publication date: Oct 21, 2025
Grant date: Oct 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method performed by a computing device includes determining a partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device. The computing device comprising a processor, on-chip memory, and off-chip memory. The first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of a write back to the off-chip memory. The method also includes processing, at the device via the multi-layer ANN, an input, using the depth-first processing in accordance with the partition.
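The partitioning criteria named in the abstract (on-chip memory used by a partition, available on-chip memory, and write-back size) can be illustrated with a small search. The following Python sketch is not the patented implementation. It assumes hypothetical per-layer byte counts (the `Layer` fields `weight_bytes`, `tile_out_bytes`, and `full_out_bytes` are invented for illustration). Following claim 1, it takes a partition's on-chip use to be the sum of its per-layer output tiles and weights. In the spirit of claims 3 and 4, it recursively searches cut points and prunes any candidate whose write-back cannot beat the best split found so far. It also assumes that the final layer's output is network output and is not counted as an intermediate write-back.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Layer:
    # Hypothetical per-layer sizes in bytes (illustrative only, not from the patent).
    weight_bytes: int    # weights kept on chip while the partition runs
    tile_out_bytes: int  # one output tile (partial output) of this layer
    full_out_bytes: int  # all output activations, written back if a partition ends here


def partition_on_chip_bytes(layers: List[Layer]) -> int:
    """On-chip use of a partition: sum of per-layer output tiles plus sum of weights."""
    return sum(l.tile_out_bytes for l in layers) + sum(l.weight_bytes for l in layers)


def best_partitioning(layers: List[Layer], budget: int) -> Tuple[Optional[List[int]], float]:
    """Split `layers` into consecutive partitions that each fit in `budget` bytes,
    minimizing the total write-back of intermediate activations.

    Returns (exclusive end index of each partition, total write-back bytes),
    or (None, inf) if some single layer does not fit on chip.
    """
    n = len(layers)
    best_cost: float = float("inf")
    best_cuts: Optional[List[int]] = None

    def search(start: int, cuts: List[int], cost: int) -> None:
        nonlocal best_cost, best_cuts
        if start == n:  # every layer assigned: record if this split is cheaper
            if cost < best_cost:
                best_cost, best_cuts = cost, list(cuts)
            return
        for end in range(start + 1, n + 1):
            if partition_on_chip_bytes(layers[start:end]) > budget:
                break  # adding more layers can only increase on-chip use
            # Assumption: only cuts before the last layer write back intermediates.
            write_back = layers[end - 1].full_out_bytes if end < n else 0
            if cost + write_back >= best_cost:
                continue  # prune: this cut cannot beat the best split found so far
            cuts.append(end)
            search(end, cuts, cost + write_back)
            cuts.pop()

    search(0, [], 0)
    return best_cuts, best_cost
```

As a usage example, take four layers of 15 bytes each (10 bytes of weights plus a 5-byte tile) and a 45-byte budget, so at most three layers fit in one partition. If the intermediate outputs of layers 0, 1, and 2 are 100, 20, and 80 bytes, the search cuts after layer 1 and returns `([2, 4], 20)`.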

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by a computing device comprising a processor, on-chip memory, and off-chip memory, the method comprising: determining a first partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device, the first partition comprising a set of consecutive layers of the ANN, the first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of data corresponding to a write back of intermediate activations to the off-chip memory, the amount of on-chip memory used by the first partition corresponding to a sum of a first amount of on-chip memory used for respective partial output of each layer of the first partition and a second amount of on-chip memory used for respective weights of each layer of the first partition, each partial output comprising a tile of one or more output activations generated in response to a corresponding portion of input activations received at a respective layer of the first partition, the tile being a spatial or channel-wise subset of total output activations associated with the respective layer; and processing, at the computing device via the multi-layer ANN, an input, using the depth-first processing in accordance with the first partition, the depth-first processing comprising processing each tile associated with a respective portion of input activations through the set of consecutive layers of the first partition before processing a subsequent portion of input activations.

2. The method of claim 1, in which the amount of on-chip memory used by the first partition is less than a total amount of on-chip memory.

3. The method of claim 1, further comprising: recursively searching for new partition locations after determining the first partition; and pruning a potential partition location based on a size of a write back to the off-chip memory by the potential partition location.

4. The method of claim 3, further comprising determining a second partition for the depth-first processing by the multi-layer ANN, in which layers of the second partition are different from layers of the first partition.

5. The method of claim 1, further comprising generating a plurality of processing cones for the first partition, each processing cone processing a different portion of the input.

6. The method of claim 5, in which processing the input comprises loading a portion of input activations of an initial layer of the first partition to the on-chip memory, the portion corresponding to a processing cone of the plurality of processing cones.

7. The method of claim 6, in which processing the input further comprises: processing the portion of the input activations with activations of portions of subsequent layers of the first partition; storing partial results of the processing to the on-chip memory; and writing an output of the processing cone to the off-chip memory.

8. An apparatus, comprising: at least one processor comprising on-chip memory; off-chip memory coupled with the at least one processor; and instructions stored in the off-chip memory and the on-chip memory, the instructions operable, when executed by the at least one processor, to cause the apparatus: to determine a first partition for depth-first processing by a multi-layer artificial neural network (ANN) of the apparatus, the first partition comprising a set of consecutive layers of the ANN, the first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of data corresponding to a write back of intermediate activations to the off-chip memory, the amount of on-chip memory used by the first partition corresponding to a sum of a first amount of on-chip memory used for respective partial output of each layer of the first partition and a second amount of on-chip memory used for respective weights of each layer of the first partition, each partial output comprising a tile of one or more output activations generated in response to a corresponding portion of an input activations received at a respective layer of the first partition, the tile being a spatial or channel-wise subset of total output activations activations associated with the respective layer; and to process, via the multi-layer ANN, an input, using the depth-first processing in accordance with the first partition, the depth-first processing comprising processing each tile associated with a respective portion of input activations through the set of consecutive layers of the first partition before processing a subsequent portion of input activations.

9. The apparatus of claim 8, in which the amount of on-chip memory used by the first partition is less than a total amount of on-chip memory.

10. The apparatus of claim 8, in which the instructions are further operable to cause the apparatus: to recursively search for new partition locations after determining the first partition; and to prune a potential partition location based on a size of a write back to the off-chip memory by the potential partition location.

11. The apparatus of claim 10, in which the instructions are further operable to cause the apparatus to determine a second partition for the depth-first processing by the multi-layer ANN, in which layers of the second partition are different from layers of the first partition.

12. The apparatus of claim 8, in which the instructions are further operable to cause the apparatus to generate a plurality of processing cones for the first partition, each processing cone processing a different portion of the input.

13. The apparatus of claim 12, in which the instructions are further operable to cause the apparatus to process the input by loading a portion of input activations of an initial layer of the first partition to the on-chip memory, the portion corresponding to a processing cone of the plurality of processing cones.

14. The apparatus of claim 13, in which the instructions are further operable to cause the apparatus to process the input by: processing the portion of the input activations with activations of portions of subsequent layers of the first partition; storing partial results of the processing to the on-chip memory; and writing an output of the processing cone to the off-chip memory.

15. A non-transitory computer-readable medium having program code recorded thereon for a computing device comprising at least one processor, on-chip memory, and off-chip memory, the program code executed by the at least one processor and comprising: program code to determine a first partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device, the first partition comprising a set of consecutive layers of the ANN, the first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of data corresponding to a write back of intermediate activations to the off-chip memory, the amount of on-chip memory used by the first partition corresponding to a sum of a first amount of on-chip memory used for respective partial output of each layer of the first partition and a second amount of on-chip memory used for respective weights of each layer of the first partition, each partial output comprising a tile of one or more output activations generated in response to a corresponding portion of an input activations received at a respective layer of the first partition, the tile being a spatial or channel-wise subset of total output activations a
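The depth-first processing of claim 1 and the processing cones of claims 5 to 7 can be illustrated on a toy network. The Python sketch below is an illustration under stated assumptions, not Qualcomm's implementation. It uses stride-1, unpadded 1-D convolutions, and the helper names `conv1d_valid`, `layer_by_layer`, and `depth_first` are hypothetical. For each output tile, it computes the input slice that the tile depends on (the base of the cone) and pushes that slice through every layer before it moves on to the next tile. Each cone carries its own receptive-field halo, so the result matches conventional layer-by-layer processing. The cost in this sketch is that activations where neighboring cones overlap are computed more than once.

```python
from typing import List, Sequence


def conv1d_valid(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Stride-1, unpadded ('valid') 1-D cross-correlation."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def layer_by_layer(x: Sequence[float], kernels: List[Sequence[float]]) -> List[float]:
    """Breadth-first reference: each layer's full output exists before the next layer runs."""
    out = list(x)
    for kern in kernels:
        out = conv1d_valid(out, kern)
    return out


def depth_first(x: Sequence[float], kernels: List[Sequence[float]], tile: int) -> List[float]:
    """Run one final-layer output tile at a time through all layers (one 'cone' per tile)."""
    # Each valid conv with kernel size k shrinks the signal by k - 1, so an output
    # tile [start, stop) depends on input samples [start, stop + halo).
    halo = sum(len(k) - 1 for k in kernels)
    out_len = len(x) - halo
    result: List[float] = []
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        partial = list(x[start:stop + halo])  # load only this cone's input slice
        for kern in kernels:
            # In hardware these partial results would stay in on-chip memory.
            partial = conv1d_valid(partial, kern)
        result.extend(partial)  # only the cone's final output tile is "written back"
    return result
```

With `x = [i % 7 for i in range(20)]` and kernels of sizes 3, 2, and 3, the halo is 5 samples, the network produces 15 outputs, and `depth_first(x, kernels, 4)` returns the same list as `layer_by_layer(x, kernels)` for any tile size.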

Assignees

Qualcomm Inc

Inventors

Classifications

  • Resource constraint · CPC title

  • Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12450486B2 cover?
A method performed by a computing device includes determining a partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device. The computing device comprising a processor, on-chip memory, and off-chip memory. The first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and …
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
The primary CPC classification is G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Published Oct 21, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).