Deep convolutional network heterogeneous architecture

US12118451B2 · US · B2

Patent metadata
Publication number: US-12118451-B2
Application number: US-201715423272-A
Country: US
Kind code: B2
Filing date: Feb 2, 2017
Priority date: Jan 4, 2017
Publication date: Oct 15, 2024
Grant date: Oct 15, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are directed towards a system on chip (SoC) that implements a deep convolutional network heterogeneous architecture. The SoC includes a system bus, a plurality of addressable memory arrays coupled to the system bus, at least one applications processor core coupled to the system bus, and a configurable accelerator framework coupled to the system bus. The configurable accelerator framework is an image and deep convolutional neural network (DCNN) co-processing system. The SoC also includes a plurality of digital signal processors (DSPs) coupled to the system bus, wherein the plurality of DSPs coordinate functionality with the configurable accelerator framework to execute the DCNN.
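The coordination described in the abstract can be pictured as a two-stage pipeline: a convolution stage produces feature data while a DSP stage post-processes earlier results at the same time. The following is a minimal software sketch of that idea only, not the patented hardware. The function names (`conv1d_valid`, `relu`, `run_pipeline`), the use of Python threads, and the bounded queue standing in for a mailbox are all assumptions made for illustration; the claims mention mailboxes and interrupts as synchronization mechanisms but this page does not describe how they are implemented.

```python
import queue
import threading


def conv1d_valid(signal, kernel):
    """Hypothetical stand-in for a convolution accelerator.

    Computes a 1-D "valid" sliding dot product of signal with kernel.
    """
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def relu(values):
    """Hypothetical stand-in for a DSP non-linear activation step."""
    return [max(0.0, v) for v in values]


def run_pipeline(tiles, kernel):
    """Run convolution and activation concurrently, one tile at a time."""
    # Bounded queue used as an illustrative mailbox between the two stages.
    mailbox = queue.Queue(maxsize=2)
    results = []

    def accelerator_stage():
        for tile in tiles:
            mailbox.put(conv1d_valid(tile, kernel))
        mailbox.put(None)  # sentinel: no more tiles

    def dsp_stage():
        while True:
            item = mailbox.get()
            if item is None:
                break
            results.append(relu(item))

    workers = [
        threading.Thread(target=accelerator_stage),
        threading.Thread(target=dsp_stage),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]], [1.0, -1.0]))
```

The bounded queue means the producer stalls when the consumer falls behind, which loosely mirrors back-pressure between hardware stages; a real SoC would use dedicated hardware signalling rather than operating-system threads.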

First claim

Opening claim text (preview).

The invention claimed is:

1. A system on chip (SoC) that implements a deep convolutional neural network architecture, the SoC comprising: a system bus; a plurality of addressable memory arrays coupled to the system bus; at least one applications processor core coupled to the system bus; a configurable accelerator framework coupled to the system bus, wherein the configurable accelerator framework is an image and deep convolutional neural network (DCNN) co-processing system that includes a plurality of convolution accelerators configured to perform convolutional operations; and a plurality of digital signal processors (DSPs) coupled to the system bus, wherein the plurality of DSPs are configured to perform non-convolution data processing operations associated with processing of a DCNN kernel in parallel with execution of convolutional operations associated with processing of the DCNN kernel by the plurality of convolution accelerators of the configurable accelerator framework to execute the DCNN kernel, wherein the non-convolutional data processing operations associated with the DCNN kernel include one or more operations selected from a group of operations consisting of pooling, non-linear activation and cross-channel response normalization.

2. The SoC of claim 1, wherein the plurality of digital signal processors include a plurality of DSP clusters coupled to the system bus, and wherein each DSP cluster of the plurality of DSP clusters includes at least two separate DSPs.

3. The SoC of claim 2, further comprising: a global DSP cluster crossbar switch coupled to each of the plurality of DSP clusters and to the system bus.

4. The SoC of claim 2, wherein each DSP cluster includes: a DSP cluster crossbar switch; a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch; a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.

5. The SoC of claim 2, wherein each DSP cluster includes: a DSP cluster crossbar switch; a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch; a first DSP coupled to the first local DSP crossbar switch; a second DSP coupled to the second local DSP crossbar switch; a shared DSP cluster memory coupled to the DSP cluster crossbar switch; and a direct memory access coupled between the shared DSP cluster memory and the system bus.

6. The SoC of claim 1, wherein the configurable accelerator framework includes: a reconfigurable stream switch coupled to the system bus; and wherein the plurality of convolution accelerators are coupled to the stream switch.

7. The SoC of claim 6, wherein the stream switch is configurable at run-time and reconfigurable during execution of the at least one DCNN.

8. The SoC of claim 6, wherein each of the plurality of convolution accelerators is configurable at run-time and reconfigurable during execution of the DCNN kernel.

9. The SoC of claim 1, wherein non-convolution operations associated with the DCNN kernel are synchronized with convolutional operations associated with the DCNN kernel using interrupts.

10. The SoC of claim 1, wherein non-convolution operations associated with the DCNN kernel are synchronized with convolutional operations associated with the DCNN kernel using mailboxes.

11. The SoC of claim 1, wherein the memory arrays have a hierarchical structure.

12. The SoC of claim 6 wherein the configurable accelerator framework includes a plurality of control registers and the stream switch, in operation, provides the plurality of DSPs with access to the plurality of control registers during processing of the DCNN kernel to control performing of non-convolution operations by the plurality of DSPs.

13. A mobile computing device comprising: an imaging sensor that captures images; a system on chip (SoC) that implements a deep convolutional neural network architecture, the SoC, including: an SoC bus; an on-chip memory coupled to the SoC bus; a configurable accelerator framework coupled to the SoC bus, the configurable accelerator framework including a reconfigurable dataflow accelerator fabric that receives the captured images from the imaging sensor for deep convolutional neural network (DCNN) processing by at least one convolution accelerator; and a plurality of digital signal processor (DSPs) clusters coupled to the SoC bus, the plurality of DSP clusters configured to perform non-convolution operations for the DCNN and arranged to coordinate functionality with the at least one convolution accelerator of the configurable accelerator framework to execute the DCNN, wherein the plurality of DSP clusters are configured to perform the non-convolutional data processing operations for a DCNN kernel in parallel with execution of convolutional operations for the DCNN kernel by the at least one convolutional accelerator, wherein the non-convolutional data processing operations for the DCNN kernel include one or more operations selected from a group of operations consisting of pooling, non-linear activation and cross-channel response normalization.

14. The mobile computing device of claim 13, wherein each DSP cluster of the plurality of DSP clusters includes at least two separate DSPs arranged for communication with each other and for communication with the SoC bus via a DSP cluster crossbar switch.

15. The mobile computing device of claim 13, comprising: a global DSP cluster crossbar switch coupled to each of the plurality of DSP clusters and to the SoC bus.

16. The mobile computing device of claim 13, wherein each DSP cluster includes: a DSP cluster crossbar switch; a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch; a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.

17. The mobile computing device of claim 13, wherein each DSP cluster includes: a DSP cluster crossbar switch; a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch; a first DSP coupled to the first local DSP crossbar switch; a second DSP coupled to the second local DSP crossbar switch; a shared DSP cluster memory coupled to the DSP cluster crossbar switch; and a direct memory access coupled between the shared DSP cluster memory and the SoC bus.

18. The mobile computing device of claim 13, wherein the reconfigurable dataflow accelerator fabric is configurable at run-time and reconfigurable during execution of the DCNN.

19. The mobile computing device of claim 13, wherein the at least one convolution accelerator is configurable at run-time and reconfigurable during execution of the DCNN.

20. The mobile computing device of claim 13 wherein the configurable accelerator framework includes a plurality of control registers and a stream switch, which, in operation, provides the plurality of DSPs with access to the plurality of control registers during processing of the DCNN kernel to control performing of non-convolution operations by the plurality of DSPs.

21. A system on a chip (SoC), comprising:
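Claims 1 and 13 name three non-convolution operations assigned to the DSPs: pooling, non-linear activation, and cross-channel response normalization. The claim text on this page gives no formulas, so the sketch below is an assumption: it uses max pooling, ReLU, and the local response normalization form commonly used in convolutional networks, b_i = a_i / (k + alpha * sum of a_j^2 over neighbouring channels)^beta. The function names and default parameter values are hypothetical and are not taken from the patent.

```python
def max_pool_1d(values, window=2, stride=2):
    """Max pooling over a 1-D sequence (assumed pooling variant)."""
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, stride)
    ]


def relu(values):
    """Rectified linear unit (assumed non-linear activation)."""
    return [max(0.0, v) for v in values]


def cross_channel_lrn(channels, size=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize each channel by the energy of its neighbouring channels.

    channels holds the activations of all channels at one spatial position.
    The defaults are illustrative values, not figures from the patent.
    """
    half = size // 2
    out = []
    for i, a in enumerate(channels):
        lo = max(0, i - half)
        hi = min(len(channels), i + half + 1)
        energy = sum(c * c for c in channels[lo:hi])
        out.append(a / (k + alpha * energy) ** beta)
    return out


if __name__ == "__main__":
    activations = relu([-1.0, 3.0, 2.0, -5.0, 4.0])
    print(max_pool_1d(activations))
    print(cross_channel_lrn([1.0, 2.0, 3.0]))
```

Each function is independent of the others, so in a pipeline they could be applied to a convolution output in whatever order the network layer requires.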

Assignees

Inventors

Classifications

  • G06N3/0464 (Primary)

    Convolutional networks [CNN, ConvNet] · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Specially adapted for signal processing, e.g. Harvard architectures · CPC title

  • using switching circuits, e.g. switching matrix, connection or expansion network (G06F13/4009 takes precedence) · CPC title

  • Configuring for program initiating, e.g. using registry, configuration files · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12118451B2 cover?
Embodiments are directed towards a system on chip (SoC) that implements a deep convolutional network heterogeneous architecture. The SoC includes a system bus, a plurality of addressable memory arrays coupled to the system bus, at least one applications processor core coupled to the system bus, and a configurable accelerator framework coupled to the system bus. The configurable accelerator fram…
Who is the assignee on this patent?
St Microelectronics Srl, St Microelectronics Int Nv, Stmicroelectronics Int B V
What technology area does this patent fall under?
Primary CPC classification G06N3/0464. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 15, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).