Device and method to process data in parallel
US-2017011006-A1 · Jan 12, 2017 · US
US12118451B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12118451-B2 |
| Application number | US-201715423272-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 2, 2017 |
| Priority date | Jan 4, 2017 |
| Publication date | Oct 15, 2024 |
| Grant date | Oct 15, 2024 |
Official abstract text for this publication.
Embodiments are directed towards a system on chip (SoC) that implements a deep convolutional network heterogeneous architecture. The SoC includes a system bus, a plurality of addressable memory arrays coupled to the system bus, at least one applications processor core coupled to the system bus, and a configurable accelerator framework coupled to the system bus. The configurable accelerator framework is an image and deep convolutional neural network (DCNN) co-processing system. The SoC also includes a plurality of digital signal processors (DSPs) coupled to the system bus, wherein the plurality of DSPs coordinate functionality with the configurable accelerator framework to execute the DCNN.
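The coordination the abstract describes, with DSPs working alongside the accelerator framework, can be pictured with a small software analogy. The sketch below is not the patented hardware and not the authors' implementation: it assumes two threads standing in for a convolution accelerator and a DSP, and a bounded `queue.Queue` standing in for a mailbox between them. All function names (`conv1d_valid`, `relu`, `max_pool`, `run_pipeline`) are hypothetical and chosen for illustration only.

```python
import queue
import threading


def conv1d_valid(signal, kernel):
    """1-D 'valid' cross-correlation; stands in for a convolution accelerator."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Non-linear activation: clamp negative responses to zero."""
    return [max(0.0, v) for v in values]


def max_pool(values, width=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + width])
            for i in range(0, len(values) - width + 1, width)]


def run_pipeline(tiles, kernel):
    """Run convolution and non-convolution stages concurrently on a list of tiles."""
    mailbox = queue.Queue(maxsize=2)  # hypothetical mailbox between the stages
    results = []

    def accelerator():
        # Produce one feature map per input tile and post it to the mailbox.
        for tile in tiles:
            mailbox.put(conv1d_valid(tile, kernel))
        mailbox.put(None)  # sentinel: no more feature maps

    def dsp():
        # Consume feature maps while the accelerator works on later tiles.
        while True:
            fmap = mailbox.get()
            if fmap is None:
                break
            results.append(max_pool(relu(fmap)))

    workers = [threading.Thread(target=accelerator),
               threading.Thread(target=dsp)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[1, -2, 3, -4, 5], [0, 1, 0, -1, 0]], [1, 1]))
```

The bounded queue means the producer stalls when the consumer falls behind, which is one simple way to model back-pressure between stages; real hardware would use its own flow-control mechanism.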
Opening claim text (preview).
The invention claimed is:

1. A system on chip (SoC) that implements a deep convolutional neural network architecture, the SoC comprising: a system bus; a plurality of addressable memory arrays coupled to the system bus; at least one applications processor core coupled to the system bus; a configurable accelerator framework coupled to the system bus, wherein the configurable accelerator framework is an image and deep convolutional neural network (DCNN) co-processing system that includes a plurality of convolution accelerators configured to perform convolutional operations; and a plurality of digital signal processors (DSPs) coupled to the system bus, wherein the plurality of DSPs are configured to perform non-convolution data processing operations associated with processing of a DCNN kernel in parallel with execution of convolutional operations associated with processing of the DCNN kernel by the plurality of convolution accelerators of the configurable accelerator framework to execute the DCNN kernel, wherein the non-convolutional data processing operations associated with the DCNN kernel include one or more operations selected from a group of operations consisting of pooling, non-linear activation and cross-channel response normalization.

2. The SoC of claim 1, wherein the plurality of digital signal processors include a plurality of DSP clusters coupled to the system bus, and wherein each DSP cluster of the plurality of DSP clusters includes at least two separate DSPs.

3. The SoC of claim 2, further comprising: a global DSP cluster crossbar switch coupled to each of the plurality of DSP clusters and to the system bus.

4. The SoC of claim 2, wherein each DSP cluster includes: a DSP cluster crossbar switch; a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch; a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.

5. The SoC of claim 2, wherein each DSP cluster includes: a DSP cluster crossbar switch; a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch; a first DSP coupled to the first local DSP crossbar switch; a second DSP coupled to the second local DSP crossbar switch; a shared DSP cluster memory coupled to the DSP cluster crossbar switch; and a direct memory access coupled between the shared DSP cluster memory and the system bus.

6. The SoC of claim 1, wherein the configurable accelerator framework includes: a reconfigurable stream switch coupled to the system bus; and wherein the plurality of convolution accelerators are coupled to the stream switch.

7. The SoC of claim 6, wherein the stream switch is configurable at run-time and reconfigurable during execution of the at least one DCNN.

8. The SoC of claim 6, wherein each of the plurality of convolution accelerators is configurable at run-time and reconfigurable during execution of the DCNN kernel.

9. The SoC of claim 1, wherein non-convolution operations associated with the DCNN kernel are synchronized with convolutional operations associated with the DCNN kernel using interrupts.

10. The SoC of claim 1, wherein non-convolution operations associated with the DCNN kernel are synchronized with convolutional operations associated with the DCNN kernel using mailboxes.

11. The SoC of claim 1, wherein the memory arrays have a hierarchical structure.

12. The SoC of claim 6 wherein the configurable accelerator framework includes a plurality of control registers and the stream switch, in operation, provides the plurality of DSPs with access to the plurality of control registers during processing of the DCNN kernel to control performing of non-convolution operations by the plurality of DSPs.

13. A mobile computing device comprising: an imaging sensor that captures images; a system on chip (SoC) that implements a deep convolutional neural network architecture, the SoC, including: an SoC bus; an on-chip memory coupled to the SoC bus; a configurable accelerator framework coupled to the SoC bus, the configurable accelerator framework including a reconfigurable dataflow accelerator fabric that receives the captured images from the imaging sensor for deep convolutional neural network (DCNN) processing by at least one convolution accelerator; and a plurality of digital signal processor (DSPs) clusters coupled to the SoC bus, the plurality of DSP clusters configured to perform non-convolution operations for the DCNN and arranged to coordinate functionality with the at least one convolution accelerator of the configurable accelerator framework to execute the DCNN, wherein the plurality of DSP clusters are configured to perform the non-convolutional data processing operations for a DCNN kernel in parallel with execution of convolutional operations for the DCNN kernel by the at least one convolutional accelerator, wherein the non-convolutional data processing operations for the DCNN kernel include one or more operations selected from a group of operations consisting of pooling, non-linear activation and cross-channel response normalization.

14. The mobile computing device of claim 13, wherein each DSP cluster of the plurality of DSP clusters includes at least two separate DSPs arranged for communication with each other and for communication with the SoC bus via a DSP cluster crossbar switch.

15. The mobile computing device of claim 13, comprising: a global DSP cluster crossbar switch coupled to each of the plurality of DSP clusters and to the SoC bus.

16. The mobile computing device of claim 13, wherein each DSP cluster includes: a DSP cluster crossbar switch; a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch; a first DSP, a first instruction cache, and a first DSP memory, each coupled to the first local DSP crossbar switch; and a second DSP, a second instruction cache, and a second DSP memory, each coupled to the second local DSP crossbar switch.

17. The mobile computing device of claim 13, wherein each DSP cluster includes: a DSP cluster crossbar switch; a first local DSP crossbar switch and a second local DSP crossbar switch, each coupled to the DSP cluster crossbar switch; a first DSP coupled to the first local DSP crossbar switch; a second DSP coupled to the second local DSP crossbar switch; a shared DSP cluster memory coupled to the DSP cluster crossbar switch; and a direct memory access coupled between the shared DSP cluster memory and the SoC bus.

18. The mobile computing device of claim 13, wherein the reconfigurable dataflow accelerator fabric is configurable at run-time and reconfigurable during execution of the DCNN.

19. The mobile computing device of claim 13, wherein the at least one convolution accelerator is configurable at run-time and reconfigurable during execution of the DCNN.

20. The mobile computing device of claim 13 wherein the configurable accelerator framework includes a plurality of control registers and a stream switch, which, in operation, provides the plurality of DSPs with access to the plurality of control registers during processing of the DCNN kernel to control performing of non-convolution operations by the plurality of DSPs.

21. A system on a chip (SoC), comprising:
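Claims 1 and 13 name three non-convolution operations: pooling, non-linear activation, and cross-channel response normalization. The claims do not fix a formula for any of them, so the following is only an illustrative sketch under stated assumptions: 2x2 max pooling with stride 2, ReLU as the activation, and the widely used local response normalization form b_i = a_i / (k + alpha * sum_j a_j^2)^beta taken over n neighbouring channels. The function names, the nested-list `[channel][row][column]` layout, and the default parameter values are all hypothetical choices for this example.

```python
def relu(fmap):
    """Non-linear activation (assumed ReLU): clamp negatives to zero."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling, stride 2, per channel; odd trailing rows/columns are dropped."""
    out = []
    for ch in fmap:
        h, w = len(ch), len(ch[0])
        out.append([[max(ch[y][x], ch[y][x + 1], ch[y + 1][x], ch[y + 1][x + 1])
                     for x in range(0, w - 1, 2)]
                    for y in range(0, h - 1, 2)])
    return out


def cross_channel_lrn(fmap, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize each value by the squared responses of up to n adjacent channels."""
    channels = len(fmap)
    out = []
    for i in range(channels):
        lo = max(0, i - n // 2)             # first neighbouring channel
        hi = min(channels - 1, i + n // 2)  # last neighbouring channel
        out.append([[fmap[i][y][x]
                     / (k + alpha * sum(fmap[j][y][x] ** 2
                                        for j in range(lo, hi + 1))) ** beta
                     for x in range(len(fmap[i][0]))]
                    for y in range(len(fmap[i]))])
    return out


if __name__ == "__main__":
    fmap = [[[1.0, -2.0], [3.0, -4.0]],
            [[-1.0, 2.0], [-3.0, 4.0]]]
    print(max_pool2x2(relu(fmap)))
    print(cross_channel_lrn(fmap))
```

Each function works on one feature map independently of how that map was produced, which is what lets such steps run on a separate processor from the one doing the convolutions.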
Convolutional networks [CNN, ConvNet] · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Specially adapted for signal processing, e.g. Harvard architectures · CPC title
using switching circuits, e.g. switching matrix, connection or expansion network (G06F13/4009 takes precedence) · CPC title
Configuring for program initiating, e.g. using registry, configuration files · CPC title