Convolution operation apparatus
US-2017116495-A1 · Apr 27, 2017 · US
US11562115B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11562115-B2 |
| Application number | US-201715423284-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 2, 2017 |
| Priority date | Jan 4, 2017 |
| Publication date | Jan 24, 2023 |
| Grant date | Jan 24, 2023 |
Official abstract text for this publication.
Embodiments are directed towards a configurable accelerator framework device that includes a stream switch and a plurality of convolution accelerators. The stream switch has a plurality of input ports and a plurality of output ports. Each of the input ports is configurable at run time to unidirectionally pass data to any one or more of the output ports via a stream link. Each one of the plurality of convolution accelerators is configurable at run time to unidirectionally receive input data via at least two of the plurality of stream switch output ports, and each one of the plurality of convolution accelerators is further configurable at run time to unidirectionally communicate output data via an input port of the stream switch.
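The routing behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `StreamSwitch`, the methods `connect` and `push`, and the choice to let each output port select at most one input port are all hypothetical and are not taken from the publication.

```python
class StreamSwitch:
    """Toy model of a run-time configurable stream switch (hypothetical names).

    Assumption: each output port listens to at most one input port, while one
    input port may fan out to any number of output ports.
    """

    def __init__(self, num_inputs, num_outputs):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        # route[out_port] = in_port, or None when the output is unconnected
        self.route = [None] * num_outputs

    def connect(self, in_port, out_ports):
        """Configure, at run time, which outputs receive data from in_port."""
        if not 0 <= in_port < self.num_inputs:
            raise ValueError("unknown input port")
        for out in out_ports:
            if not 0 <= out < self.num_outputs:
                raise ValueError("unknown output port")
            self.route[out] = in_port

    def push(self, in_port, word):
        """Pass one data word, in one direction only, to every routed output."""
        return {out: word for out, src in enumerate(self.route) if src == in_port}


switch = StreamSwitch(num_inputs=4, num_outputs=4)
switch.connect(0, [1, 2])  # input 0 fans out to outputs 1 and 2
switch.connect(3, [0])     # input 3 feeds output 0 only
delivered = switch.push(0, 0xAB)
```

In this model a convolution accelerator would sit behind two or more output ports (for example one for kernel data and one for feature data) and write its result back through an input port, which mirrors the data flow the abstract describes.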
Opening claim text (preview).
The invention claimed is:

1. A configurable accelerator framework device, comprising: a stream switch having a plurality of multibit unidirectional stream links, a plurality of multibit streaming data input ports, and a plurality of multibit streaming data output ports, each of the plurality of multibit unidirectional stream links including a switching mechanism configured to selectively pass data from any streaming data input port of the plurality of input ports to one or more of the plurality of streaming data output ports; and a plurality of convolution accelerators coupled together by the stream switch, each one of the plurality of convolution accelerators configurable at run time to unidirectionally receive input data via at least two of the plurality of streaming data output ports and to unidirectionally communicate output data via a streaming data input port of the stream switch, wherein the stream switch includes command logic, which, in operation, detects commands embedded in a data stream and controls the switching mechanisms of the multibit unidirectional stream links based on the detected commands.

2. The configurable accelerator framework device according to claim 1, wherein each one of the plurality of convolution accelerators includes: a kernel buffer; a feature line buffer; and a multiply-accumulate (MAC) unit module having a plurality of MAC units arranged to multiply data passed from the kernel buffer with data passed from the feature line buffer, the plurality of MAC units further arranged to accumulate products of the multiplication.

3. The configurable accelerator framework device according to claim 2, comprising: a first input bus coupling the kernel buffer to a first one of the at least two of the plurality of streaming data output ports, and a second input bus coupling the feature line buffer to a second one of the at least two of the plurality of streaming data output ports.

4. The configurable accelerator framework device according to claim 3, wherein each one of the plurality of convolution accelerators includes: an adder tree module arranged to receive and sum data received from the MAC unit module.

5. The configurable accelerator framework device according to claim 4, comprising a third input bus coupling the adder tree module to a third one of the at least two of the plurality of streaming data output ports, and wherein data passed into adder tree module via the third input bus is intermediate data produced by a second convolution accelerator of the plurality of convolution accelerators.

6. The configurable accelerator framework device according to claim 1, comprising: a plurality of direct memory access (DMA) engines, each of the DMA engines configurable at run time to autonomously communicate data into the stream switch or out from the stream switch.

7. The configurable accelerator framework device according to claim 6, wherein the configurable accelerator framework device is arranged as a coprocessor in a system on chip (SoC).

8. The configurable accelerator framework device according to claim 7, comprising: a memory device integrated in the SoC and arranged to store kernel data and feature data, the kernel data and the feature data communicated between the memory and at least one of the plurality of convolution accelerators via selected ones of the plurality of DMA engines.

9. The configurable accelerator framework device according to claim 1, comprising: control registers, first ones of the control registers arranged to control operations of the stream switch at run time, and second ones of the control registers arranged to control operations of the plurality of convolution accelerators at run time.

10. The configurable accelerator framework device according to claim 1, comprising: a first plurality of IP's arranged to source streaming data into the stream switch; and a second plurality of IP's arranged to sink streaming data out from the stream switch.

11. A configurable accelerator framework method, comprising: configuring at run time a stream switch having a plurality of multibit streaming data input ports, a plurality of multibit streaming data output ports, and a plurality of multibit unidirectional stream links available to couple each of the plurality of streaming data input ports to any selected one or more of the plurality of streaming data output ports, each of the plurality of unidirectional stream links including a switching mechanism, wherein the stream switch selectively couples a plurality of convolutional accelerators together, the configuring at run time including: selecting a first streaming data input port of the stream switch from the plurality of streaming data input ports; selecting a first streaming data output port of the stream switch from the plurality of streaming data output ports; communicatively coupling the first streaming data input port of the stream switch to the selected first streaming data output port of the stream switch via the switching mechanism of a first unidirectional stream link of the plurality of unidirectional stream links of the stream switch; communicatively coupling a streaming data source to the first streaming data input port of the stream switch; and communicatively coupling a convolution accelerator of the plurality of convolutional accelerators to the first streaming data output port of the stream switch; unidirectionally passing streaming data through the first streaming data input port of the stream switch to the convolution accelerator in accordance with the configuring at run time; performing at least one convolution operation with the convolution accelerator; and unidirectionally passing output data of the at least one convolution operation from the convolution accelerator to a streaming data input port of the stream switch, wherein the stream switch includes command logic, which, in operation, detects commands embedded in a data stream and controls the switching mechanisms of the multibit unidirectional stream links based on the detected commands.

12. The configurable accelerator framework method according to claim 11, the configuring the stream switch at runtime including: selecting second, third, and fourth streaming data input ports of the stream switch; selecting second, third, and fourth streaming data output ports of the stream switch; communicatively coupling, respectively, the second, third, and fourth streaming data input ports of the stream switch to the second, third, and fourth streaming data output ports of the stream switch via the respective switching mechanisms of second, third, and fourth unidirectional stream links of the plurality of unidirectional stream links of the stream switch; communicatively coupling a kernel data source to the second streaming data input port of the stream switch; communicatively coupling an intermediate data source to the third streaming data input port of the stream switch; and communicatively coupling an output of the convolution accelerator to the fourth streaming data input port of the stream switch; and unidirectionally passing convolution output data through the fourth streaming data input port of the stream switch.

13. The configurable accelerator framework method according to claim 12, wherein the intermediate data source is an output of a second convolution accelerator.

14. The configurable accelerator framework method according to claim 12, wherein the streaming data unidirectionally passed through the first streaming data input port of the stream switch is provided by an image sensor.

15. The configurable accelerator
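The data path recited in claims 2 to 5 (kernel buffer and feature line buffer feeding MAC units, whose results are summed by an adder tree together with optional intermediate data from a second accelerator) can be sketched in software. This is a minimal illustration only. The function names `mac_unit`, `adder_tree`, and `convolve_window`, and the assumption that one MAC unit handles one kernel row, are hypothetical and do not come from the claims.

```python
def mac_unit(kernel_values, feature_values):
    """Multiply paired kernel and feature values and accumulate the products."""
    acc = 0
    for k, f in zip(kernel_values, feature_values):
        acc += k * f
    return acc


def adder_tree(partial_sums, intermediate=0):
    """Sum MAC outputs pairwise, level by level, like a hardware adder tree.

    `intermediate` models data arriving on a third input, for example a
    partial result produced by a second convolution accelerator.
    """
    values = list(partial_sums) + [intermediate]
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # odd element passes to the next level
        values = paired
    return values[0]


def convolve_window(kernel_rows, feature_rows, intermediate=0):
    """One output value: assumed one MAC unit per kernel row, then the tree."""
    partial = [mac_unit(k, f) for k, f in zip(kernel_rows, feature_rows)]
    return adder_tree(partial, intermediate)


kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # illustrative 3x3 kernel
window = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # illustrative 3x3 feature window
result = convolve_window(kernel, window)
chained = convolve_window(kernel, window, intermediate=10)
```

Passing a non-zero `intermediate` shows how two accelerators could be chained: the second one adds the first one's partial result to its own sum instead of starting from zero.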
Learning methods · CPC title
Physics · mapped topic
Machine learning · CPC title
using kernel methods, e.g. support vector machines [SVM] · CPC title