Hardware accelerator engine

US12073308B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12073308-B2
Application number  US-201715423279-A
Country             US
Kind code           B2
Filing date         Feb 2, 2017
Priority date       Jan 4, 2017
Publication date    Aug 27, 2024
Grant date          Aug 27, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are directed towards a hardware accelerator engine that supports efficient mapping of convolutional stages of deep neural network algorithms. The hardware accelerator engine includes a plurality of convolution accelerators, and each one of the plurality of convolution accelerators includes a kernel buffer, a feature line buffer, and a plurality of multiply-accumulate (MAC) units. The MAC units are arranged to multiply and accumulate data received from both the kernel buffer and the feature line buffer. The hardware accelerator engine also includes at least one input bus coupled to an output bus port of a stream switch, at least one output bus coupled to an input bus port of the stream switch, or at least one input bus and at least one output bus hard wired to respective output bus and input bus ports of the stream switch.
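The multiply-accumulate step described in the abstract can be illustrated with a small software model. This is a minimal sketch only, not the patented hardware: the function name `convolve_with_line_buffer`, the use of a `deque` as the feature line buffer, and the sequential loop standing in for parallel MAC units are all assumptions made for illustration.

```python
from collections import deque


def convolve_with_line_buffer(frame, kernel):
    """Valid 2-D convolution (no kernel flip, as is usual in CNNs).

    Hypothetical software model: the deque plays the role of the feature
    line buffer and the inner loop plays the role of the MAC units.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Holds only the most recent k_rows feature lines.
    line_buffer = deque(maxlen=k_rows)
    output = []
    for line in frame:
        line_buffer.append(line)
        if len(line_buffer) < k_rows:
            continue  # not enough lines buffered yet
        out_row = []
        for x in range(len(line) - k_cols + 1):
            acc = 0
            for dx in range(k_cols):
                # One column of feature data: one value from each buffered line.
                column = [line_buffer[r][x + dx] for r in range(k_rows)]
                for r in range(k_rows):
                    # Multiply kernel data by feature data and accumulate.
                    acc += kernel[r][dx] * column[r]
            out_row.append(acc)
        output.append(out_row)
    return output


result = convolve_with_line_buffer(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0], [0, 1]],
)
```

With the 3x3 frame and 2x2 kernel above, `result` is `[[6, 8], [12, 14]]`. The buffer never holds more lines than the kernel has rows, which is the reason for buffering lines rather than whole frames.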

First claim

Opening claim text (preview).

The invention claimed is:

1. A hardware accelerator engine that supports efficient mapping of convolutional stages of deep neural network algorithms, the hardware accelerator engine comprising: a stream switch having first and second stream switch input ports and a plurality of stream switch output ports, the stream switch being configurable during run time to selectively connect each of the stream switch input ports to any one or more of the stream switch output ports; and a plurality of convolution accelerators coupled together via the stream switch, each one of the plurality of convolution accelerators including: a kernel buffer; a feature line buffer; and a plurality of multiply-accumulate (MAC) units arranged to multiply and accumulate data received from both the kernel buffer and the feature line buffer, wherein lines of the feature line buffer are configured to provide columns of feature line data to MAC units of the plurality of MAC units, wherein a first one of the convolution accelerators is operable, during run time and based on information processed from one or both of the kernel buffer and the feature line buffer during run time by the first convolution accelerator, to selectively reconfigure the feature line buffer of the first convolution accelerator from a first configuration in which a first line of the feature line buffer is coupled to a first MAC unit of the plurality of MAC units to a second configuration in which a second line of the feature line buffer is coupled to the first MAC unit of the plurality of MAC units.

2. The hardware accelerator engine according to claim 1, wherein the kernel buffer is coupled via a first input bus to a first stream switch output port, and wherein the feature line buffer is coupled via a second input bus to a second stream switch output port.

3. The hardware accelerator engine according to claim 1, wherein the feature line buffer stores up to 12 lines of an input feature frame with 16-bit wide pixel values.

4. The hardware accelerator engine according to claim 1, wherein the feature line buffer is arranged to receive and store a plurality of lines of feature data comprised as at least one image frame, wherein each line of feature data has a first tag and a last tag, and the at least one image frame also has a line tag on its first line and a line tag on its last line.

5. The hardware accelerator engine according to claim 4, comprising: validation logic to check and verify tag information included in the feature data.

6. The hardware accelerator engine according to claim 1, wherein the feature line buffer is arranged in a dual ported memory device.

7. The hardware accelerator engine according to claim 1, wherein the feature line buffer is arranged in a single port memory, and wherein data is written and read at alternate clock cycles.

8. The hardware accelerator engine according to claim 1, wherein the kernel buffer is arranged to receive kernel data as a raw data stream having a first tag and a last tag.

9. The hardware accelerator engine according to claim 1, each of the plurality of convolutional accelerators comprising: an adder tree; and a multiply-accumulate (MAC) module having a plurality of MAC units, the MAC module having first inputs coupled to the kernel buffer and second inputs coupled to the feature line buffer, wherein the plurality of MAC units are each arranged to multiply data from the kernel buffer with data from the feature line buffer to produce products, the MAC module further arranged to accumulate the products and pass accumulated product data to the adder tree.

10. The hardware accelerator engine according to claim 9, comprising: an output buffer to receive summation data from the adder tree, wherein the output buffer is arranged to pass the summation data via at least one output bus to a selected input bus port of the stream switch.

11. A hardware accelerator engine method to implementing a portion of a deep convolutional neural network (DCNN), the method comprising: performing a batch calculation, the batch calculation including: receiving a stream of feature data via a first output port of a stream switch into a feature data buffer, the stream switch having a plurality of input ports and a plurality of output ports and being configurable at run time to selectively connect each of the input ports to any one of the output ports, wherein lines of the feature data buffer provide columns of feature data to one or more multiply-accumulate (MAC) units of a plurality of MAC units; receiving a stream of kernel data via a second output port of the plurality of stream switch output ports into a kernel data buffer; receiving a stream of intermediate data via a third output port of the stream switch, the stream of intermediate data being the results of a previous batch calculation into an intermediate data buffer; selectively reconfiguring the feature data buffer, during run time and based on information processed from one or more of the kernel data buffer and the feature data buffer as part of the batch calculation, from a first configuration in which a first line of the feature data buffer is coupled to a first MAC unit of the plurality of MAC units to a second configuration in which a second line of the feature line buffer is coupled to the first MAC unit of the plurality of MAC units; performing, in the plurality of MAC units, a plurality of concurrent convolution operations using at least some of the received feature data and at least some of the received kernel data; and passing a stream of batch calculation result data via a first input port of the stream switch.

12. The hardware accelerator engine method according to claim 11, comprising: performing a plurality of concurrent batch calculations, wherein at least one of the plurality of concurrent batch calculations includes supplying the stream of intermediate data to another of the plurality of concurrent batch calculations.

13. The hardware accelerator engine method according to claim 11, comprising: asserting a back pressure signal to control a flow rate of data received in one of the feature data buffer, the kernel data buffer, and the intermediate data buffer; and passing the back pressure signal through the stream switch to a source that is providing data for the one of the feature data buffer, the kernel data buffer, and the intermediate data buffer.

14. The hardware accelerator engine method according to claim 11, comprising: at run time, configuring a layout of the feature data buffer according to a value in at least one configuration register.

15. The hardware accelerator engine method according to claim 14, comprising: at runtime, after performing at least one batch calculation, re-configuring the layout of the feature data buffer.

16. A hardware accelerator engine that supports efficient mapping of convolutional stages of deep neural network algorithms, the hardware accelerator engine comprising: a stream switch having first and second stream switch input ports and a plurality of stream switch output ports, the stream switch being configurable during run time to selectively couple each of the stream switch input ports to any one or more of the plurality of stream switch output ports; and a plurality of convolution accelerators; wherein a first one of the convolution accelerators is operable, during run time and based on information processed from one or more of a kernel buffer and a feature line buffer during run time by the first convolution accelerator, to selectively reconfigure the feature line buffer from a firs
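Two ideas recur in the claims: a stream switch whose routing can be changed at run time, and a feature line buffer whose line-to-MAC coupling can be changed at run time. The following is a rough software analogy under stated assumptions. The class names `StreamSwitch` and `FeatureLineBuffer`, their methods, and the default round-robin mapping are hypothetical and do not come from the patent. The sketch does not model what triggers a reconfiguration (the claims tie it to information processed from the buffers), nor back pressure or tags.

```python
class StreamSwitch:
    """Toy crossbar: each input port can feed any one or more output ports."""

    def __init__(self, num_inputs, num_outputs):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        # Routing table: input port -> set of output ports (initially none).
        self.routes = {port: set() for port in range(num_inputs)}

    def connect(self, input_port, output_ports):
        """Replace the routing of one input port; callable at any time."""
        targets = set(output_ports)
        if input_port not in self.routes:
            raise ValueError(f"no such input port: {input_port}")
        if any(not 0 <= out < self.num_outputs for out in targets):
            raise ValueError("output port out of range")
        self.routes[input_port] = targets

    def send(self, input_port, word):
        """Deliver one data word to every output port routed from input_port."""
        return {out: word for out in sorted(self.routes[input_port])}


class FeatureLineBuffer:
    """Toy line buffer with a reconfigurable MAC-unit-to-line mapping."""

    def __init__(self, num_lines, num_macs):
        self.lines = [[] for _ in range(num_lines)]
        # Assumed default: MAC units are assigned to lines round-robin.
        self.mac_to_line = {mac: mac % num_lines for mac in range(num_macs)}

    def reconfigure(self, mac, line):
        """Couple a MAC unit to a different line of the buffer."""
        if mac not in self.mac_to_line or not 0 <= line < len(self.lines):
            raise ValueError("unknown MAC unit or line")
        self.mac_to_line[mac] = line

    def line_for(self, mac):
        """Return the line currently coupled to the given MAC unit."""
        return self.lines[self.mac_to_line[mac]]


switch = StreamSwitch(num_inputs=2, num_outputs=4)
switch.connect(0, [1, 3])          # one input fans out to two outputs
first_route = switch.send(0, 7)
switch.connect(0, [2])             # rerouted without rebuilding the switch
second_route = switch.send(0, 7)

buffer = FeatureLineBuffer(num_lines=3, num_macs=2)
buffer.lines[0] = [1, 2]
buffer.lines[1] = [3, 4]
before = buffer.line_for(0)        # first configuration: MAC 0 reads line 0
buffer.reconfigure(0, 1)
after = buffer.line_for(0)         # second configuration: MAC 0 reads line 1
```

In both classes the configuration is plain data (a routing table and a mapping) kept separate from the data path, so it can be rewritten between transfers without touching the buffered contents.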

Assignees

St Microelectronics Srl, St Microelectronics Int Nv

Classifications

  • G06N3/0464 · Primary

    Convolutional networks [CNN, ConvNet] · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Specially adapted for signal processing, e.g. Harvard architectures · CPC title

  • using switching circuits, e.g. switching matrix, connection or expansion network (G06F13/4009 takes precedence) · CPC title

  • Configuring for program initiating, e.g. using registry, configuration files · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12073308B2 cover?
Embodiments are directed towards a hardware accelerator engine that supports efficient mapping of convolutional stages of deep neural network algorithms. The hardware accelerator engine includes a plurality of convolution accelerators, and each one of the plurality of convolution accelerators includes a kernel buffer, a feature line buffer, and a plurality of multiply-accumulate (MAC) units. Th…
Who is the assignee on this patent?
St Microelectronics Srl, St Microelectronics Int Nv
What technology area does this patent fall under?
Primary CPC classification G06N3/0464. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 27, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).