Method and Apparatus for Designing and Implementing a Convolution Neural Net Accelerator
US-2017103298-A1 · Apr 13, 2017 · US
US11216722B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11216722-B2 |
| Application number | US-201615396520-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 31, 2016 |
| Priority date | Dec 31, 2016 |
| Publication date | Jan 4, 2022 |
| Grant date | Jan 4, 2022 |
Official abstract text for this publication.
Hardware accelerator templates and design frameworks for implementing recurrent neural networks (RNNs) and variants thereof are described. A design framework module obtains a flow graph for an RNN algorithm. The flow graph identifies operations to be performed to implement the RNN algorithm and further identifies data dependencies between ones of the operations. The operations include matrix operations and vector operations. The design framework module maps the operations of the flow graph to an accelerator hardware template, yielding an accelerator instance comprising register transfer language code that describes how one or more matrix processing units and one or more vector processing units are to be arranged to perform the RNN algorithm. At least one of the one or more MPUs, as part of implementing the RNN algorithm, is to directly provide or directly receive a value from one of the one or more VPUs.
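As an illustration of the abstract, the following is a minimal sketch of a flow graph and its mapping onto matrix and vector units. It models only the update gate of a GRU, z_t = sigmoid(W_z x_t + U_z h_{t-1}). The names `Op`, `gru_update_gate_graph`, and `map_to_template` are hypothetical and do not come from the patent, and the rule "matrix operations go to an MPU, vector operations go to a VPU" is an assumed simplification of the mapping step, which in the patent yields RTL code rather than a Python dictionary.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Op:
    """One node of the flow graph (hypothetical structure, for illustration)."""
    name: str
    kind: str          # "matrix" or "vector"
    deps: tuple = ()   # names of operations whose results this one consumes


def gru_update_gate_graph():
    """Flow graph for z_t = sigmoid(W_z x_t + U_z h_{t-1})."""
    return [
        Op("Wz_x", "matrix"),                    # matrix-vector product W_z x_t
        Op("Uz_h", "matrix"),                    # matrix-vector product U_z h_{t-1}
        Op("sum_z", "vector", ("Wz_x", "Uz_h")), # element-wise addition
        Op("z", "vector", ("sum_z",)),           # element-wise sigmoid
    ]


def map_to_template(ops):
    """Order the operations by dependency and assign each to a unit type.

    Assumption: matrix ops run on an MPU and vector ops on a VPU.
    Returns the schedule order, the unit assignment, and the list of
    producer/consumer pairs that cross between an MPU and a VPU.
    """
    graph = {op.name: set(op.deps) for op in ops}
    order = list(TopologicalSorter(graph).static_order())
    unit = {op.name: "MPU" if op.kind == "matrix" else "VPU" for op in ops}
    direct_links = sorted(
        (dep, op.name)
        for op in ops
        for dep in op.deps
        if unit[dep] != unit[op.name]
    )
    return order, unit, direct_links


if __name__ == "__main__":
    order, unit, links = map_to_template(gru_update_gate_graph())
    print("schedule:", order)
    print("units:", unit)
    print("MPU<->VPU transfers:", links)
```

The `direct_links` list corresponds loosely to the values that an MPU would hand to a VPU, which is the point in the abstract where one unit directly provides a value to another.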
Opening claim text (preview).
What is claimed is:
1. A method in a design framework module implemented by an electronic device for generating an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm comprising: obtaining, by the design framework module, a flow graph for the RNN algorithm, the flow graph identifying a plurality of operations to be performed to implement the RNN algorithm and further identifying data dependencies between ones of the plurality of operations, wherein the plurality of operations include one or more matrix operations and one or more vector operations; determining, by the design framework module, hardware components of one or more matrix processing units and one or more vector processing units of an accelerator hardware template; performing an automatic tuning to determine design parameters to use to customize the accelerator hardware template in order to optimize it for the flow graph; mapping, by the design framework module, the plurality of operations of the flow graph to the accelerator hardware template based on the determining of the hardware components and the determining of the design parameters to yield the accelerator instance comprising register transfer language (RTL) code that describes how the one or more matrix processing units and the one or more vector processing units are to be arranged to perform the RNN algorithm, wherein at least one of the one or more matrix processing units, as part of implementing the RNN algorithm, is to provide a value to one of the one or more vector processing units or receive the value from one of the one or more vector processing units; and validating performance and functionalities of the generated accelerator instance against one or more performance and functional models derived from hardware design constraints and optimization goals.
2. The method of claim 1, wherein the obtaining comprises: computing, by the design framework module, the flow graph based upon a plurality of equations corresponding to the RNN algorithm.
3. The method of claim 1, wherein the determining the hardware components comprises determining a number and type of adders and multipliers, and a number of pipeline stages and lane widths in the adders and the multipliers.
4. The method of claim 1, wherein the mapping is based upon optimization goals indicating properties of the accelerator instance that should be optimized for.
5. The method of claim 1, wherein the mapping is based upon one or more dataset properties identifying properties of input data for the RNN algorithm to be used with the accelerator instance.
6. The method of claim 1, wherein the mapping further yields a compiler that is executable to program an accelerator, generated based upon the accelerator instance, to execute micro-code to implement the RNN algorithm.
7. The method of claim 6, wherein the compiler is to program the accelerator by causing a control unit of the accelerator to execute at least some of the micro-code.
8. The method of claim 1, further comprising at least one of: programming a Field Programmable Gate Array (FPGA), using the accelerator instance, to cause the FPGA to become operable to implement the RNN algorithm; or providing the RTL code to be used as an input to a logic synthesis tool to yield a circuit design for an Application-Specific Integrated Circuit (ASIC).
9. The method of claim 1, wherein the RNN algorithm is either: a gated recurrent unit (GRU) RNN variant; or a long short term memory (LSTM) RNN variant.
10. A non-transitory machine readable storage medium having instructions which, when executed by one or more processors of a device, cause the device to implement a design framework module to generate an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm by performing operations comprising: obtaining a flow graph for the RNN algorithm, the flow graph identifying a plurality of operations to be performed to implement the RNN algorithm and further identifying data dependencies between ones of the plurality of operations, wherein the plurality of operations include one or more matrix operations and one or more vector operations; determining hardware components of one or more matrix processing units and one or more vector processing units of an accelerator hardware template; performing an automatic tuning to determine design parameters to use to customize the accelerator hardware template in order to optimize it for the flow graph; mapping the plurality of operations of the flow graph to the accelerator hardware template based on the determining of the hardware components and the determining of the design parameters to yield the accelerator instance comprising register transfer language (RTL) code that describes how the one or more matrix processing units and the one or more vector processing units are to be arranged to perform the RNN algorithm, wherein at least one of the one or more matrix processing units, as part of implementing the RNN algorithm, is to provide a value to one of the one or more vector processing units or receive the value from one of the one or more vector processing units; and validating performance and functionalities of the generated accelerator instance against one or more performance and functional models derived from hardware design constraints and optimization goals.
11. The non-transitory machine readable storage medium of claim 10, wherein the obtaining comprises: computing the flow graph based upon a plurality of equations corresponding to the RNN algorithm.
12. The non-transitory machine readable storage medium of claim 10, wherein the determining the hardware components comprises determining a number and type of adders and multipliers, and a number of pipeline stages and lane widths in the adders and the multipliers.
13. The non-transitory machine readable storage medium of claim 10, wherein the mapping is based upon optimization goals indicating properties of the accelerator instance that should be optimized for.
14. The non-transitory machine readable storage medium of claim 10, wherein the mapping is based upon one or more dataset properties identifying properties of input data for the RNN algorithm to be used with the accelerator instance.
15. The non-transitory machine readable storage medium of claim 10, wherein the mapping further yields a compiler that is executable to program an accelerator, generated based upon the accelerator instance, to execute micro-code to implement the RNN algorithm.
16. The non-transitory machine readable storage medium of claim 15, wherein the compiler is to program the accelerator by causing a control unit of the accelerator to execute at least some of the micro-code.
17. The non-transitory machine readable storage medium of claim 10, wherein the operations further comprise at least one of: programming a Field Programmable Gate Array (FPGA), using the accelerator instance, to cause the FPGA to become operable to implement the RNN algorithm; or providing the RTL code to be used as an input to a logic synthesis tool to yield a circuit design for an Application-Specific Integrated Circuit (ASIC).
18. A device comprising: one or more processors; and one or more non-transitory machine readable storage media having instructions which, when executed by the one or more processors, cause the device to implement a design framework module that is to generate an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm by performing operations comprising: obtai
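The automatic tuning and validation steps recited in claims 1 and 10 can be pictured with a small design-space search. The sketch below is only an assumption-laden illustration: the cost model (cycles as the sum of matrix and vector work divided by lane counts, area as a weighted sum of lanes), the parameter names, and the exhaustive search are all hypothetical and are not taken from the patent, which does not disclose its tuning procedure in this excerpt.

```python
from itertools import product
from math import ceil


def estimate_cycles(matrix_macs, vector_ops, mpu_lanes, vpu_lanes):
    """Hypothetical performance model: work divided evenly across lanes."""
    return ceil(matrix_macs / mpu_lanes) + ceil(vector_ops / vpu_lanes)


def estimate_area(mpu_lanes, vpu_lanes, mac_cost=4, alu_cost=1):
    """Hypothetical resource model: each lane has a fixed unit cost."""
    return mpu_lanes * mac_cost + vpu_lanes * alu_cost


def auto_tune(matrix_macs, vector_ops, area_budget, lane_choices=(1, 2, 4, 8, 16)):
    """Exhaustively search lane counts; keep the fastest design within budget.

    Ties on cycle count are broken in favor of the smaller area.
    """
    best = None
    for mpu_lanes, vpu_lanes in product(lane_choices, repeat=2):
        area = estimate_area(mpu_lanes, vpu_lanes)
        if area > area_budget:
            continue  # violates the hardware design constraint
        cycles = estimate_cycles(matrix_macs, vector_ops, mpu_lanes, vpu_lanes)
        candidate = (cycles, area, mpu_lanes, vpu_lanes)
        if best is None or candidate < best:
            best = candidate
    if best is None:
        raise ValueError("no design point fits within the area budget")
    cycles, area, mpu_lanes, vpu_lanes = best
    return {"cycles": cycles, "area": area,
            "mpu_lanes": mpu_lanes, "vpu_lanes": vpu_lanes}


def validate(design, max_cycles, area_budget):
    """Check a tuned design against performance and resource targets."""
    return design["cycles"] <= max_cycles and design["area"] <= area_budget


if __name__ == "__main__":
    design = auto_tune(matrix_macs=1024, vector_ops=64, area_budget=40)
    print(design, validate(design, max_cycles=150, area_budget=40))
```

A real framework would likely replace the exhaustive loop with a smarter search and the toy formulas with models calibrated against the target FPGA or ASIC process, but the shape of the problem (search design parameters, then check the result against constraints and goals) is the same.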
Recurrent networks, e.g. Hopfield networks · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
using electronic means · CPC title
Physics · mapped topic