What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 4, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Hardware accelerator template and design framework for implementing recurrent neural networks

US11216722B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11216722-B2
Application number	US-201615396520-A
Country	US
Kind code	B2
Filing date	Dec 31, 2016
Priority date	Dec 31, 2016
Publication date	Jan 4, 2022
Grant date	Jan 4, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Hardware accelerator templates and design frameworks for implementing recurrent neural networks (RNNs) and variants thereof are described. A design framework module obtains a flow graph for an RNN algorithm. The flow graph identifies operations to be performed to implement the RNN algorithm and further identifies data dependencies between ones of the operations. The operations include matrix operations and vector operations. The design framework module maps the operations of the flow graph to an accelerator hardware template, yielding an accelerator instance comprising register transfer language code that describes how one or more matrix processing units and one or more vector processing units are to be arranged to perform the RNN algorithm. At least one of the one or more MPUs, as part of implementing the RNN algorithm, is to directly provide or directly receive a value from one of the one or more VPUs.
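To make the abstract's "flow graph" concrete, here is a minimal Python sketch of one gated recurrent unit (GRU) time step written as a graph of matrix and vector operations with data dependencies. The node names, the `schedule` and `cross_unit_edges` helpers, and the rule that matrix operations go to an MPU and vector operations to a VPU are illustrative assumptions. They are not taken from the patent text, and this is not the patent's mapping procedure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical flow graph for one GRU time step.
# Each entry maps an operation name to (kind, names of operations it depends on).
# "matrix" marks a matrix-vector product; "vector" marks an element-wise operation.
GRU_FLOW_GRAPH = {
    "Wz_x":   ("matrix", []),                  # W_z * x_t
    "Uz_h":   ("matrix", []),                  # U_z * h_prev
    "Wr_x":   ("matrix", []),                  # W_r * x_t
    "Ur_h":   ("matrix", []),                  # U_r * h_prev
    "Wh_x":   ("matrix", []),                  # W_h * x_t
    "z":      ("vector", ["Wz_x", "Uz_h"]),    # update gate: sigmoid of the sum
    "r":      ("vector", ["Wr_x", "Ur_h"]),    # reset gate: sigmoid of the sum
    "r_h":    ("vector", ["r"]),               # r element-wise times h_prev
    "Uh_rh":  ("matrix", ["r_h"]),             # U_h * (r * h_prev)
    "h_cand": ("vector", ["Wh_x", "Uh_rh"]),   # candidate state: tanh of the sum
    "h_new":  ("vector", ["z", "h_cand"]),     # blend of h_prev and the candidate
}


def schedule(graph):
    """Return (operation, unit) pairs in an order that respects the dependencies."""
    deps = {name: set(inputs) for name, (_, inputs) in graph.items()}
    order = TopologicalSorter(deps).static_order()
    unit_for_kind = {"matrix": "MPU", "vector": "VPU"}
    return [(name, unit_for_kind[graph[name][0]]) for name in order]


def cross_unit_edges(graph):
    """List dependencies where a value passes between a matrix op and a vector op."""
    edges = []
    for name, (kind, inputs) in graph.items():
        for src in inputs:
            if graph[src][0] != kind:
                edges.append((src, name))
    return edges


if __name__ == "__main__":
    for op, unit in schedule(GRU_FLOW_GRAPH):
        print(f"{unit}: {op}")
    print("MPU/VPU hand-offs:", cross_unit_edges(GRU_FLOW_GRAPH))
```

The edges returned by `cross_unit_edges` are the places where, in the abstract's terms, an MPU would provide a value to a VPU or receive one from it. In this sketch, `r_h` feeding `Uh_rh` is a vector result consumed by a matrix operation.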

First claim

Opening claim text (preview).

What is claimed is:
1. A method in a design framework module implemented by an electronic device for generating an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm comprising: obtaining, by the design framework module, a flow graph for the RNN algorithm, the flow graph identifying a plurality of operations to be performed to implement the RNN algorithm and further identifying data dependencies between ones of the plurality of operations, wherein the plurality of operations include one or more matrix operations and one or more vector operations; determining, by the design framework module, hardware components of one or more matrix processing units and one or more vector processing units of an accelerator hardware template; performing an automatic tuning to determine design parameters to use to customize the accelerator hardware template in order to optimize it for the flow graph; mapping, by the design framework module, the plurality of operations of the flow graph to the accelerator hardware template based on the determining of the hardware components and the determining of the design parameters to yield the accelerator instance comprising register transfer language (RTL) code that describes how the one or more matrix processing units and the one or more vector processing units are to be arranged to perform the RNN algorithm, wherein at least one of the one or more matrix processing units, as part of implementing the RNN algorithm, is to provide a value to one of the one or more vector processing units or receive the value from one of the one or more vector processing units; and validating performance and functionalities of the generated accelerator instance against one or more performance and functional models derived from hardware design constraints and optimization goals.
2. The method of claim 1, wherein the obtaining comprises: computing, by the design framework module, the flow graph based upon a plurality of equations corresponding to the RNN algorithm.
3. The method of claim 1, wherein the determining the hardware components comprises determining a number and type of adders and multipliers, and a number of pipeline stages and lane widths in the adders and the multipliers.
4. The method of claim 1, wherein the mapping is based upon optimization goals indicating properties of the accelerator instance that should be optimized for.
5. The method of claim 1, wherein the mapping is based upon one or more dataset properties identifying properties of input data for the RNN algorithm to be used with the accelerator instance.
6. The method of claim 1, wherein the mapping further yields a compiler that is executable to program an accelerator, generated based upon the accelerator instance, to execute micro-code to implement the RNN algorithm.
7. The method of claim 6, wherein the compiler is to program the accelerator by causing a control unit of the accelerator to execute at least some of the micro-code.
8. The method of claim 1, further comprising at least one of: programming a Field Programmable Gate Array (FPGA), using the accelerator instance, to cause the FPGA to become operable to implement the RNN algorithm; or providing the RTL code to be used as an input to a logic synthesis tool to yield a circuit design for an Application-Specific Integrated Circuit (ASIC).
9. The method of claim 1, wherein the RNN algorithm is either: a gated recurrent unit (GRU) RNN variant; or a long short term memory (LSTM) RNN variant.
10. A non-transitory machine readable storage medium having instructions which, when executed by one or more processors of a device, cause the device to implement a design framework module to generate an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm by performing operations comprising: obtaining a flow graph for the RNN algorithm, the flow graph identifying a plurality of operations to be performed to implement the RNN algorithm and further identifying data dependencies between ones of the plurality of operations, wherein the plurality of operations include one or more matrix operations and one or more vector operations; determining hardware components of one or more matrix processing units and one or more vector processing units of an accelerator hardware template; performing an automatic tuning to determine design parameters to use to customize the accelerator hardware template in order to optimize it for the flow graph; mapping the plurality of operations of the flow graph to the accelerator hardware template based on the determining of the hardware components and the determining of the design parameters to yield the accelerator instance comprising register transfer language (RTL) code that describes how the one or more matrix processing units and the one or more vector processing units are to be arranged to perform the RNN algorithm, wherein at least one of the one or more matrix processing units, as part of implementing the RNN algorithm, is to provide a value to one of the one or more vector processing units or receive the value from one of the one or more vector processing units; and validating performance and functionalities of the generated accelerator instance against one or more performance and functional models derived from hardware design constraints and optimization goals.
11. The non-transitory machine readable storage medium of claim 10, wherein the obtaining comprises: computing the flow graph based upon a plurality of equations corresponding to the RNN algorithm.
12. The non-transitory machine readable storage medium of claim 10, wherein the determining the hardware components comprises determining a number and type of adders and multipliers, and a number of pipeline stages and lane widths in the adders and the multipliers.
13. The non-transitory machine readable storage medium of claim 10, wherein the mapping is based upon optimization goals indicating properties of the accelerator instance that should be optimized for.
14. The non-transitory machine readable storage medium of claim 10, wherein the mapping is based upon one or more dataset properties identifying properties of input data for the RNN algorithm to be used with the accelerator instance.
15. The non-transitory machine readable storage medium of claim 10, wherein the mapping further yields a compiler that is executable to program an accelerator, generated based upon the accelerator instance, to execute micro-code to implement the RNN algorithm.
16. The non-transitory machine readable storage medium of claim 15, wherein the compiler is to program the accelerator by causing a control unit of the accelerator to execute at least some of the micro-code.
17. The non-transitory machine readable storage medium of claim 10, wherein the operations further comprise at least one of: programming a Field Programmable Gate Array (FPGA), using the accelerator instance, to cause the FPGA to become operable to implement the RNN algorithm; or providing the RTL code to be used as an input to a logic synthesis tool to yield a circuit design for an Application-Specific Integrated Circuit (ASIC).
18. A device comprising: one or more processors; and one or more non-transitory machine readable storage media having instructions which, when executed by the one or more processors, cause the device to implement a design framework module that is to generate an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm by performing operations comprising: obtai
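Claim 1 recites an automatic tuning step that picks design parameters for the hardware template, and a validation step against performance models. The claims shown here do not say how the tuning is done, so the following is only a toy sketch of one possible approach: an exhaustive search over a small design space, scored by a made-up cost model. `DesignPoint`, `estimated_cycles`, `estimated_area`, `autotune`, and every parameter and option value are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class DesignPoint:
    """Hypothetical template parameters: number of MPUs and lanes per MPU."""
    num_mpus: int
    lanes: int  # multiply-accumulate lanes in each MPU


def estimated_cycles(point, rows, cols, matvecs):
    """Toy performance model: each lane retires one multiply-accumulate per cycle."""
    total_macs = rows * cols * matvecs
    return ceil(total_macs / (point.num_mpus * point.lanes))


def estimated_area(point):
    """Toy area model: one unit of area per multiply-accumulate lane."""
    return point.num_mpus * point.lanes


def autotune(rows, cols, matvecs, area_budget,
             mpu_options=(1, 2, 4), lane_options=(8, 16, 32)):
    """Pick the feasible design point with the fewest estimated cycles.

    Ties are broken in favor of the smaller estimated area.
    """
    candidates = [DesignPoint(m, l) for m, l in product(mpu_options, lane_options)]
    feasible = [p for p in candidates if estimated_area(p) <= area_budget]
    if not feasible:
        raise ValueError("no design point fits the area budget")
    return min(
        feasible,
        key=lambda p: (estimated_cycles(p, rows, cols, matvecs), estimated_area(p)),
    )


if __name__ == "__main__":
    # Six 256x256 matrix-vector products per time step (as in a GRU), area budget 64.
    best = autotune(rows=256, cols=256, matvecs=6, area_budget=64)
    print(best, estimated_cycles(best, 256, 256, 6))
```

A real framework would presumably draw on far richer models (pipeline depth, memory bandwidth, dataset sparsity) and a smarter search. The sketch only shows the shape of the loop: enumerate candidates, reject those that violate a constraint, and rank the rest against an optimization goal.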

Assignees

Intel Corp

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/063 · Primary
using electronic means · CPC title
G06N3/0445
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 60661846

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11216722B2 cover?

Hardware accelerator templates and design frameworks for implementing recurrent neural networks (RNNs) and variants thereof are described. A design framework module obtains a flow graph for an RNN algorithm. The flow graph identifies operations to be performed to implement the RNN algorithm and further identifies data dependencies between ones of the operations. The operations include matrix op…

Who is the assignee on this patent?

Intel Corp