System and method for accelerating RNN network, and storage medium

US11775803B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11775803-B2
Application number: US-202118012938-A
Country: US
Kind code: B2
Filing date: Apr 26, 2021
Priority date: Sep 25, 2020
Publication date: Oct 3, 2023
Grant date: Oct 3, 2023

Abstract

Official abstract text for this publication.

A system for accelerating an RNN network including: a first cache, which is used for outputting W_x1 to W_xN or W_h1 to W_hN in parallel in N paths in a cyclic switching manner, and the degree of parallelism is k; a second cache, which is used for outputting x_t or h_{t-1} in the cyclic switching manner; a vector multiplication circuit, which is used for, by using N groups of multiplication arrays, respectively calculating W_x1 x_t to W_xN x_t, or respectively calculating W_h1 h_{t-1} to W_hN h_{t-1}; an addition circuit, which is used for calculating W_x1 x_t + W_h1 h_{t-1} + b_1 to W_xN x_t + W_hN h_{t-1} + b_N; an activation circuit, which is used for performing an activation operation according to an output of the addition circuit; a state updating circuit, which is used for acquiring c_{t-1}, calculating c_t and h_t, updating c_{t-1}, and sending h_t to the second cache; a bias data cache; a vector cache; and a cell state cache.
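
The dataflow in the abstract can be illustrated in software. The following is a minimal Python sketch, not the patented circuit: it assumes plain nested lists for the matrices, and the names `dot_k_parallel` and `gate_preactivations` are hypothetical. The two loop passes stand in for the first and second cache states, the `partial` list stands in for the vector cache, and `k` is the number of products one multiplication array issues per step.

```python
def dot_k_parallel(row, vec, k):
    """Dot product computed k products at a time (one array of k multipliers)."""
    acc = 0.0
    for start in range(0, len(row), k):
        # One step: up to k multiplications issued together, then reduced.
        acc += sum(w * v for w, v in zip(row[start:start + k], vec[start:start + k]))
    return acc


def gate_preactivations(Wx, Wh, b, x_t, h_prev, k):
    """Return W_xn x_t + W_hn h_{t-1} + b_n for each of the N gates."""
    n_gates = len(Wx)
    hidden = len(b[0])
    # Stand-in for the vector cache: holds partial sums between the two phases.
    partial = [[0.0] * hidden for _ in range(n_gates)]
    # Pass 1 (first state): the caches supply W_x and x_t.
    # Pass 2 (second state): the caches supply W_h and h_{t-1}.
    for weights, vec in ((Wx, x_t), (Wh, h_prev)):
        for n in range(n_gates):  # the N groups would run side by side in hardware
            for r in range(hidden):
                partial[n][r] += dot_k_parallel(weights[n][r], vec, k)
    # Stand-in for the addition circuit: add each gate's bias.
    return [[partial[n][r] + b[n][r] for r in range(hidden)] for n in range(n_gates)]


# Illustrative values only: N = 2 gates, hidden size 2, input size 3.
Wx = [[[1, 0, 2], [0, 1, 0]], [[1, 1, 1], [2, 0, 0]]]
Wh = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
b = [[0.5, -0.5], [1.0, 0.0]]
print(gate_preactivations(Wx, Wh, b, [1, 2, 3], [4, 5], k=2))
# prints [[11.5, 6.5], [12.0, 6.0]]
```

The result does not depend on `k`; in this sketch `k` only changes how many products are grouped into one step.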

First claim

Opening claim text (preview).

The invention claimed is:

1. A recurrent neural network accelerating system, comprising a first cache, a second cache, a vector multiplication circuit, an addition circuit, an activation circuit, a state updating circuit, a biased data cache, a vector cache and a Cell state cache, wherein,
    the first cache is configured to circularly switch between a first state and a second state, output W_x1 to W_xN in N paths in the first state in parallel with degrees of parallelism of k, and output W_h1 to W_hN in N paths in the second state in parallel with degrees of parallelism of k, wherein N is a positive integer greater than or equal to 2;
    the second cache is configured to circularly switch between the first state and the second state, output x_t in the first state, and output h_{t-1} in the second state;
    the vector multiplication circuit is configured to use N groups of multiplication arrays to respectively calculate W_x1 x_t to W_xN x_t when receiving W_x1 to W_xN output by the first cache, and use the N groups of multiplication arrays to respectively calculate W_h1 h_{t-1} to W_hN h_{t-1} when receiving W_h1 to W_hN output by the first cache, wherein the vector multiplication circuit comprises the N groups of multiplication arrays, each group of the multiplication arrays comprise k multiplication units;
    the addition circuit is configured to receive b_1 to b_N sent by a biased data cache, and use a vector cache to realize the calculation of W_x1 x_t + W_h1 h_{t-1} + b_1 to W_xN x_t + W_hN h_{t-1} + b_N;
    the activation circuit is configured to perform activation operation according to an output of the addition circuit;
    the state updating circuit is configured to obtain c_{t-1} from a Cell state cache, calculate c_t and h_t according to the output of the activation circuit, use c_t to update c_{t-1} in the Cell state cache after calculating out c_t, and send h_t to the second cache;
    wherein W_x1 to W_xN sequentially represent weight data matrixes of a first gate to a N-th gate, W_h1 to W_hN sequentially represent hidden state weight data matrixes of the first gate to the N-th gate, b_1 to b_N sequentially represent biased data of the first gate to the N-th gate, x_t represents input data at time t, h_{t-1} represents hidden state data at time t−1, h_t represents hidden state data at time t, c_t represents a Cell state at time t, and c_{t-1} represents a Cell state at time t−1.

2. The recurrent neural network accelerating system according to claim 1, wherein the recurrent neural network is a long short-term memory network, wherein N=4, and the system comprises:
    the first cache, configured to circularly switch between the first state and the second state, output W_xi, W_xf, W_xo, and W_xc in four paths in the first state in parallel with all degrees of parallelism of k, and output W_hi, W_hf, W_ho, and W_hc in four paths in the second state in parallel with all degrees of parallelism of k;
    the second cache, configured to circularly switch between the first state and the second state, output x_t in the first state, and output h_{t-1} in the second state;
    the vector multiplication circuit, configured to use four groups of multiplication arrays to respectively calculate W_xi x_t, W_xf x_t, W_xo x_t and W_xc x_t when receiving W_xi, W_xf, W_xo, and W_xc output by the first cache, and use the four groups of multiplication arrays to respectively calculate W_hi h_{t-1}, W_hf h_{t-1}, W_ho h_{t-1}, and W_hc h_{t-1} when receiving W_hi, W_hf, W_ho, and W_hc output by the first cache, and comprising the four groups of multiplication arrays, wherein each group of the multiplication arrays comprise k multiplication units;
    the addition circuit, configured to receive b_i, b_f, b_o, and b_c sent by the biased data cache, and use the vector cache to realize the calculation of W_xi x_t + W_hi h_{t-1} + b_i, W_xf x_t + W_hf h_{t-1} + b_f, W_xo x_t + W_ho h_{t-1} + b_o, and W_xc x_t + W_hc h_{t-1} + b_c,
    the activation circuit, configured to perform activation operation according to the output of the addition circuit, and output i_t, f_t, o_t, and c̃_t;
    the state updating circuit, configured to obtain c_{t-1} from the Cell state cache, calculate c_t and h_t according to the output of the activation circuit, use c_t to update c_{t-1} in the Cell state cache after calculating out c_t, and send h_t to the second cache;
    wherein W_xi, W_xf, W_xo, and W_xc sequentially represent a weight data matrix of an input gate, a weight data matrix of a forget gate, a weight data matrix of an output gate, and a weight data matrix of a Cell gate, W_hi, W_hf, W_ho, and W_hc sequentially represent a hidden state weight data matrix of the input gate, a hidden state weight data matrix of the forget gate, a hidden state weight data matrix of the output gate, and a hidden state weight data matrix of the Cell gate, b_i, b_f, b_o, and b_c sequentially represent biased data of the input gate, biased data of the forget gate, biased data of the output gate, and biased data of the Cell gate, i_t, f_t, o_t, and c̃_t sequentially represent the input gate, the forget gate, the output gate, and the Cell gate, x_t represents input data at time t, h_{t-1} represents hidden state data at time t−1, h_t represents hidden state data at time t, c_t represents the Cell state at time t, and c_{t-1} represents the Cell state at time t−1.

3. The recurrent neural network accelerating system according to claim 2, wherein the vector multiplication circuit is in a first flow line, the addition circuit is in a second flow line, the activation circuit and the state updating circuit are in a third flow line, and the first flow line, the second flow line and the third flow line run in parallel.

4. The recurrent neural network accelerating system according to claim 2, wherein the first cache comprises a first storage unit, a second storage unit, a first multiplexer, a first memory, a second memory, a third memory and a fourth memory and a data classifier, wherein,
    the first storage unit is configured to obtain a target number of W_xi, a target number of W_xf, a target number of W_xo, and a target number of W_xc from an off-chip storage;
    the second storage unit is configured to obtain a target number of W_hi, a target number of W_hf, a target number of W_ho, and a target number of W_hc from the off-chip storage;
    the first multiplexer is connected to the first storage unit and the second storage unit respectively, configured to realize the circular switching between the first state and the second state, and select the first storage unit for data output in the first state, and select the second storage unit for data output in the second state;
    the first memory, the second memory, the third memory and the fourth memory are all connected to the first multiplexer through a data classifier, and configured to output W_xi, W_xf, W_xo, and W_xc sequentially in parallel with all degrees of parallelism of k when the first multiplexer is in the first state, and configured to output W_hi, W_hf, W_ho, and W_hc sequentially in parallel with all degrees of parallelism of k when the first multiplexer is in the second state;
    wherein the target number is greater than k.

5. The recurrent neural network accelerating system according to claim 4, wherein the first storage unit and the second storage unit both adopt a first clock, the first memory, the second memory, the third memory and the fourth memory all adopt a second clock, and the first clock and the second clock are independent of each other, so that when the output rate of any one of the first memory, the second memory, the third memory and the fourth memory is lower than the input rate, unsent data is cached in the memory.

6. The recurrent neur
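
Claim 2 names the gate outputs i_t, f_t, o_t and c̃_t, but the preview above does not spell out the formulas used for c_t and h_t. The sketch below assumes the standard LSTM equations: sigmoid for the input, forget and output gates, tanh for the Cell gate, c_t = f_t * c_{t-1} + i_t * c̃_t, and h_t = o_t * tanh(c_t). These formulas and the function names are assumptions for illustration, not text taken from the claims.

```python
import math


def sigmoid(z):
    """Logistic function, assumed here as the activation for i_t, f_t and o_t."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_state_update(pre_i, pre_f, pre_o, pre_c, c_prev):
    """Apply the activation step and the state update for one time step.

    pre_i, pre_f, pre_o, pre_c are the four addition-circuit outputs
    (W_x* x_t + W_h* h_{t-1} + b_*); c_prev is c_{t-1} read from the
    Cell state cache. Returns (c_t, h_t).
    """
    # Activation circuit (assumed standard LSTM activations).
    i_t = [sigmoid(z) for z in pre_i]
    f_t = [sigmoid(z) for z in pre_f]
    o_t = [sigmoid(z) for z in pre_o]
    c_tilde = [math.tanh(z) for z in pre_c]
    # State updating circuit: the new cell state replaces c_{t-1} in the cache,
    # and h_t is what would be sent back to the second cache.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return c_t, h_t


# With all pre-activations at zero, each sigmoid gate is 0.5 and c̃_t is 0,
# so the cell state is halved.
c_t, h_t = lstm_state_update([0.0], [0.0], [0.0], [0.0], [2.0])
print(c_t)  # prints [1.0]
```

In a multi-step run, the returned `c_t` would be stored as the next step's `c_prev` and `h_t` would become the next step's h_{t-1}, mirroring the write-back to the Cell state cache and the second cache.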

Assignees

Inventors

Classifications

  • G06N3/0442 (Primary)

    characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • G06F17/16 (Primary)

    Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)

  • G06N3/045 (Primary)

    Combinations of networks

  • Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

  • using electronic means

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Inspur Suzhou Intelligent Technology Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/0442. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 3, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).