Processing element, neural processing device including same, and method for calculating thereof
US-2022300794-A1 · Sep 22, 2022 · US
US2022374691A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022374691-A1 |
| Application number | US-202217664393-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 20, 2022 |
| Priority date | May 24, 2021 |
| Publication date | Nov 24, 2022 |
| Grant date | — |
Official abstract text for this publication.
The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store an input activation, a flexible multiplier configured to receive a first sub-weight of a first precision included in the weight, receive a first sub-input activation of the first precision included in the input activation, and generate result data by performing multiplication calculation of the first sub-weight and the first sub-input activation as the first precision or a second precision different from the first precision according to the first sub-weight and the first sub-input activation and a saturating adder configured to generate a partial sum by using the result data.
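The abstract's saturating adder can be illustrated with a minimal Python sketch. It assumes a signed two's-complement partial-sum register of hypothetical width `ACC_BITS = 16`, because the text above does not state an accumulator width. The function names are illustrative and are not taken from the patent.

```python
# Hypothetical accumulator width (assumption); the patent text above does not specify one.
ACC_BITS = 16
ACC_MAX = (1 << (ACC_BITS - 1)) - 1   # 32767
ACC_MIN = -(1 << (ACC_BITS - 1))      # -32768


def saturating_add(partial_sum: int, result_data: int) -> int:
    """Add one multiplier result to the partial sum, clamping instead of wrapping around."""
    total = partial_sum + result_data
    return max(ACC_MIN, min(ACC_MAX, total))


def accumulate(products: list[int]) -> int:
    """Fold a stream of multiplier results into one saturated partial sum."""
    psum = 0
    for p in products:
        psum = saturating_add(psum, p)
    return psum


if __name__ == "__main__":
    print(accumulate([100, -30, 5]))       # 75
    print(accumulate([30000, 30000]))      # 32767 (clamped, no wrap-around)
    print(accumulate([-30000, -30000]))    # -32768 (clamped, no wrap-around)
```

Clamping keeps a large positive sum from wrapping to a negative value, which is the usual motivation for a saturating rather than a wrapping adder in low-precision accumulation.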
Opening claim text (preview).
What is claimed is:
1. A processing element comprising: a weight register configured to store a weight; an input activation register configured to store an input activation; a flexible multiplier configured to receive a first sub-weight of a first precision included in the weight, receive a first sub-input activation of the first precision included in the input activation, and generate result data by performing multiplication calculation of the first sub-weight and the first sub-input activation as the first precision or a second precision different from the first precision according to the first sub-weight and the first sub-input activation; and a saturating adder configured to generate a partial sum by using the result data.
2. The processing element of claim 1, wherein the flexible multiplier includes a path determination unit configured to generate a path determination signal based on the first sub-weight and the first sub-input activation, a first multiplier configured to perform multiplication calculation with the first precision, a second multiplier configured to perform multiplication calculation with the second precision, and a demultiplexer configured to provide any one of the first multiplier and the second multiplier with the first sub-weight and the first sub-input activation in response to the path determination signal.
3. The processing element of claim 2, wherein the path determination unit generates the path determination signal as a first signal for providing the first sub-weight and the first sub-input activation to the first multiplier if a size of at least one of the first sub-weight and the first sub-input activation is greater than a predetermined first size, and generates the path determination signal as a second signal for providing the first sub-weight and the first sub-input activation to the second multiplier if a size of each of the first sub-weight and the first sub-input activation is less than or equal to the first size.
4. The processing element of claim 2, wherein the path determination unit includes a bit division logic configured to generate the first sub-weight by dividing the weight into a unit of the first precision or the second precision and generate the first sub-input activation by dividing the input activation into a unit of the first precision or the second precision in response to the calculation mode signal, a path selection logic configured to generate the path determination signal based on the calculation mode signal, the first sub-weight, and the first sub-input activation, and a conversion logic configured to convert precisions of the first sub-weight and the first sub-input activation.
5. The processing element of claim 2, wherein the number of the first multipliers is k, and the number of the second multipliers is 2k, where k is a natural number.
6. The processing element of claim 2, wherein the first precision has 2N bits, and the second precision has N bits, where N is a natural number.
7. The processing element of claim 6, wherein the first precision is INT4 and the second precision is INT2.
8. The processing element of claim 2, wherein the weight includes the first sub-weight and the second sub-weight, the input activation includes the first sub-input activation and the second sub-input activation, the flexible multiplier generates a first path determination signal based on the first sub-weight and the first sub-input activation, and generates a second path determination signal based on the second sub-weight and the second sub-input activation, and the first path determination signal and the second path determination signal are independently generated.
9. The processing element of claim 2, wherein the weight includes the first sub-weight and the second sub-weight, the input activation includes the first sub-input activation and the second sub-input activation, and the flexible multiplier generates the path determination signal based on the first sub-weight, the second sub-weight, the first sub-input activation, and the second sub-input activation.
10. The processing element of claim 1, wherein the flexible multiplier includes a control pipeline configured to synchronize reception of the first sub-weight and the first sub-input activation with generation of the result data.
11. A processing element comprising: a weight register configured to store a weight; an input activation register configured to store an input activation; a flexible multiplier configured to generate result data by performing multiplication calculation of the weight and the input activation as the first precision or a second precision different from the first precision based on a calculation mode signal; and a saturating adder configured to generate a partial sum by using the result data.
12. The processing element of claim 11, wherein the flexible multiplier includes an error detection logic configured to generate a detection result by checking whether overflow or underflow occurs according to multiplication calculation of the weight and the input activation, k first multipliers of the first precision, 2k second multipliers of the second precision, and a path selection logic configured to select any one of the first multiplier and the second multiplier based on sizes of the weight and the input activation.
13. The processing element of claim 12, wherein the path selection logic selects any one of the first multiplier and the second multiplier based on whether at least one of the weight and the input activation is greater than a greatest value of the second precision, if the calculation mode signal is associated with the first precision.
14. The processing element of claim 13, wherein the error detection logic generates a first result if overflow or underflow occurs in multiplication calculation of the weight and the input activation and generates a second result if overflow or underflow does not occur in the multiplication calculation of the weight and the input activation, and in a case where each of the weight and the input activation is less than the greatest value of the second precision, the path selection logic selects the first multiplier if the detection result is the first result and selects the second multiplier if the detection result is the second result.
15. The processing element of claim 12, wherein the path selection logic selects any one of the first multiplier and the second multiplier according to the detection result when the calculation mode signal is associated with the second precision.
16. The processing element of claim 15, wherein the error detection logic generates a first result if overflow or underflow occurs in the multiplication calculation of the weight and the input activation and generates a second result if the overflow or the underflow does not occur in the multiplication calculation of the weight and the input activation, and the path selection logic selects the first multiplier if the detection result is the first result and selects the second multiplier if the detection result is the second result.
17. A neural processing device comprising: at least one neural core, wherein the neural core includes a processing unit configured to perform calculation, and a L0 memory configured to store input/output data of the processing unit, the processing unit includes a PE array including at least one processing element, and the PE array includes a flexible multiplier configured to receive a weight and an input act…
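Claims 2, 3, 5, 7, 8 and 9 describe how the flexible multiplier routes each sub-weight/sub-input activation pair to either an INT4 (first precision) or an INT2 (second precision) multiplier. The Python sketch below models that routing under stated assumptions: operands are signed two's-complement integers, and the claim's "size" comparison is read as "representable in signed INT2" (range -2 to 1). The names `Path`, `path_determination`, `flexible_multiply`, `independent_paths` and `shared_path` are hypothetical, and the all-pairs rule in `shared_path` is only one possible reading of claim 9.

```python
from enum import Enum


class Path(Enum):
    FIRST = "INT4 multiplier"   # first precision: 2N bits with N = 2 (claims 6-7)
    SECOND = "INT2 multiplier"  # second precision: N bits


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def path_determination(sub_weight: int, sub_activation: int, low_bits: int = 2) -> Path:
    """Assumed reading of claim 3: use the low-precision path only if both operands fit."""
    if fits_signed(sub_weight, low_bits) and fits_signed(sub_activation, low_bits):
        return Path.SECOND
    return Path.FIRST


def flexible_multiply(sub_weight: int, sub_activation: int) -> tuple[int, Path]:
    """Demultiplex one INT4 operand pair to a modeled multiplier; return (result_data, path)."""
    if not (fits_signed(sub_weight, 4) and fits_signed(sub_activation, 4)):
        raise ValueError("sub-operands must fit in the first precision (INT4)")
    path = path_determination(sub_weight, sub_activation)
    # Both modeled multipliers return the exact product; only the hardware path differs.
    return sub_weight * sub_activation, path


def independent_paths(pairs: list[tuple[int, int]]) -> list[Path]:
    """Claim 8 reading: one path signal generated independently per operand pair."""
    return [path_determination(w, a) for w, a in pairs]


def shared_path(pairs: list[tuple[int, int]]) -> Path:
    """Assumed claim 9 reading: one signal for all pairs; INT2 only if every pair fits."""
    if all(path_determination(w, a) is Path.SECOND for w, a in pairs):
        return Path.SECOND
    return Path.FIRST
```

For example, `flexible_multiply(5, -2)` returns `(-10, Path.FIRST)` because 5 is outside the INT2 range, while `flexible_multiply(-1, 1)` stays on the INT2 path. How pairs are scheduled across k INT4 and 2k INT2 units (claim 5) is not modeled here.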
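Claims 4, 6 and 7 describe a bit division logic that splits the weight and the input activation into units of the first (2N-bit) or second (N-bit) precision. The sketch below assumes unsigned operands for simplicity and checks that a wide product can be rebuilt from narrow sub-products by shifting and adding. The recombination step and the function names `split_unsigned` and `decomposed_multiply` are illustrative assumptions. Beyond stating that the saturating adder generates a partial sum from the result data, the claims do not say how sub-products are merged.

```python
def split_unsigned(value: int, total_bits: int, unit_bits: int) -> list[int]:
    """Divide an unsigned `total_bits` value into `unit_bits` chunks, least significant first."""
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value out of range for total_bits")
    if total_bits % unit_bits:
        raise ValueError("total_bits must be a multiple of unit_bits")
    mask = (1 << unit_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, unit_bits)]


def decomposed_multiply(weight: int, activation: int,
                        total_bits: int = 8, unit_bits: int = 4) -> int:
    """Rebuild weight * activation from unit_bits x unit_bits sub-products plus shifts."""
    w_chunks = split_unsigned(weight, total_bits, unit_bits)
    a_chunks = split_unsigned(activation, total_bits, unit_bits)
    result = 0
    for i, w_sub in enumerate(w_chunks):
        for j, a_sub in enumerate(a_chunks):
            # One narrow multiplication, weighted by 2^(unit_bits * (i + j)).
            result += (w_sub * a_sub) << (unit_bits * (i + j))
    return result
```

For instance, `split_unsigned(200, 8, 4)` gives `[8, 12]` (0xC8), and `decomposed_multiply(200, 123)` recovers 24600 from four 4-bit by 4-bit products. With `unit_bits=2`, the same product is rebuilt from sixteen 2-bit by 2-bit products, which mirrors the INT4/INT2 pairing in claim 7.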
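Claims 12 to 16 add an error detection logic that checks whether the product overflows or underflows, and a path selection logic that combines this check with the calculation mode signal. The minimal Python sketch below assumes signed INT2 as the second precision. It also assumes, hypothetically, that the second multiplier's result must itself fit in signed INT2, because the claims do not state the result width against which overflow is checked. Claim 13's "greater than a greatest value of the second precision" is read here as "not representable in INT2", so out-of-range negative operands also force the first multiplier. The names `signed_range`, `detect_overflow` and `select_multiplier`, and the `"first"`/`"second"` mode strings, are hypothetical.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Inclusive (min, max) of a signed two's-complement integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def detect_overflow(weight: int, activation: int, out_bits: int) -> bool:
    """Error detection: True if the product overflows or underflows a signed out_bits result."""
    lo, hi = signed_range(out_bits)
    product = weight * activation
    return product < lo or product > hi


def select_multiplier(weight: int, activation: int, mode: str, low_bits: int = 2) -> str:
    """Return "first" (higher-precision) or "second" (lower-precision) multiplier."""
    lo, hi = signed_range(low_bits)
    # Assumption: overflow is judged against a result of the second precision's width.
    overflow = detect_overflow(weight, activation, out_bits=low_bits)
    if mode == "first":
        # Claims 13-14: operands outside the second precision force the first multiplier;
        # otherwise the detection result decides.
        if not (lo <= weight <= hi and lo <= activation <= hi):
            return "first"
        return "first" if overflow else "second"
    if mode == "second":
        # Claims 15-16: the detection result alone decides.
        return "first" if overflow else "second"
    raise ValueError(f"unknown calculation mode: {mode!r}")
```

Under these assumptions, `select_multiplier(1, -2, "first")` stays on the second multiplier (product -2 fits in INT2), whereas `select_multiplier(-2, -2, "first")` falls back to the first multiplier because the product 4 overflows.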
Reconfigurable for different fixed word lengths · CPC title
in parallel-parallel fashion, i.e. both operands being entered in parallel (G06F7/533 takes precedence) · CPC title
using electronic means · CPC title
Multiplying only · CPC title
comprising an array of processing units with common control, e.g. single instruction multiple data processors (G06F15/82 takes precedence {; for correlation function computation G06F17/15}) · CPC title