Processing element and neural processing device including same

US2022374691A1 · US · A1

Patent metadata
Publication number: US-2022374691-A1
Application number: US-202217664393-A
Country: US
Kind code: A1
Filing date: May 20, 2022
Priority date: May 24, 2021
Publication date: Nov 24, 2022
Grant date: (none listed)

Abstract

Official abstract text for this publication.

The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store an input activation, a flexible multiplier configured to receive a first sub-weight of a first precision included in the weight, receive a first sub-input activation of the first precision included in the input activation, and generate result data by performing multiplication calculation of the first sub-weight and the first sub-input activation as the first precision or a second precision different from the first precision according to the first sub-weight and the first sub-input activation, and a saturating adder configured to generate a partial sum by using the result data.
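
The saturating adder named in the abstract can be illustrated with a short sketch. This is only an illustration under stated assumptions: the 16-bit signed accumulator width and the function names `saturating_add` and `partial_sum` are hypothetical and do not come from the patent text.

```python
from typing import Iterable


def saturating_add(acc: int, value: int, bits: int = 16) -> int:
    """Add two signed integers and clamp the sum to a signed `bits`-wide range.

    The accumulator width is an assumption made for this sketch; the
    abstract does not state one.
    """
    lowest = -(1 << (bits - 1))       # e.g. -32768 for 16 bits
    highest = (1 << (bits - 1)) - 1   # e.g.  32767 for 16 bits
    return max(lowest, min(highest, acc + value))


def partial_sum(products: Iterable[int], bits: int = 16) -> int:
    """Accumulate multiplier results into a partial sum that never wraps around."""
    acc = 0
    for product in products:
        acc = saturating_add(acc, product, bits)
    return acc
```

With an 8-bit accumulator, `partial_sum([49, 49, 49], bits=8)` stops at the largest representable value instead of wrapping to a negative number, which is the behavior that distinguishes a saturating adder from a plain modular adder.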

First claim

Opening claim text (preview).

What is claimed is:

1. A processing element comprising: a weight register configured to store a weight; an input activation register configured to store an input activation; a flexible multiplier configured to receive a first sub-weight of a first precision included in the weight, receive a first sub-input activation of the first precision included in the input activation, and generate result data by performing multiplication calculation of the first sub-weight and the first sub-input activation as the first precision or a second precision different from the first precision according to the first sub-weight and the first sub-input activation; and a saturating adder configured to generate a partial sum by using the result data.

2. The processing element of claim 1, wherein the flexible multiplier includes a path determination unit configured to generate a path determination signal based on the first sub-weight and the first sub-input activation, a first multiplier configured to perform multiplication calculation with the first precision, a second multiplier configured to perform multiplication calculation with the second precision, and a demultiplexer configured to provide any one of the first multiplier and the second multiplier with the first sub-weight and the first sub-input activation in response to the path determination signal.

3. The processing element of claim 2, wherein the path determination unit generates the path determination signal as a first signal for providing the first sub-weight and the first sub-input activation to the first multiplier if a size of at least one of the first sub-weight and the first sub-input activation is greater than a predetermined first size, and generates the path determination signal as a second signal for providing the first sub-weight and the first sub-input activation to the second multiplier if a size of each of the first sub-weight and the first sub-input activation is less than or equal to the first size.

4. The processing element of claim 2, wherein the path determination unit includes a bit division logic configured to generate the first sub-weight by dividing the weight into a unit of the first precision or the second precision and generate the first sub-input activation by dividing the input activation into a unit of the first precision or the second precision in response to the calculation mode signal, a path selection logic configured to generate the path determination signal based on the calculation mode signal, the first sub-weight, and the first sub-input activation, and a conversion logic configured to convert precisions of the first sub-weight and the first sub-input activation.

5. The processing element of claim 2, wherein the number of the first multipliers is k, and the number of the second multipliers is 2k, where k is a natural number.

6. The processing element of claim 2, wherein the first precision has 2N bits, and the second precision has N bits, where N is a natural number.

7. The processing element of claim 6, wherein the first precision is INT4 and the second precision is INT2.

8. The processing element of claim 2, wherein the weight includes the first sub-weight and the second sub-weight, the input activation includes the first sub-input activation and the second sub-input activation, the flexible multiplier generates a first path determination signal based on the first sub-weight and the first sub-input activation, and generates a second path determination signal based on the second sub-weight and the second sub-input activation, and the first path determination signal and the second path determination signal are independently generated.

9. The processing element of claim 2, wherein the weight includes the first sub-weight and the second sub-weight, the input activation includes the first sub-input activation and the second sub-input activation, and the flexible multiplier generates the path determination signal based on the first sub-weight, the second sub-weight, the first sub-input activation, and the second sub-input activation.

10. The processing element of claim 1, wherein the flexible multiplier includes a control pipeline configured to synchronize reception of the first sub-weight and the first sub-input activation with generation of the result data.

11. A processing element comprising: a weight register configured to store a weight; an input activation register configured to store an input activation; a flexible multiplier configured to generate result data by performing multiplication calculation of the weight and the input activation as the first precision or a second precision different from the first precision based on a calculation mode signal; and a saturating adder configured to generate a partial sum by using the result data.

12. The processing element of claim 11, wherein the flexible multiplier includes an error detection logic configured to generate a detection result by checking whether overflow or underflow occurs according to multiplication calculation of the weight and the input activation, k first multipliers of the first precision, 2k second multipliers of the second precision, and a path selection logic configured to select any one of the first multiplier and the second multiplier based on sizes of the weight and the input activation.

13. The processing element of claim 12, wherein the path selection logic selects any one of the first multiplier and the second multiplier based on whether at least one of the weight and the input activation is greater than a greatest value of the second precision, if the calculation mode signal is associated with the first precision.

14. The processing element of claim 13, wherein the error detection logic generates a first result if overflow or underflow occurs in multiplication calculation of the weight and the input activation and generates a second result if overflow or underflow does not occur in the multiplication calculation of the weight and the input activation, and in a case where each of the weight and the input activation is less than the greatest value of the second precision, the path selection logic selects the first multiplier if the detection result is the first result and selects the second multiplier if the detection result is the second result.

15. The processing element of claim 12, wherein the path selection logic selects any one of the first multiplier and the second multiplier according to the detection result when the calculation mode signal is associated with the second precision.

16. The processing element of claim 15, wherein the error detection logic generates a first result if overflow or underflow occurs in the multiplication calculation of the weight and the input activation and generates a second result if the overflow or the underflow does not occur in the multiplication calculation of the weight and the input activation, and the path selection logic selects the first multiplier if the detection result is the first result and selects the second multiplier if the detection result is the second result.

17. A neural processing device comprising: at least one neural core, wherein the neural core includes a processing unit configured to perform calculation, and a L0 memory configured to store input/output data of the processing unit, the processing unit includes a PE array including at least one processing element, and the PE array includes a flexible multiplier configured to receive a weight and an input act
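
The path determination described in claims 3, 6, and 7 can be sketched in a few lines. This is a hedged illustration, not the patented circuit: it assumes signed two's-complement operands, reads the "predetermined first size" of claim 3 as "fits in the INT2 range", and uses hypothetical names (`select_path`, `flexible_multiply`, the `"first"` and `"second"` labels) that do not appear in the claims.

```python
from typing import Tuple

# Assumed signed two's-complement ranges for the precisions named in claim 7.
INT4_MIN, INT4_MAX = -8, 7   # first precision (2N bits, N = 2)
INT2_MIN, INT2_MAX = -2, 1   # second precision (N bits)


def fits_int2(value: int) -> bool:
    """Return True if the value is representable at the assumed INT2 precision."""
    return INT2_MIN <= value <= INT2_MAX


def select_path(sub_weight: int, sub_activation: int) -> str:
    """Pick a multiplier path from the operand values, in the spirit of claim 3.

    Returns "second" (INT2 multiplier) only when both operands fit in INT2;
    otherwise returns "first" (INT4 multiplier).
    """
    for operand in (sub_weight, sub_activation):
        if not INT4_MIN <= operand <= INT4_MAX:
            raise ValueError(f"{operand} is outside the assumed INT4 range")
    if fits_int2(sub_weight) and fits_int2(sub_activation):
        return "second"
    return "first"


def flexible_multiply(sub_weight: int, sub_activation: int) -> Tuple[str, int]:
    """Return the chosen path and the product.

    The numeric result is the same on either path; only the multiplier
    that would be exercised differs.
    """
    path = select_path(sub_weight, sub_activation)
    return path, sub_weight * sub_activation
```

For example, `flexible_multiply(1, -2)` takes the `"second"` path because both operands fit in the assumed INT2 range, while `flexible_multiply(3, 1)` takes the `"first"` path because 3 does not.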

Assignees

Rebellions Inc

Inventors

Classifications

  • Reconfigurable for different fixed word lengths

  • G06F7/53 (Primary)

    in parallel-parallel fashion, i.e. both operands being entered in parallel (G06F7/533 takes precedence)

  • G06N3/063 (Primary)

    using electronic means

  • Multiplying only

  • comprising an array of processing units with common control, e.g. single instruction multiple data processors (G06F15/82 takes precedence {; for correlation function computation G06F17/15})

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022374691A1 cover?
A processing element, and a neural processing device that includes it. A flexible multiplier multiplies a sub-weight and a sub-input activation at a first precision or at a different second precision, chosen according to those operands, and a saturating adder uses the result data to generate a partial sum.
Who is the assignee on this patent?
Rebellions Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/53. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 24, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).