Processing element, neural processing device including same, and multiplication operation method using same

US2022405560A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2022405560-A1
Application number: US-202217807082-A
Country: US
Kind code: A1
Filing date: Jun 15, 2022
Priority date: Jun 17, 2021
Publication date: Dec 22, 2022
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store input activation, a flexible multiplier configured to generate result data by performing a multiplication operation of the weight and the input activation by using a first multiplier of a first precision or using both the first multiplier and a second multiplier of the first precision in response to a calculation mode signal, and a saturating adder configured to generate a partial sum by using the result data.
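The last stage named in the abstract, a saturating adder that folds multiplier results into a partial sum, can be illustrated with a minimal Python sketch. The abstract does not state the accumulator width, so the 32-bit default here is an assumption, and the function names `saturating_add` and `accumulate` are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def saturating_add(a: int, b: int, bits: int = 32) -> int:
    """Add two signed integers and clamp the result to the signed `bits`-bit range.

    The accumulator width (`bits`) is an assumption; the abstract does not give one.
    """
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, a + b))


def accumulate(products: Iterable[int], bits: int = 32) -> int:
    """Fold a stream of multiplier results into one partial sum, saturating at each step."""
    partial_sum = 0
    for product in products:
        partial_sum = saturating_add(partial_sum, product, bits)
    return partial_sum
```

With an 8-bit accumulator, `accumulate([100, 100], bits=8)` clamps at 127, whereas a wrapping adder would produce -56. Clamping keeps an overflowing partial sum close to the true value instead of flipping its sign.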

First claim

Opening claim text (preview).

What is claimed is:
1. A processing element comprising: a weight register configured to store a weight; an input activation register configured to store input activation; a flexible multiplier configured to generate result data by performing a multiplication operation of the weight and the input activation by using a first multiplier of a first precision or using both the first multiplier and a second multiplier of the first precision in response to a calculation mode signal; and a saturating adder configured to generate a partial sum by using the result data.
2. The processing element of claim 1, wherein the flexible multiplier performs a multiplication operation of the weight and the input activation by using the first multiplier when the calculation mode signal is a first mode signal associated with the first precision, and performs the multiplication operation of the weight and the input activation by using both the first multiplier and the second multiplier when the calculation mode signal is a second mode signal associated with a second precision greater than the first precision.
3. The processing element of claim 2, wherein the flexible multiplier comprises an aligner that generates a first aligned partial multiplication group and a second aligned partial multiplication group by aligning digits of the first partial multiplication group generated by the first multiplier and a second partial multiplication group generated by the second multiplier when the calculation mode signal is the second mode signal.
4. The processing element of claim 3, wherein the flexible multiplier comprises a first booth reduction tree configured to calculate the first aligned partial multiplication group, and a second booth reduction tree configured to calculate the second aligned partial multiplication group, and a depth of the first aligned partial multiplication group is greater than a depth of the second aligned partial multiplication group.
5. The processing element of claim 3, wherein the flexible multiplier comprises a first booth reduction tree configured to calculate the first aligned partial multiplication group, and a second booth reduction tree configured to calculate the second aligned partial multiplication group, and a calculable depth of the first booth reduction tree is greater than a calculable depth of the second booth reduction tree.
6. The processing element of claim 3, wherein the flexible multiplier comprises one first booth reduction tree configured to calculate the first aligned partial multiplication group, and a plurality of second booth reduction trees configured to calculate the second aligned partial multiplication group.
7. The processing element of claim 3, wherein, when the weight and the input activation are each 32-bit data, the first precision is INT4, and the second precision is INT8, the flexible multiplier comprises one first booth reduction tree that calculates the first aligned partial multiplication group, and four second booth reduction trees that calculate the second aligned partial multiplication group.
8. The processing element of claim 1, wherein the flexible multiplier comprises a booth reduction tree that generates the result data by using partial multiplication groups generated by the first multiplier and the second multiplier.
9. The processing element of claim 8, wherein the booth reduction tree comprises a depth reducer that reduces depths of the partial multiplication groups, and an adder that performs an addition operation of the partial multiplication groups of which depths are reduced by the depth reducer.
10. The processing element of claim 1, wherein each of the first multiplier and the second multiplier is composed of k multipliers.
11. The processing element of claim 10, wherein k is 8 if the weight and the input activation are each 32-bit data, the first precision is INT4, and the second precision is INT8.
12. The processing element of claim 1, wherein the flexible multiplier comprises an aligner that generates a first aligned partial multiplication group and a second aligned partial multiplication group by using partial multiplication groups generated by the first multiplier and the second multiplier, a first booth reduction tree that calculates the first aligned partial multiplication group, a second booth reduction tree that calculates the second aligned partial multiplication group, and a pre-adder that performs an addition operation on an operation result of the second booth reduction tree, and a calculation result of the first booth reduction tree and a calculation result of the pre-adder are provided to the saturating adder.
13. The processing element of claim 1, wherein the flexible multiplier comprises a bit division logic that generates a first divided weight of the first precision by using the weight and generates a first divided input activation of the first precision by using the input activation.
14. The processing element of claim 13, wherein, when the calculation mode signal is a first mode signal associated with the first precision, the first multiplier generates the result data by using the first divided weight and the first divided input activation.
15. The processing element of claim 13, wherein, when the calculation mode signal is a second mode signal associated with a second precision greater than the first precision, the bit division logic generates a first high-order divided weight and a first low-order divided weight by using the first divided weight, and generates a first high-order divided input activation and a first low-order divided input activation by using the first divided input activation.
16. The processing element of claim 15, wherein the first low-order divided weight and the first low-order divided input activation each comprise an extra bit for having a positive value.
17. A neural processing device comprising: at least one neural core, wherein the neural core comprises a processing unit that performs calculation, and a L0 memory for storing input/output data of the processing unit, the processing unit comprises a PE array including at least one processing element, and the PE array comprises a flexible multiplier that receives a weight and an input activation and generates a plurality of partial multiplication groups by using a first multiplier of a first precision or both the first multiplier and a second multiplier of the first precision in response to a calculation mode signal and generates result data by using the plurality of partial multiplication groups, and a saturating adder that receives the result data and generates a partial sum.
18. The neural processing device of claim 17, wherein the flexible multiplier generates the result data by performing an addition operation of the plurality of partial multiplication groups by using a Booth algorithm.
19. The neural processing device of claim 17, wherein the flexible multiplier groups the plurality of partial multiplication groups into a plurality of aligned partial multiplication groups based on digits thereof, and generates the result data by performing an addition operation on the plurality of aligned partial multiplication groups.
20. A multiplication operation method comprising: receiving a weight, an input activation, and a calculation mode signal; generating a plurality of divided weights by using the weight; generating a plurality of divided input activations by using the input activation; determi…
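The idea running through claims 2, 3, 15, and 16, building a higher-precision product out of lower-precision multipliers and then aligning the pieces by digit position, can be shown for a single INT8 × INT8 product built from four 4-bit-wide products. This is only a sketch of the general arithmetic, not the patented circuit: the function names are hypothetical, the high nibble is assumed to carry the sign, and the non-negative low nibble (range 0..15) is one possible reading of the "extra bit for having a positive value" in claim 16.

```python
from typing import Tuple


def split_int8(x: int) -> Tuple[int, int]:
    """Split a signed 8-bit value into (high, low) so that x == high * 16 + low.

    high is signed (-8..7); low is always non-negative (0..15), which needs one
    bit more than a signed 4-bit operand can hold (assumed reading of claim 16).
    """
    assert -128 <= x <= 127
    high = x >> 4   # arithmetic shift preserves the sign
    low = x & 0xF   # low four bits as a non-negative number
    return high, low


def mul_int8_from_int4(weight: int, activation: int) -> int:
    """Multiply two signed 8-bit values using only nibble-sized products."""
    wh, wl = split_int8(weight)
    ah, al = split_int8(activation)

    # Four narrow partial products, one per pairing of high/low parts.
    high_high = wh * ah
    high_low = wh * al
    low_high = wl * ah
    low_low = wl * al

    # Alignment step: shift each partial product to its digit position, then sum.
    return (high_high << 8) + ((high_low + low_high) << 4) + low_low
```

For example, `split_int8(-1)` gives `(-1, 15)`, and `mul_int8_from_int4(w, a)` equals `w * a` for every signed 8-bit pair. A hardware design would sum the shifted partial products in reduction trees rather than with ordinary additions.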

Assignees

Rebellions Inc

Inventors

Classifications

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • G06F7/533 (Primary)

    Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even · CPC title

  • Dividing only · CPC title

  • Adding; Subtracting (G06F7/483 - G06F7/491, G06F7/544 - G06F7/556 take precedence) · CPC title

  • G06N3/063 (Primary)

    using electronic means · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022405560A1 cover?
The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store input activation, a flexible multiplier configured to generate result data by performing a multiplication operation of the weight and the input activ…
Who is the assignee on this patent?
Rebellions Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/533. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 22, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).