Processing element, neural processing device including same, and multiplication operation method using same

US12236209B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12236209-B2
Application number: US-202318511942-A
Country: US
Kind code: B2
Filing date: Nov 16, 2023
Priority date: Jun 17, 2021
Publication date: Feb 25, 2025
Grant date: Feb 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store input activation, a flexible multiplier configured to generate result data by performing a multiplication operation of the weight and the input activation by using a first multiplier of a first precision or using both the first multiplier and a second multiplier of the first precision in response to a calculation mode signal and a saturating adder configured to generate a partial sum by using the result data.
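
The mode-dependent multiplication and the saturating addition described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the mode names "INT4" and "INT8", the helper names (narrow_mul, split8, flexible_multiply, saturating_add), the four-product decomposition, and the 32-bit accumulator range are all hypothetical choices made for illustration. The abstract itself does not specify them.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1  # assumed accumulator range


def narrow_mul(a, b):
    """Model of one narrow multiplier (hypothetical): operands fit in 5 signed bits."""
    assert -16 <= a <= 15 and -16 <= b <= 15
    return a * b


def split8(x):
    """Split a signed 8-bit value so that x == hi * 16 + lo."""
    assert -128 <= x <= 127
    lo = x & 0xF   # low nibble, always 0..15
    hi = x >> 4    # arithmetic shift keeps the sign: -8..7
    return hi, lo


def flexible_multiply(w, a, mode):
    """Multiply w by a; `mode` stands in for the calculation mode signal."""
    if mode == "INT4":
        # Lower precision: a single narrow multiplier is enough.
        assert -8 <= w <= 7 and -8 <= a <= 7
        return narrow_mul(w, a)
    if mode == "INT8":
        # Higher precision: combine several narrow products, each shifted
        # to the digit position it belongs to.
        w_hi, w_lo = split8(w)
        a_hi, a_lo = split8(a)
        return ((narrow_mul(w_hi, a_hi) << 8)
                + ((narrow_mul(w_hi, a_lo) + narrow_mul(w_lo, a_hi)) << 4)
                + narrow_mul(w_lo, a_lo))
    raise ValueError(f"unknown mode: {mode}")


def saturating_add(acc, value, lo=INT32_MIN, hi=INT32_MAX):
    """Add and clamp to [lo, hi] instead of wrapping around."""
    return max(lo, min(hi, acc + value))
```

In this model, flexible_multiply(w, a, "INT8") agrees with the ordinary product w * a for every pair of signed 8-bit operands, and saturating_add(partial_sum, result) keeps a running partial sum pinned at the range limit rather than overflowing.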

First claim

Opening claim text (preview).

What is claimed is:

1. A processing circuitry comprising: a weight register configured to store a weight; an input activation register configured to store an input activation; first and second multiplier circuits configured to generate partial multiplication groups by performing a multiplication operation of the weight stored in the weight register and the input activation stored in the input activation register; a digital aligning circuit configured to generate a first aligned partial multiplication group and a second aligned partial multiplication group by aligning first and second partial multiplication groups with same number of digits wherein the first aligned partial multiplication group is calculated by using a first booth reduction tree and the second aligned partial multiplication group is calculated by using a second booth reduction tree; first and second adder circuits configured to generate result data by using one of the partial multiplication groups or one of the first and the second aligned partial multiplication groups; and a third adder circuit configured to generate a partial sum by using the result data.

2. The processing circuitry of claim 1, wherein the first adder circuit is configured to generate the result data using the partial multiplication groups, when a calculation mode signal is a first mode signal associated with a first precision, and the second adder circuit is configured to generate the result data using the first and the second aligned partial multiplication groups, when the calculation mode signal is a second mode signal associated with a second precision greater than the first precision.

3. The processing circuitry of claim 1, wherein the first multiplier circuit is configured to generate the first partial multiplication group that has a first digit and the second multiplier circuit is configured to generate the second partial multiplication group that has a second digit different from the first digit, and wherein the digital aligning circuit is configured to generate the first aligned partial multiplication group by using the first partial multiplication group, and generate the second aligned partial multiplication group by using the second partial multiplication group.

4. The processing circuitry of claim 3, wherein a depth of the first aligned partial multiplication group is greater than a depth of the second aligned partial multiplication group.

5. The processing circuitry of claim 3, wherein a calculable depth of the first booth reduction tree is greater than a calculable depth of the second booth reduction tree.

6. The processing circuitry of claim 3, wherein the first aligned partial multiplication group is calculated by using one first booth reduction tree, and the second aligned partial multiplication group is calculated by using a plurality of second booth reduction trees.

7. The processing circuitry of claim 6, when the weight and the input activation are each 32-bit data, a first precision is INT4, and a second precision is INT8, the first aligned partial multiplication group is calculated by using one first booth reduction tree, and the second aligned partial multiplication group is calculated by using four second booth reduction trees.

8. The processing circuitry of claim 1, wherein the result data is generated by using the first and the second booth reduction trees.

9. The processing circuitry of claim 8, wherein the first and the second booth reduction trees are configured to: reduce depths of the partial multiplication groups, and perform an addition operation of the partial multiplication groups of which depths are reduced.

10. The processing circuitry of claim 1, wherein the processing circuitry have k multipliers.

11. The processing circuitry of claim 10, wherein k is 8 if the weight and the input activation are each 32-bit data.

12. The processing circuitry of claim 1, further comprising a pre-adder configured to perform a pre-adding operation using an operation result of the second booth reduction tree.

13. The processing circuitry of claim 1, wherein the processing circuitry further comprises a bit division logic circuit configured to generate a first divided weight of a first precision by using the weight and generate a first divided input activation of the first precision by using the input activation.

14. The processing circuitry of claim 13, when a received calculation mode signal is a first mode signal associated with the first precision, the first adder circuit is configured to generate the result data by using the first divided weight and the first divided input activation.

15. The processing circuitry of claim 14, when the received calculation mode signal is a second mode signal associated with a second precision greater than the first precision, the bit division logic circuit is configured to generate a first high-order divided weight and a first low-order divided weight by using the first divided weight, and generate a first high-order divided input activation and a first low-order divided input activation by using the first divided input activation.

16. The processing circuitry of claim 15, wherein the first low-order divided weight and the first low-order divided input activation each comprise an extra bit for having a positive value.

17. A processor comprising processing circuitry comprising: a weight register configured to store a weight; an input activation register configured to store an input activation; first and second multiplier circuits configured to generate partial multiplication groups by performing a multiplication operation of the weight stored in the weight register and the input activation stored in the input activation register; a digital aligning circuit configured to generate a first aligned partial multiplication group and a second aligned partial multiplication group by aligning the partial multiplication groups with same number of digits when a calculation mode signal is first mode signal associated with a first precision wherein the first aligned partial multiplication group is calculated by using a first booth reduction tree and the second aligned partial multiplication group is calculated by using a second booth reduction tree; a first adder circuit configured to generate result data by using one of the first and the second aligned partial multiplication groups with the first mode signal or a second adder circuit configured to generate the result data by using one of the partial multiplication groups with a second mode signal associated with a second precision smaller than the first precision; and a third adder circuit configured to generate a partial sum by using the result data.

18. The processor of claim 17, wherein the first and the second adder circuits are configured to generate the result data by performing an addition operation of a plurality of partial multiplication groups by using a Booth algorithm.

19. The processor of claim 17, wherein the digital aligning circuit is configured to group the plurality of partial multiplication groups into a plurality of aligned partial multiplication groups based on a number of digits, and the first and the second adder circuits are configured to generate the result data by performing an addition operation on the plurality of aligned partial multiplication groups.

20. An operation method of a processing circuitry comprising a weight register, an input activation register, a first multiplier circuit, a second multiplier circuit, a digit aligning circu
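
The claims refer to a Booth algorithm, to partial multiplication groups aligned by digit position, and to reducing the depth of those groups before adding them. As a rough illustration of those ideas, the sketch below uses textbook radix-4 Booth recoding. It is an assumption that this resembles what the claims call a "booth reduction tree"; the claim preview does not describe the tree's internal structure, and the function names here are hypothetical.

```python
def booth_radix4_digits(y, bits=8):
    """Recode a signed `bits`-wide value into radix-4 Booth digits (-2..2),
    least significant digit first, so that y == sum(d * 4**i)."""
    assert bits % 2 == 0 and -(1 << (bits - 1)) <= y < (1 << (bits - 1))
    u = y & ((1 << bits) - 1)  # two's complement bit pattern of y
    digits = []
    prev = 0                   # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_partial_products(x, y, bits=8):
    """Return (shift, value) pairs: one partial product per Booth digit."""
    return [(2 * i, d * x) for i, d in enumerate(booth_radix4_digits(y, bits))]


def align_and_sum(partials):
    """Shift each partial product to its digit position, then add them all."""
    return sum(value << shift for shift, value in partials)
```

With 8-bit operands this recoding yields four partial products instead of eight, which is one plausible reading of "reducing depth". For example, booth_radix4_digits(7, bits=4) gives [-1, 2], since 7 = -1 + 2 * 4, and align_and_sum(booth_partial_products(x, y)) reproduces x * y for signed 8-bit inputs.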

Assignees

Rebellions Inc

Inventors

Classifications

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Activation functions · CPC title

  • using electronic means · CPC title

  • G06F7/533 (Primary)

    Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12236209B2 cover?
The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store input activation, a flexible multiplier configured to generate result data by performing a multiplication operation of the weight and the input activ…
Who is the assignee on this patent?
Rebellions Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/533. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 25, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).