Systolic array with efficient input reduction and extended array performance

US11880682B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11880682-B2
Application number: US-202117363894-A
Country: US
Kind code: B2
Filing date: Jun 30, 2021
Priority date: Jun 30, 2021
Publication date: Jan 23, 2024
Grant date: Jan 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
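
The reduction described in the abstract can be illustrated with a minimal Python sketch. It assumes the 32-bit to 22-bit case from the claims (1 sign bit, 10 exponent bits, 11 significand bits) and round-to-nearest-even. The function names `reduce_fp32_to_fp22` and `fp22_to_float`, the IEEE-style exponent bias of 511, and the choice to reject subnormal, infinite, and NaN inputs are assumptions made for illustration; the patent text shown here does not specify them.

```python
import struct

FP32_SIG_BITS = 23
RED_EXP_BITS, RED_SIG_BITS = 10, 11        # data path widths named in claim 6
DROP = FP32_SIG_BITS - RED_SIG_BITS        # 12 trailing significand bits removed
FP32_BIAS = 127
RED_BIAS = (1 << (RED_EXP_BITS - 1)) - 1   # 511: assumed IEEE-style bias


def reduce_fp32_to_fp22(x: float) -> int:
    """Return a 22-bit pattern (as an int) for a zero or normal FP32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exp = (bits >> FP32_SIG_BITS) & 0xFF
    sig = bits & ((1 << FP32_SIG_BITS) - 1)
    if exp == 0 and sig == 0:
        return sign << (RED_EXP_BITS + RED_SIG_BITS)   # signed zero
    if exp == 0 or exp == 0xFF:
        raise ValueError("sketch handles only zero and normal finite inputs")

    # Trailing bit reducer: keep the top 11 significand bits.
    kept = sig >> DROP
    remainder = sig & ((1 << DROP) - 1)

    # Rounder: round to nearest, ties to even, using the dropped bits.
    half = 1 << (DROP - 1)
    if remainder > half or (remainder == half and (kept & 1) == 1):
        kept += 1

    # Exponent expander: re-bias the 8-bit exponent into 10 bits.
    new_exp = exp - FP32_BIAS + RED_BIAS
    if kept >> RED_SIG_BITS:       # rounding carried out of the significand
        kept = 0
        new_exp += 1

    return (sign << (RED_EXP_BITS + RED_SIG_BITS)) | (new_exp << RED_SIG_BITS) | kept


def fp22_to_float(p: int) -> float:
    """Decode the assumed 22-bit layout back to a Python float (for checking)."""
    sign = -1.0 if (p >> 21) & 1 else 1.0
    exp = (p >> RED_SIG_BITS) & 0x3FF
    sig = p & 0x7FF
    if exp == 0 and sig == 0:
        return sign * 0.0
    return sign * (1 + sig / (1 << RED_SIG_BITS)) * 2.0 ** (exp - RED_BIAS)
```

For example, `fp22_to_float(reduce_fp32_to_fp22(1.5))` gives back `1.5`, while an input such as `2 - 2**-23` rounds up to `2.0` because the carry from the significand increments the exponent.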

First claim

Opening claim text (preview).

What is claimed is:

1. A systolic array processor organized in rows and columns, each row comprising: a reducer, the reducer configured to convert 32-bit input data elements into reduced 22-bit input data elements, the reducer comprising: a trailing bit reducer configured to reduce a quantity of bits representing a significand portion of a 32-bit input data element of the 32-bit input data elements to produce a reduced significand portion of the 32-bit input data element; a rounder configured to round the reduced significand portion of the 32-bit input data element to produce a rounded significand portion; and an exponent expander configured to increase a quantity of bits representing an exponent portion of the 32-bit input data element to produce an increased exponent portion, wherein the reducer produces a reduced 22-bit input data element based on the rounded significand portion and the increased exponent portion; and a plurality of processing elements, the plurality of processing elements configured to receive the reduced 22-bit input data elements from the reducer and to receive weights for performing multiply-accumulate operations.

2. The systolic array processor of claim 1, wherein the reducer is further configured to convert 32-bit weights into the weights.

3. The systolic array processor of claim 1, wherein the reducer further comprises a first reducer, each row further comprising: a second reducer, the second reducer configured to convert 32-bit weights into the weights.

4. The systolic array processor of claim 1, wherein the rounder is configured to round the reduced significand portion of the 32-bit input data element based on one or more of: stochastic rounding; rounding to nearest even; rounding to zero; rounding down; or rounding up.

5. A systolic circuit comprising: a group of processing elements arranged into a plurality of rows; and a first convertor configured to: receive a first input represented in floating-point with a first bit-length; identify a quantity of trailing bits of the first input based on a difference between the first bit-length and a bit-length supported by the group of processing elements; reduce the quantity of trailing bits of the first input; and generate a first reduced input represented in floating-point with a second bit-length based on reducing the quantity of trailing bits of the first input, wherein the second bit-length is less than the first bit-length, wherein the second bit-length corresponds to a bit-length supported by the group of processing elements; wherein an individual processing element in at least one row of the group of processing elements is configured to receive the first reduced input from the first convertor and to receive a second input for performing multiply-accumulate operations.

6. The systolic circuit of claim 5, wherein individual processing elements in the plurality of rows of the group of processing elements comprise: a multiplier configured to multiply two 22-bit floating-point numbers, wherein the multiplier is comprised of a 1-bit sign data path, a 11-bit significand data path, and a 10-bit exponent data path; and an adder configured to add two floating-point numbers, wherein the adder is comprised of a 1-bit sign data path, a 23-bit significand data path, and a 10-bit exponent data path.

7. The systolic circuit of claim 5, wherein the first input comprises an input data element and the second input comprises a reduced weight, wherein the first convertor is further configured to: receive the first input and a weight; generate the first reduced input and the second input; and select the first reduced input or the second input to be provided.

8. The systolic circuit of claim 5, wherein the first convertor comprises: a trailing bit reducer configured to reduce a quantity of bits representing a significand portion of the first input to produce a reduced significand portion of the first input; a rounder configured to round the reduced significand portion of the first input based on a remainder of the bits representing the significand portion of the first input not included within the reduced significand portion; and an exponent expander configured to increase a quantity of bits representing an exponent portion of the first input.

9. The systolic circuit of claim 5, wherein the first input comprises a first rounded input, wherein the first convertor comprises: a trailing bit reducer configured to reduce a quantity of bits representing a significand portion of the first input to produce a reduced significand portion of the first input; and an exponent expander configured to increase a quantity of bits representing an exponent portion of the first input.

10. The systolic circuit of claim 5, wherein the first reduced input comprises a first reduced rounded input, wherein the first reduced rounded input is rounded based on one or more of: stochastic rounding; rounding to nearest even; rounding to zero; rounding down; or rounding up.

11. The systolic circuit of claim 5, wherein the first reduced input comprises a first reduced rounded input, wherein the first reduced rounded input is rounded based on a user input.

12. The systolic circuit of claim 5, wherein: the first convertor is configured to convert 32-bit floating-point numbers to 22-bit floating-point numbers, wherein each of the processing elements comprises: a 22-bit multiplier; and a 34-bit adder.

13. The systolic circuit of claim 5, wherein: the first convertor is further configured to convert m-bit floating-point numbers to n-bit floating-point numbers, wherein n and m can be any positive integer, wherein n is less than m, wherein each of the processing elements comprises: a multiplier configured to multiply at least two n-bit numbers; and an adder configured to add two p-bit numbers, wherein p is greater than n.

14. The systolic circuit of claim 5, wherein to reduce the quantity of trailing bits of the first input, the first convertor is configured to: set the quantity of trailing bits to zero.

15. The systolic circuit of claim 5, further comprising: a second convertor configured to: receive a weight represented in floating-point with the first bit-length; identify a quantity of trailing bits of the weight; reduce the quantity of trailing bits of the weight; and generate the second input represented in floating-point with the second bit-length based on reducing the quantity of trailing bits of the weight.

16. The systolic circuit of claim 5, wherein the first reduced input is stored in a 24-bit format.

17. A method implemented by a systolic circuit, the method comprising: receiving a first input represented in floating-point with a first bit-length; reducing a quantity of trailing bits of the first input based on a difference between the first bit-length and a bit-length supported by a processing element of the systolic circuit for multiply-accumulate operations; generating a first reduced input represented in floating-point with a second bit-length based on reducing the quantity of trailing bits of the first input, wherein the second bit-length is less than the first bit-length, wherein the second bit-length corresponds to the bit-length supported by the processing element; and receiving the first reduced input and a second input for performing, by the processing element, multiply-accumulate operations.

18. The method of claim 17, wherein: the first input comprises a 32-bit floating-point number; the first reduced input c
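
The rounding options listed in claims 4 and 10 can be compared with a short Python sketch that drops trailing bits from an unsigned significand under each mode. This is only an illustration: the function name `round_trailing`, the mode strings, and the reading of "rounding up" and "rounding down" as rounding toward positive and negative infinity are assumptions, not definitions taken from the patent.

```python
import random


def round_trailing(sign: int, sig: int, drop: int, mode: str, rng=random) -> int:
    """Drop `drop` trailing bits of the magnitude `sig` and round per `mode`.

    `sign` is 0 for positive and 1 for negative values. The result may be one
    bit wider than the kept field when rounding carries out; a caller would
    then renormalize by adjusting the exponent.
    """
    kept = sig >> drop
    rem = sig & ((1 << drop) - 1)      # the bits being removed
    half = 1 << (drop - 1)

    if mode == "to_zero":
        inc = False                    # plain truncation of the magnitude
    elif mode == "nearest_even":
        inc = rem > half or (rem == half and (kept & 1) == 1)
    elif mode == "up":                 # assumed: toward +infinity
        inc = rem != 0 and sign == 0
    elif mode == "down":               # assumed: toward -infinity
        inc = rem != 0 and sign == 1
    elif mode == "stochastic":
        # Round the magnitude up with probability rem / 2**drop.
        inc = rng.randrange(1 << drop) < rem
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return kept + int(inc)
```

With `sig = 0b10110110` and `drop = 4`, truncation and nearest-even both keep `0b1011`, rounding up on a positive value gives `0b1100`, and stochastic rounding returns either of the two with probability proportional to the dropped bits.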

Assignees

Inventors

Classifications

  • G06F9/3001 · Primary

    Arithmetic instructions · CPC title

  • Systolic arrays · CPC title

  • G06F7/544 · Primary

    for evaluating functions by calculation {(G06F7/4824 takes precedence)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11880682B2 cover?
Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the …
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 23, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).