Support for different matrix multiplications by selecting adder tree intermediate results

US11520854B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11520854-B2
Application number: US-201916667700-A
Country: US
Kind code: B2
Filing date: Oct 29, 2019
Priority date: Oct 29, 2019
Publication date: Dec 6, 2022
Grant date: Dec 6, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix. A control unit is used to instruct the matrix multiplication hardware unit to perform a plurality of different matrix multiplications in parallel by using a combined matrix that includes elements of a plurality of different operand matrices and utilize one or more selected ones of the intermediate results of the hierarchical tree of adders for use in determining the output result matrix that includes different groups of elements representing different multiplication results corresponding to different ones of the different operand matrices.
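
The multiply-then-reduce flow in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: the function names `adder_tree_levels` and `multiply_and_reduce` and the `select_intermediate` flag are hypothetical, and the sketch assumes the selected intermediate results are the two outputs of the next-to-last adder level (as in dependent claims 10 and 11).

```python
from typing import List, Sequence


def adder_tree_levels(products: Sequence[int]) -> List[List[int]]:
    """Pairwise-reduce the products and return the outputs of every adder level.

    The first entry holds the first level of adders; the last entry holds the
    single final sum. The input length is assumed to be a power of two.
    """
    n = len(products)
    if n < 4 or n & (n - 1):
        raise ValueError("expected a power-of-two number of products, at least 4")
    levels = []
    current = list(products)
    while len(current) > 1:
        # Each adder sums two neighbouring outputs of the previous level.
        current = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
        levels.append(current)
    return levels


def multiply_and_reduce(a, b, select_intermediate=False):
    """Element-wise multiply two groups, then reduce with the adder tree.

    select_intermediate=False returns the final result (one value).
    select_intermediate=True returns the next-to-last level (two values),
    an assumption made for this illustration.
    """
    products = [x * y for x, y in zip(a, b)]
    levels = adder_tree_levels(products)
    return levels[-2] if select_intermediate else levels[-1]


a = list(range(32))   # 32 elements, matching the thirty-two multipliers of claim 6
b = [1] * 32
print(multiply_and_reduce(a, b))                            # final result
print(multiply_and_reduce(a, b, select_intermediate=True))  # two partial sums
print(len(adder_tree_levels(a)))                            # number of adder levels
```

With 32 inputs the model has five adder levels, which agrees with the log-base-two relationship stated in claims 7 and 8.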

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a matrix multiplication hardware unit configured to perform matrix multiplications of a first size, comprising: a plurality of multipliers configured to element-wise multiply a first group of elements with a second group of elements; and a hierarchical tree of adders configured to add together results of the plurality of multipliers and selectively provide a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders for use in determining an output result matrix, wherein the hierarchical tree of adders includes a demultiplexer configured to receive an output of an adder in a next-to-last level of adders in the hierarchical tree of adders, wherein the demultiplexer is configured to provide an output that is an input to an adder in a last level of adders in the hierarchical tree of adders; and a control unit configured to instruct the matrix multiplication hardware unit to perform a plurality of different matrix multiplications of a second size smaller than the first size in parallel by using a combined matrix that includes elements of a plurality of different operand matrices and utilize one or more selected ones of the intermediate results of the hierarchical tree of adders for use in determining the output result matrix that includes different groups of elements representing different multiplication results corresponding to different ones of the different operand matrices.

2. The system of claim 1, wherein the first group of elements is associated with convolution filter values and the second group of elements is associated with input values to be convolved with the convolution filter values, or vice versa.

3. The system of claim 1, wherein the first group of elements and the second group of elements include color channel data.

4. The system of claim 1, wherein the first group of elements and the second group of elements are loaded from a memory and stored in registers.

5. The system of claim 1, wherein the first group of elements and the second group of elements include data associated with a neural network computation.

6. The system of claim 1, wherein the plurality of multipliers includes thirty-two multipliers.

7. The system of claim 1, wherein the hierarchical tree of adders includes a number of hierarchical levels of adders equal to logarithm base two of the number of multipliers in the plurality of multipliers.

8. The system of claim 1, wherein the hierarchical tree of adders includes five hierarchical levels.

9. The system of claim 1, wherein the output result matrix is stored in a register array and transferred to a memory.

10. The system of claim 1, wherein the plurality of intermediate results is two intermediate results.

11. The system of claim 1, wherein the plurality of intermediate results are outputs of adders in the next-to-last level of adders in the hierarchical tree of adders.

12. The system of claim 1, wherein the matrix multiplication hardware unit further comprises one or more multiplexers.

13. The system of claim 1, wherein an output of the adder in the last level of adders in the hierarchical tree of adders is an input to a multiplexer having another input that is another output of the demultiplexer.

14. The system of claim 1, wherein the control unit is configured to transfer the elements of the plurality of different operand matrices from a memory to registers.

15. The system of claim 1, wherein the control unit is configured to transfer the elements of the plurality of different operand matrices from registers to the matrix multiplication hardware unit.

16. The system of claim 1, wherein the control unit is configured to send one or more control signals to one or more multiplexers and demultiplexers in the matrix multiplication hardware unit.

17. A method, comprising: element-wise multiplying a first group of elements with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit configured to perform matrix multiplications of a first size; adding together results of the plurality of multipliers using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and selectively providing a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders for use in determining an output result matrix, wherein the hierarchical tree of adders includes a demultiplexer configured to receive an output of an adder in a next-to-last level of adders in the hierarchical tree of adders, wherein the demultiplexer is configured to provide an output that is an input to an adder in a last level of adders in the hierarchical tree of adders; and using a control unit to instruct the matrix multiplication hardware unit to perform a plurality of different matrix multiplications of a second size smaller than the first size in parallel by using a combined matrix that includes elements of a plurality of different operand matrices and utilize one or more selected ones of the intermediate results of the hierarchical tree of adders for use in determining the output result matrix that includes different groups of elements representing different multiplication results corresponding to different ones of the different operand matrices.

18. The method of claim 17, wherein the first group of elements is associated with convolution filter values and the second group of elements is associated with input values to be convolved with the convolution filter values, or vice versa.

19. The method of claim 17, wherein the first group of elements and the second group of elements include color channel data.

20. The method of claim 17, wherein the first group of elements and the second group of elements are loaded from a memory and stored in registers.
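
The demultiplexer and multiplexer routing described in claims 1 and 13, together with the combined operands used to run smaller multiplications in parallel, can be sketched in software as follows. This is an illustrative model only: the helper names `demux`, `mux`, `last_stage`, and `packed_dot`, the `split` control signal, and the use of one demultiplexer per next-to-last-level adder are assumptions made for the example and are not taken from the patent text. The sub-trees below the next-to-last level are stood in for by plain sums.

```python
def demux(value, select):
    """Route value to output 0 (toward the last-level adder) or output 1 (bypass).

    The unselected output is modelled as 0 so it does not affect the adder.
    """
    return (value, 0) if select == 0 else (0, value)


def mux(in0, in1, select):
    """Pass through in0 when select is 0, otherwise in1."""
    return in0 if select == 0 else in1


def last_stage(left, right, split):
    """Model the last adder level with bypass paths around it.

    left, right: outputs of the two next-to-last-level adders.
    split=0: return one full-width sum.
    split=1: return the two intermediate results separately.
    """
    to_adder_l, bypass_l = demux(left, split)
    to_adder_r, bypass_r = demux(right, split)
    total = to_adder_l + to_adder_r      # adder in the last level
    out0 = mux(total, bypass_l, split)   # last-adder output or demux bypass
    return (out0,) if split == 0 else (out0, bypass_r)


def packed_dot(a_combined, b_combined, split):
    """Multiply element-wise, reduce each half, then apply the last stage."""
    products = [x * y for x, y in zip(a_combined, b_combined)]
    half = len(products) // 2
    left = sum(products[:half])    # stand-in for the left sub-tree
    right = sum(products[half:])   # stand-in for the right sub-tree
    return last_stage(left, right, split)


# Two unrelated 16-element operand pairs packed into one 32-wide operation.
a1, b1 = [1] * 16, [2] * 16
a2, b2 = [3] * 16, [1] * 16
print(packed_dot(a1 + a2, b1 + b2, split=1))  # two separate dot products
print(packed_dot(a1 + a2, b1 + b2, split=0))  # one 32-element dot product
```

With `split=1`, each half of the combined operand yields its own result, so two smaller multiplications share a single pass through the multipliers. With `split=0`, the same path produces the ordinary full-width sum.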

Assignees

Meta Platforms Inc

Inventors

Classifications

  • G06F17/16 · Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • with column wise addition of partial products, e.g. using Wallace tree, Dadda counters (G06F7/5324 takes precedence) · CPC title

  • G06F7/5443 · Primary

    Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11520854B2 cover?
A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of …
Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 6, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).