Integer matrix multiplication engine using pipelining

US11880426B2 · US · B2

Patent metadata
Publication number: US-11880426-B2
Application number: US-202217878011-A
Country: US
Kind code: B2
Filing date: Jul 31, 2022
Priority date: Apr 1, 2019
Publication date: Jan 23, 2024
Grant date: Jan 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
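The arithmetic described in the abstract can be illustrated with a short Python reference model. It assumes that a "variable radix point" format is an ordinary two's complement fixed-point (Q) format: a bit width (the dependent claims mention 4, 8, and 16 bits) plus a count of fractional bits chosen at run time for each matrix. The function names (saturate, to_fixed, from_fixed, tiled_matmul, requantize) and the parameters frac_bits, j, and acc_init are hypothetical illustrations rather than names from the patent, and the code models only the integer math, not the claimed pipelined hardware.

```python
def saturate(q: int, width: int) -> int:
    """Clamp q to the range of a width-bit two's complement integer."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, q))


def to_fixed(x: float, frac_bits: int, width: int) -> int:
    """Encode x with frac_bits fractional bits (round to nearest, ties to even) and saturate."""
    return saturate(round(x * (1 << frac_bits)), width)


def from_fixed(q: int, frac_bits: int) -> float:
    """Decode a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


def tiled_matmul(a, b, j, acc_init=0):
    """Integer C = A @ B for m×k and k×n inputs, walking (j×j) tiles.

    acc_init is the accumulator's starting value, expressed in the accumulator format.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[acc_init] * n for _ in range(m)]
    for i0 in range(0, m, j):            # tile row of A and C
        for p0 in range(0, k, j):        # tile along the shared k dimension
            for q0 in range(0, n, j):    # tile column of B and C
                for i in range(i0, min(i0 + j, m)):
                    for q in range(q0, min(q0 + j, n)):
                        acc = c[i][q]
                        for p in range(p0, min(p0 + j, k)):
                            acc += a[i][p] * b[p][q]  # one multiply-accumulate
                        c[i][q] = acc
    return c


def requantize(acc: int, acc_frac: int, out_frac: int, width: int) -> int:
    """Move an accumulator value to the output radix point (round half up), then saturate."""
    shift = acc_frac - out_frac
    if shift > 0:
        acc = (acc + (1 << (shift - 1))) >> shift  # >> is an arithmetic shift on Python ints
    else:
        acc <<= -shift
    return saturate(acc, width)


# Usage: 8-bit inputs with different radix points, 16-bit result.
fa, fb, fo = 4, 6, 8  # hypothetical fractional-bit counts for A, B and the result
A = [[to_fixed(v, fa, 8) for v in row] for row in [[1.5, -2.0], [0.25, 3.0]]]
B = [[to_fixed(v, fb, 8) for v in row] for row in [[0.5, 1.0], [-1.0, 0.75]]]
acc_c = tiled_matmul(A, B, j=2)  # accumulator values carry fa + fb fractional bits
C = [[requantize(x, fa + fb, fo, 16) for x in row] for row in acc_c]
print([[from_fixed(x, fo) for x in row] for row in C])  # [[2.75, 0.0], [-2.875, 2.5]]
```

Because the inputs carry fa and fb fractional bits, every product, and therefore the accumulator, carries fa + fb; the result format is reached with one rounding shift and a saturation. The sketch uses unbounded Python integers, so it does not model overflow of a finite-width hardware accumulator.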

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented method comprising: obtaining a first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n for matrix multiplication within a processor, wherein the first integer matrix and the second integer matrix employ a two's complement variable radix point data representation; distilling the first integer matrix and the second integer matrix into (j×j) submatrices; configuring dynamically both a variable radix point format and an initial value for an accumulator register; executing multiply-accumulate (MAC) operations in a pipelined architecture on the (j×j) submatrices of the first integer matrix and the second integer matrix, wherein a third variable radix point format is configured for the result; and using the executed MAC operations to process data in a data-flow architecture.

2. The method of claim 1 further comprising configuring dynamically a first variable radix point format for the first integer matrix and a second variable radix point format for the second integer matrix.

3. The method of claim 2 wherein the first variable radix point format and the second variable radix point format comprise a 16-bit data type.

4. The method of claim 2 wherein the first variable radix point format and the second variable radix point format comprise a 4-bit primitive data type.

5. The method of claim 2 wherein the first variable radix point format and the second variable radix point format comprise an 8-bit primitive data type.

6. The method of claim 1 further comprising outputting results of the matrix multiplication to a storage element, wherein the outputting takes an additional (m×k) cycles.

7. The method of claim 1 wherein the first integer matrix and the second integer matrix comprise subsections of an o-dimensional tensor, wherein o is greater than 2.

8. The method of claim 1 wherein each multiply-accumulate (MAC) unit used for matrix multiplication in the processor is configured to have an accumulator depth of m.

9. The method of claim 1 further comprising pipelining input elements to multiply-accumulate (MAC) units used for matrix multiplication in the processor through two input registers.

10. The method of claim 1 wherein performing N multiply-accumulate (MAC) operations in parallel reduces an amount of time taken to perform the N MAC operations from an order of magnitude of N³ to an order of magnitude of N².

11. The method of claim 1 further comprising adding one or more idle or no operation (NOP) cycles after completion of a matrix multiply operation before starting a next matrix multiply operation.

12. The method of claim 1 wherein a processor and memory subsystem is allocated as part of one or more clusters within a reconfigurable fabric to implement MAC units.

13. The method of claim 12, wherein each cluster of the one or more clusters within the reconfigurable fabric is controlled by one or more circular buffers.

14. The method of claim 13, wherein the one or more circular buffers are statically scheduled.

15. The method of claim 12, wherein each cluster of the one or more clusters within the reconfigurable fabric comprises process elements, switching elements, or storage elements.

16. The method of claim 1, wherein the data-flow architecture implements machine learning.

17. The method of claim 1, wherein the machine learning utilizes one or more convolutional neural networks.

18. The method of claim 1, wherein executing multiply-accumulate operations in a pipelined architecture is accomplished using systolic data flow.

19. One or more non-transitory computer readable media embodying one or more instructions that are operable when executing by one or more processors to perform operations of: obtaining a first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n for matrix multiplication within a processor, wherein the first integer matrix and the second integer matrix employ a two's complement variable radix point data representation; distilling the first integer matrix and the second integer matrix into (j×j) submatrices; configuring dynamically both a variable radix point format and an initial value for an accumulator register; executing multiply-accumulate operations in a pipelined architecture on the (j×j) submatrices of the first integer matrix and the second integer matrix, wherein a third variable radix point format is configured for the result; and using the executed MAC operations to process data in a data-flow architecture.

20. A system comprising: a memory which stores instructions; and one or more processors coupled to the memory wherein the one or more processors, when executing the instructions, are configured to: obtain a first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n for matrix multiplication within a processor, wherein the first integer matrix and the second integer matrix employ a two's complement variable radix point data representation; distill the first integer matrix and the second integer matrix into (j×j) submatrices; configure dynamically both a variable radix point format and an initial value for an accumulator register; and execute multiply-accumulate operations in a pipelined architecture on the (j×j) submatrices of the first integer matrix and the second integer matrix, wherein a third variable radix point format is configured for the result; and use the executed MAC operations to process data in a data-flow architecture.
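Claims 9, 10, and 18 refer to pipelining input elements through two input registers and to systolic data flow. The Python sketch below simulates a generic, textbook output-stationary systolic array (one MAC unit per output element) to show how such pipelining overlaps multiply-accumulate work. It is an assumed design for illustration, not the patent's microarchitecture: each processing element holds a horizontal register (a_reg) and a vertical register (b_reg), which is one possible reading of "two input registers", and systolic_matmul is a hypothetical name.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an output-stationary systolic array.

    PE (r, c) accumulates C[r][c]. Row r of A enters from the left delayed by r cycles,
    and column c of B enters from the top delayed by c cycles, so matching operands meet.
    Returns (C, number_of_cycles).
    """
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    a_reg = [[0] * n for _ in range(m)]  # horizontal input register of each PE
    b_reg = [[0] * n for _ in range(m)]  # vertical input register of each PE
    # Operand pair p reaches PE (r, c) at cycle p + r + c; the last one arrives at (k-1)+(m-1)+(n-1).
    cycles = k + m + n - 2
    for t in range(cycles):
        # Compute next register contents from the previous cycle's registers
        # (all PEs latch simultaneously, as clocked registers would).
        new_a = [[0] * n for _ in range(m)]
        new_b = [[0] * n for _ in range(m)]
        for r in range(m):
            for c in range(n):
                if c == 0:  # left edge: feed skewed row r of A, zero-padded outside 0..k-1
                    p = t - r
                    new_a[r][c] = a[r][p] if 0 <= p < k else 0
                else:       # otherwise take the left neighbour's previous value
                    new_a[r][c] = a_reg[r][c - 1]
                if r == 0:  # top edge: feed skewed column c of B
                    p = t - c
                    new_b[r][c] = b[p][c] if 0 <= p < k else 0
                else:       # otherwise take the upper neighbour's previous value
                    new_b[r][c] = b_reg[r - 1][c]
        a_reg, b_reg = new_a, new_b
        for r in range(m):
            for c in range(n):
                acc[r][c] += a_reg[r][c] * b_reg[r][c]  # one MAC per PE per cycle
    return acc, cycles


C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)  # [[19, 22], [43, 50]] 4
```

For an m×k by k×n product this array finishes in k + m + n − 2 cycles (3N − 2 for square N×N inputs), whereas a single MAC unit would need m·k·n sequential steps; the speed-up comes from running m·n MAC units concurrently, with the skewed inputs keeping every unit fed once the pipeline fills.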

Assignees

Wave Computing Inc

Classifications

  • G06F17/16 · Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • G06F7/16 · Primary

    Combined merging and sorting · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11880426B2 cover?
Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices ar…
Who is the assignee on this patent?
Wave Computing Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Published January 23, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).