Low latency matrix multiply unit

US10970362B2 · US · B2

Patent metadata
Publication number: US-10970362-B2
Application number: US-202016915286-A
Country: US
Kind code: B2
Filing date: Jun 29, 2020
Priority date: May 17, 2017
Publication date: Apr 6, 2021
Grant date: Apr 6, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
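
The abstract describes each cell as four parts: a transposed weight shift register fed horizontally, a non-transposed weight shift register fed vertically, a weight matrix register that takes its weight from either one, and a multiply unit. A minimal behavioral sketch, assuming scalar weights and ignoring clock timing, might model one cell as below. Every class, method, and attribute name is hypothetical, and the `use_transposed` flag stands in for the multiplexer recited in claims 6 and 15.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """Behavioral model of one cell (hypothetical names; no clocking or timing)."""
    transposed_shift_reg: Optional[float] = None      # fed from the horizontal direction
    non_transposed_shift_reg: Optional[float] = None  # fed from the vertical direction
    weight_matrix_reg: Optional[float] = None         # weight actually used by the multiplier

    def shift_in_horizontal(self, weight: float) -> Optional[float]:
        """Latch a weight from the left neighbor; return the old value for the right neighbor."""
        out = self.transposed_shift_reg
        self.transposed_shift_reg = weight
        return out

    def shift_in_vertical(self, weight: float) -> Optional[float]:
        """Latch a weight from the upper neighbor; return the old value for the lower neighbor."""
        out = self.non_transposed_shift_reg
        self.non_transposed_shift_reg = weight
        return out

    def load_weight(self, use_transposed: bool) -> None:
        """Multiplexer: copy one of the two shift registers into the weight matrix register."""
        src = self.transposed_shift_reg if use_transposed else self.non_transposed_shift_reg
        if src is None:
            raise ValueError("selected shift register is empty")
        self.weight_matrix_reg = src

    def multiply(self, vector_input: float) -> float:
        """Multiply unit: weight matrix register times the vector data input."""
        if self.weight_matrix_reg is None:
            raise ValueError("weight matrix register not loaded")
        return self.weight_matrix_reg * vector_input


cell = Cell()
cell.shift_in_vertical(2.0)    # non-transposed path
cell.shift_in_horizontal(5.0)  # transposed path
cell.load_weight(use_transposed=False)
print(cell.multiply(3.0))  # 6.0
cell.load_weight(use_transposed=True)
print(cell.multiply(3.0))  # 15.0
```

Because `multiply` reads only `weight_matrix_reg`, new weights can keep shifting through the two shift registers while the current weight is in use. That separation is one plausible reading of the "low latency" in the title, offered here as an assumption rather than as the patent's stated rationale.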

First claim

Full claim text (claims 1 to 20). Claims 1, 10, and 19 are independent; claim 1 is the first independent claim.

What is claimed is:

1. A cell of a plurality of cells arranged in an array of a matrix multiply unit, the cell comprising: a weight matrix register configured to receive a weight input of a neural network from one or more weight storing registers, wherein the one or more weight storing registers are configured to receive a plurality of weight inputs of the neural network from a first direction of the array and a second direction of the array, the second direction being different from the first direction; and a multiply unit that is coupled to the weight matrix register, wherein the multiply unit is configured to receive the weight input from the weight matrix register, wherein the multiply unit is configured to multiply the received weight input with a vector data input of the neural network to generate a multiplication result.

2. The cell of claim 1, wherein the multiplication result is a part of a plurality of neural network computations.

3. The cell of claim 2, wherein the multiply unit is configured to multiply the received weight input with another vector data input to generate another multiplication result, wherein the other multiplication result is another part of the plurality of neural network computations.

4. The cell of claim 1, wherein the array is a systolic array of cells.

5. The cell of claim 1, wherein: the array has a two-dimensional format; the first direction of the array is the first direction in the two-dimensional format; and the second direction of the array is the second direction in the two-dimensional format.

6. The cell of claim 1, further comprising a multiplexer configured to: select the weight input from the plurality of weight inputs; and send the selected weight input to the weight matrix register.

7. The cell of claim 1, wherein the one or more weight storing registers comprise a transposed weight shift register and a non-transposed weight shift register.

8. The cell of claim 7, wherein the non-transposed weight shift register is physically separate from the transposed weight shift register.

9. The cell of claim 1, wherein the one or more weight storing registers comprise: a first weight storing register configured to receive a first weight input of the plurality of weight inputs over a first wired path from a first cell of the plurality of cells that is along the first direction; and a second weight storing register configured to receive a second weight input over a second wired path from a second cell of the plurality of cells that is along the second direction, wherein the weight input is one of the first weight input and the second weight input.

10. A method comprising: receiving, by a weight matrix register of a cell of a plurality of cells arranged in an array of a matrix multiply unit, a weight input of a neural network from one or more weight storing registers, wherein the one or more weight storing registers receive a plurality of weight inputs of the neural network from a first direction of the array and a second direction of the array, wherein the second direction is different from the first direction; and multiplying, by a multiply unit that is coupled to the weight matrix register, the weight input with a vector data input of the neural network to generate a multiplication result.

11. The method of claim 10, wherein the multiplication result is a part of a plurality of neural network computations.

12. The method of claim 11, further comprising: multiplying, by the multiply unit, the received weight input with another vector data input to generate another multiplication result, wherein the other multiplication result is another part of the plurality of neural network computations.

13. The method of claim 10, wherein the array is a systolic array of cells.

14. The method of claim 10, wherein: the array has a two-dimensional format; the first direction of the array is the first direction in the two-dimensional format; and the second direction of the array is the second direction in the two-dimensional format.

15. The method of claim 10, further comprising: selecting, by a multiplexer, the weight input from the plurality of weight inputs; and receiving, by the weight matrix register, the selected weight input from the multiplexer.

16. The method of claim 10, wherein the one or more weight storing registers comprise a transposed weight shift register and a non-transposed weight shift register.

17. The method of claim 16, wherein the non-transposed weight shift register is physically separate from the transposed weight shift register.

18. The method of claim 10, wherein the one or more weight storing registers comprise: a first weight storing register configured to receive a first weight input of the plurality of weight inputs over a first wired path from a first cell of the plurality of cells that is along the first direction; and a second weight storing register configured to receive a second weight input over a second wired path from a second cell of the plurality of cells that is along the second direction, wherein the weight input is one of the first weight input and the second weight input.

19. A non-transitory computer program product storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising: receiving, by a weight matrix register of a cell of a plurality of cells arranged in an array of a matrix multiply unit, a weight input of a neural network from one or more weight storing registers, wherein the one or more weight storing registers receive a plurality of weight inputs of the neural network from a first direction of the array and a second direction of the array, wherein the second direction is different from the first direction; and multiplying, by a multiply unit that is coupled to the weight matrix register, the weight input with a vector data input of the neural network to generate a multiplication result.

20. The non-transitory computer program product of claim 19, wherein: the array has a two-dimensional format; the first direction of the array is the first direction in the two-dimensional format; and the second direction of the array is the second direction in the two-dimensional format.
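
The independent claims recite weight storing registers fed from two different directions of the array, and claims 7 and 16 name them transposed and non-transposed weight shift registers; the abstract ties the transposed one to the horizontal direction. A minimal sketch, assuming a square n x n array and a bottom-up feed order (both assumptions, not taken from the claims), shows one way that dual-direction loading could leave either a weight matrix W or its transpose in the array from the same input stream. The function names are hypothetical, and `column_sums` is a functional stand-in for the multiply units that ignores the cycle-by-cycle skew of a real systolic array.

```python
from typing import List

Matrix = List[List[float]]


def shift_load(feed: Matrix, vertical: bool) -> Matrix:
    """Shift n weight vectors, one per step, into an n x n grid of shift registers.

    vertical=True:  each vector enters along the top edge (one value per column)
                    and every column shifts down one cell per step
                    (non-transposed path).
    vertical=False: each vector enters along the left edge (one value per row)
                    and every row shifts right one cell per step
                    (transposed path).
    """
    n = len(feed)
    if any(len(vec) != n for vec in feed):
        raise ValueError("this sketch assumes a square n x n array")
    grid = [[0.0] * n for _ in range(n)]
    for vec in feed:
        if vertical:
            # New vector lands in row 0; older rows move down, the bottom row drops out.
            grid = [list(vec)] + [row[:] for row in grid[:-1]]
        else:
            # New value lands in column 0 of each row; older values move right.
            grid = [[vec[r]] + grid[r][:-1] for r in range(n)]
    return grid


def column_sums(weights: Matrix, x: List[float]) -> List[float]:
    """Each cell multiplies its stationary weight by its row's input x[r];
    products accumulate down each column. Systolic timing skew is ignored."""
    n = len(weights)
    return [sum(weights[r][c] * x[r] for r in range(n)) for c in range(n)]


# Feeding the rows of W bottom-up leaves W in the grid on the vertical path
# and W transposed on the horizontal path.
W = [[1.0, 2.0], [3.0, 4.0]]
feed = list(reversed(W))
print(shift_load(feed, vertical=True))   # [[1.0, 2.0], [3.0, 4.0]]
print(shift_load(feed, vertical=False))  # [[1.0, 3.0], [2.0, 4.0]]
```

With this dataflow, `column_sums` on the vertically loaded grid yields W^T x and on the horizontally loaded grid yields W x, so in this sketch the same weight stream serves either product without a separate transpose pass over memory.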

Assignees

Google LLC

Inventors

Classifications

  • Learning methods · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • using electronic means · CPC title

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10970362B2 cover?
Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be sto…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F7/523. Mapped technology areas include Physics.
When was this patent published?
Published Apr 6, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).