Distributed convolution for neural networks

US11748625B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11748625-B2
Application number	US-201615395675-A
Country	US
Kind code	B2
Filing date	Dec 30, 2016
Priority date	Dec 30, 2016
Publication date	Sep 5, 2023
Grant date	Sep 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.
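The partition, distribute, and combine flow described in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes a multi-channel 2D convolution split along the channel dimension, simulates the processing elements with an ordinary loop, and uses hypothetical names (`conv2d_valid`, `partition`, `distributed_conv`) that do not appear in the patent.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2D 'valid' cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def partition(items: list, num_elements: int) -> List[list]:
    """Split items into num_elements contiguous, near-equal partitions."""
    base, extra = divmod(len(items), num_elements)
    parts, start = [], 0
    for p in range(num_elements):
        size = base + (1 if p < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


def distributed_conv(channels: List[Matrix], kernels: List[Matrix],
                     num_elements: int) -> Matrix:
    """Sum per-channel convolutions, with channels split across elements."""
    pairs = list(zip(channels, kernels))
    oh = len(channels[0]) - len(kernels[0]) + 1
    ow = len(channels[0][0]) - len(kernels[0][0]) + 1
    result = [[0] * ow for _ in range(oh)]
    # Each loop iteration stands in for one processing element (assumption:
    # real hardware would run these partial operations in parallel).
    for part in partition(pairs, num_elements):
        partial = [[0] * ow for _ in range(oh)]
        for image, kernel in part:
            partial = add(partial, conv2d_valid(image, kernel))
        # Combining partial results yields the final output.
        result = add(result, partial)
    return result


channels = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
]
kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]], [[0, 1], [1, 0]]]
print(distributed_conv(channels, kernels, num_elements=2))  # -> [[12, 14], [18, 20]]
```

The output does not depend on `num_elements`, which is the point of the sketch: the number of available processing elements changes how the work is split, not the result.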

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus, comprising: interface circuitry; a matrix processing cluster (MPC) circuitry, communicatively coupled to the interface circuitry, the MPC circuitry including: memory resource block circuitry to store a plurality of input matrices; a plurality of matrix processing units (MPUs), wherein each MPU includes processing circuitry to perform matrix arithmetic; master control central processing unit (MCC) circuitry to distribute a matrix instruction, received from a controller via the interface circuitry, across the plurality of matrix processing units (MPUs), wherein the matrix instruction is to perform a neural network operation on the plurality of input matrices, wherein the neural network operation includes a plurality of convolution operations; slicing engine circuitry to partition the plurality of input matrices into a plurality of input partitions based on a number of available MPUs; the MCC circuitry to distribute the plurality of input partitions among the plurality of MPUs, wherein each input partition is distributed to a particular MPU of the plurality of MPUs, wherein the MCC circuitry to shift each input partition to a different MPU of the plurality of MPUs between each of a plurality of stages of the matrix operation; and at least two or more of the plurality of MPUs to perform a plurality of partial matrix operations in the plurality of stages including at least a first partial matrix operation in a first stage by a first MPU using a first input partition and a second partial matrix operation in the first stage by a second MPU using a second input partition, and including at least a third partial matrix operation in a stage subsequent to the first stage by the first MPU using the second input partition and a fourth partial matrix operation in a stage subsequent to the first stage by the second MPU using the first input partition, wherein the first and second input partitions are shifted between at least the first and second MPUs during one or more weight update operations; and the controller to determine a result of the neural network operation based on the plurality of partial matrix operations.

2. The apparatus of claim 1, wherein the plurality of input matrices includes matrix data associated with one or more images and one or more filters, wherein the one or more images are associated with one or more channels.

3. The apparatus of claim 2, wherein the slicing engine circuitry to partition the plurality of input matrices into the plurality of input partitions based on the number of available MPUs is further to partition the plurality of input matrices based on one or more of: a number of channels associated with the one or more images; a number of filters; and a number of images.

4. The apparatus of claim 1, wherein the MCC circuitry is further to distribute the plurality of partial matrix operations among the plurality of MPUs based on a height and a width of the result of the neural network operation.

5. The apparatus of claim 1, wherein: the plurality of MPUs is configured in a cyclic arrangement such that each MPU is communicatively coupled to a plurality of neighbor MPUs; the MCC circuitry to transmit, via the interface circuitry, the partial matrix data between the plurality of MPUs while performing the plurality of partial matrix operations is further to transmit a portion of the partial matrix data from each MPU to one or more of the neighbor MPUs while performing a particular stage of the partial matrix operations.

6. The apparatus of claim 5, wherein the neural network operation is associated with the one or more weight update operations in a neural network.

7. The apparatus of claim 5, wherein the partial matrix data includes a partial result matrix determined by a first MPU in a particular stage of the partial matrix operations, and wherein the partial result matrix is to be used by a second MPU in a subsequent stage of the partial matrix operations.

8. The apparatus of claim 7, wherein the neural network operation is associated with a forward propagation operation in a neural network.

9. The apparatus of claim 7, wherein the neural network operation is associated with a backward propagation operation in a neural network.

10. A method of performing a neural network operation on a matrix processor, comprising: distribute a matrix instruction to perform the neural network operation on a plurality of input matrices, wherein the neural network operation includes a plurality of convolution operations; partitioning the plurality of input matrices into a plurality of input partitions based on a number of available matrix processing units (MPUs) in the matrix processor; distributing the plurality of input partitions among a plurality of MPUs in the matrix processor, wherein each input partition is distributed to a particular MPU of the plurality of MPUs; shifting each input partition to a different MPU of the plurality of MPUs between each of a plurality of stages of the matrix operation; and performing a plurality of partial matrix operations in a plurality of stages, including at least a first partial matrix operation in a first stage by a first MPU using a first input partition and a second partial matrix operation in the first stage by a second MPU using a second input partition, and including at least a third partial matrix operation in a stage subsequent to the first stage by the first MPU using the second input partition and a fourth partial matrix operation in a stage subsequent to the first stage by the second MPU using the first input partition, wherein the first and second input partitions are shifted between at least the first and second MPUs during one or more weight update operations; and determining a result of the neural network operation based on the plurality of partial matrix operations.

11. The method of claim 10, wherein: the plurality of input matrices includes matrix data associated with one or more images and one or more filters, wherein the one or more images are associated with one or more channels; and the plurality of input matrices is further partitioned based on one or more of: a number of channels associated with the one or more images; a number of filters; and a number of images.

12. The method of claim 10, further including distributing the plurality of partial matrix operations to the plurality of MPUs based on a height and a width of the result of the neural network operation.

13. The method of claim 10, wherein the plurality of MPUs is configured in a cyclic arrangement such that each MPU is communicatively coupled to a plurality of neighbor MPUs.

14. The method of claim 13, wherein each MPU transmits a portion of the partial matrix data to one or more of the neighbor MPUs while performing a particular stage of the partial matrix operations.

15. A system, comprising: memory circuitry to store a plurality of input matrices; a plurality of matrix processing chips, wherein each matrix processing chip includes a plurality of matrix processing cluster (MPC) circuitries, the plurality of MPC circuitries to each include a plurality of matrix processing units (MPUs) to perform matrix arithmetic; interface circuitry to communicatively couple the plurality of matrix processing chips; and host processor circuitry to instruct at least one of the plurality of matrix processing chips to perform a neural network operation on the plurality of input matrices, wherein the neural network operation includes a plurality of convolution operations; the at least one of the plurality of matrix processing chips t
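The stage-by-stage shifting recited in claims 1, 5, and 10 can be pictured as a ring schedule. The sketch below is only an illustration under stated assumptions: it uses a plain matrix product rather than the weight update operation named in the claims, keeps one row of a matrix `A` resident on each simulated MPU, and rotates the columns of a matrix `B` around the ring as the input partitions. The function name `ring_matmul` and the data layout are hypothetical.

```python
from typing import List, Tuple


def ring_matmul(a_rows: List[List[int]],
                b_cols: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Compute C = A @ B with one row of A per MPU and rotating columns of B.

    a_rows[i] stays on MPU i for every stage.
    b_cols[j] starts on MPU j and moves to a neighbor after each stage.
    Returns the result matrix and the per-stage partition schedule.
    """
    num_mpus = len(a_rows)
    assert len(b_cols) == num_mpus, "sketch assumes one partition per MPU"

    held = list(range(num_mpus))  # held[i] = partition index currently on MPU i
    result = [[0] * num_mpus for _ in range(num_mpus)]
    schedule = []

    for _stage in range(num_mpus):
        schedule.append(list(held))
        # Partial matrix operations: conceptually all MPUs work in parallel.
        for mpu in range(num_mpus):
            j = held[mpu]
            result[mpu][j] = sum(x * y for x, y in zip(a_rows[mpu], b_cols[j]))
        # Shift: MPU i receives the partition previously held by MPU i + 1,
        # wrapping around so the arrangement is cyclic.
        held = held[1:] + held[:1]

    return result, schedule


# A = [[1, 2], [3, 4]] stored by rows; B = [[5, 6], [7, 8]] stored by columns.
c, schedule = ring_matmul([[1, 2], [3, 4]], [[5, 7], [6, 8]])
print(c)         # -> [[19, 22], [43, 50]]
print(schedule)  # -> [[0, 1], [1, 0]]
```

With two MPUs the schedule mirrors the pattern in claim 1: in the first stage each MPU works on its own partition, and in the next stage the partitions are swapped. After as many stages as there are MPUs, every MPU has processed every partition without any single element holding all of the input at once.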

Assignees

Intel Corp

Inventors

Classifications

G06N3/0464 · Convolutional networks [CNN, ConvNet]
G06N3/084 (primary) · Backpropagation, e.g. using gradient descent
G06F17/153 · Multidimensional correlation or convolution
G06F17/16 · Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}
G06N3/045 · Combinations of networks

Patent family

Related publications grouped by family.

View patent family 60937528

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11748625B2 cover?
In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input pa…

Who is the assignee on this patent?
Intel Corp

What technology area does this patent fall under?
The primary CPC classification is G06N3/084. Mapped technology areas include Physics.

When was this patent published?
It was published on Sep 5, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
This page lists 12 related publications (citations in our corpus or others sharing the same primary CPC).