What technology area does this patent fall under?

Primary CPC classification G06F9/5066. Mapped technology areas include Physics.

When was this patent published?

Publication date: May 23, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Point to point connected processing elements with data joiner components

US11657252B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11657252-B2
Application number	US-201916434960-A
Country	US
Kind code	B2
Filing date	Jun 7, 2019
Priority date	Jun 7, 2019
Publication date	May 23, 2023
Grant date	May 23, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A microprocessor system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.
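The arrangement in the abstract can be pictured with a small software model. The sketch below is only an illustration under stated assumptions: the `ProcessingElement` class, its `connect` and `run` methods, the `matmul` helper, and the `operands` dictionary are hypothetical names, and the data joiner is simplified to a list append. It is not the hardware design disclosed in the patent.

```python
from typing import List, Optional

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Stand-in for a matrix computing unit (plain nested-list multiply)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ProcessingElement:
    """Hypothetical processing element: one matrix unit plus one data joiner."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.downstream: Optional["ProcessingElement"] = None

    def connect(self, other: "ProcessingElement") -> None:
        """Model the point-to-point connection to the next element."""
        self.downstream = other

    def run(self, operands: dict, upstream: Optional[List[Matrix]] = None) -> List[Matrix]:
        # Operands are assumed to reach each element over the shared bus.
        a, b = operands[self.name]
        # Data joiner, simplified here to appending the local result
        # to whatever arrived over the point-to-point connection.
        joined = (upstream or []) + [matmul(a, b)]
        if self.downstream is None:
            return joined
        return self.downstream.run(operands, joined)


pe1, pe2 = ProcessingElement("pe1"), ProcessingElement("pe2")
pe1.connect(pe2)
operands = {
    "pe1": ([[1, 2]], [[3], [4]]),  # 1x2 times 2x1
    "pe2": ([[5, 6]], [[7], [8]]),
}
joined = pe1.run(operands)
print(joined)
```

In this model the first element's result travels only over the direct link to the second element, while the operands stand in for traffic on the communication bus.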

First claim

Opening claim text (preview).

What is claimed is:
1. A microprocessor system, comprising: a first processing element including a first matrix computing unit; a second processing element including a second matrix computing unit; a point-to-point connection between the first processing element and the second processing element, wherein the point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit, wherein the data joiner component includes an adder and a multiplexer and the multiplexer is configured to receive the result of the second matrix computing unit; and a communication bus connecting together at least the first processing element and the second processing element.
2. The system of claim 1, wherein the multiplexer is configured to shift the result of the second matrix computing unit by a configured result offset.
3. The system of claim 2, wherein the configured result offset is a 0-byte, 8-byte, 16-byte, or 24-byte offset.
4. The system of claim 2, wherein the configured result offset is specified by a processing element instruction.
5. The system of claim 4, wherein the processing element instruction is a convolution operation instruction.
6. The system of claim 4, wherein the second processing element is configured to receive the processing element instruction via the communication bus.
7. The system of claim 2, wherein the adder is configured to receive the result of the first processing element and the shifted result of the second matrix computing unit.
8. The system of claim 7, wherein the adder is configured to add together the result of the first processing element and the shifted result of the second matrix computing unit to output a packed result.
9. The system of claim 8, wherein the packed result is a size of a cache-line.
10. The system of claim 8, further comprising a second point-to-point connection configured to send the packed result to a third processing element, and wherein the second point-to-point connection connects the second matrix computing unit to the third processing element.
11. The system of claim 10, wherein the third processing element includes a second data joiner component and the second data joiner component is connected to the second point-to-point connection.
12. The system of claim 8, wherein the packed result includes a plurality of matrix compute results, and each matrix compute result of the plurality of matrix compute results is determined using a different processing element.
13. The system of claim 1, wherein the microprocessor system is included in an integrated circuit chip.
14. A method, comprising: determining a processing result using a first processing element, wherein the first processing element includes a first matrix computing unit; providing the processing result of the first processing element to a data joiner component of a second matrix computing unit via a first point-to-point connection; determining a result of the second matrix computing unit; providing the result of the second matrix computing unit to the data joiner component of the second matrix computing unit, wherein the data joiner component includes an adder and a multiplexer and the multiplexer is configured to receive the result of the second matrix computing unit; joining the processing result of the first processing element and the result of the second matrix computing unit to create a packed result; and sending the packed result to a third processing element via a second point-to-point connection.
15. The method of claim 14, wherein the multiplexer is configured to shift the result of the second matrix computing unit by a configured result offset.
16. The method of claim 15, wherein the configured result offset is specified by a processing element instruction.
17. The method of claim 16, wherein the processing element instruction includes a convolution operation instruction.
18. The method of claim 14, wherein the processing result of the first matrix computing unit and the result of the second matrix computing unit are byte-aligned in the packed result.
19. The method of claim 14, wherein the packed result includes a plurality of matrix compute results, and each matrix compute result of the plurality of matrix compute results is determined using a different processing element.
20. A microprocessor system, comprising: a first processing element including a first matrix computing unit and a first data joiner component; a second processing element including a second matrix computing unit and a second data joiner component; a third processing element including a third matrix computing unit and a third data joiner component; a first point-to-point connection between the first data joiner component of the first processing element and the second data joiner component of the second processing element, wherein the first point-to-point connection is configured to provide at least a first output result of the first data joiner component to the second data joiner component, and wherein the second data joiner component is configured to output a second output result by combining at least the first output result with a matrix compute result of the second matrix computing unit; a second point-to-point connection between the second data joiner component of the second processing element and the third data joiner component of the third processing element, wherein the second point-to-point connection is configured to provide at least the second output result of the second data joiner component to the third data joiner component; and a communication bus connecting together at least the first processing element, the second processing element, and the third processing element.
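The shift-then-add behavior recited in claims 2 through 9 can be sketched at the byte level. This is a minimal illustration, not the patented circuit: the function names `mux_shift`, `join`, and `to_bytes` are hypothetical, the 32-byte packed width is an assumption (the claims say only that the packed result is the size of a cache line), and each element is assumed to contribute an 8-byte result. Only the four offsets come from claim 3.

```python
# Hypothetical model of a data joiner: a multiplexer shifts the local result
# by a byte offset, then an adder merges it with the upstream result.
PACKED_BYTES = 32                  # assumed packed-result width, not from the patent
ALLOWED_OFFSETS = (0, 8, 16, 24)   # offsets listed in claim 3


def mux_shift(local_result: bytes, offset: int) -> int:
    """Place the local result `offset` bytes into the packed word."""
    if offset not in ALLOWED_OFFSETS:
        raise ValueError(f"unsupported offset: {offset}")
    if offset + len(local_result) > PACKED_BYTES:
        raise ValueError("shifted result does not fit in the packed word")
    return int.from_bytes(local_result, "little") << (8 * offset)


def join(upstream: int, local_result: bytes, offset: int) -> int:
    """Adder stage: add the upstream result to the shifted local result."""
    return upstream + mux_shift(local_result, offset)


def to_bytes(packed: int) -> bytes:
    """Render the packed word as a fixed-width little-endian byte string."""
    return packed.to_bytes(PACKED_BYTES, "little")


# Three processing elements in a chain, each contributing one 8-byte result.
results = [bytes([1] * 8), bytes([2] * 8), bytes([3] * 8)]
packed = 0
for slot, result in enumerate(results):
    packed = join(packed, result, offset=8 * slot)
packed_bytes = to_bytes(packed)
print(packed_bytes.hex())
```

Because each result lands in its own byte range in this sketch, the addition never carries between slots and the packed word is effectively the concatenation of the individual results, with the last 8 bytes left as zeros.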

Assignees

Meta Platforms Inc

Inventors

Classifications

G06F9/3895
for complex operations, e.g. multidimensional or interleaved address generators, macros · CPC title
G06F17/16
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
H04L67/10
in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title
G06F9/5066 (Primary)
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
G06N3/02 (Primary)
Neural networks · CPC title

Patent family

Related publications grouped by family.

View patent family 71094649

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11657252B2 cover?: A microprocessor system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second proces…
Who is the assignee on this patent?: Meta Platforms Inc

Related patents

Method and device for matrix multiplication optimization using vector registers

Method and apparatus for recognizing sign language or gesture using 3d edm

Machine learning accelerator architecture

Systems and methods for performing matrix compress and decompress instructions

Vector computational unit

Online convolutional dictionary learning

Rotating data for neural network computations
