High bandwidth memory system with distributed request broadcasting masters

US11537301B2 · US · B2

Patent metadata
Publication number: US-11537301-B2
Application number: US-202117307828-A
Country: US
Kind code: B2
Filing date: May 4, 2021
Priority date: Dec 12, 2019
Publication date: Dec 27, 2022
Grant date: Dec 27, 2022

Abstract

A system comprises a processor and a plurality of memory units. The processor is coupled to each of the plurality of memory units by a plurality of network connections. The processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array. Each processing element that is located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements located along a same axis of the two-dimensional array.
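A minimal sketch of the diagonal-master arrangement follows. It assumes a square n × n array in which each group is a row (claim 17 places each group on a different row) and the processing element at (r, r) serves row r. The helper names `assign_masters` and `group_of` are hypothetical. The sketch also checks one property that the abstract does not state as a rationale: diagonal positions cover every row and every column exactly once, so no two masters share an axis.

```python
from typing import Dict, List, Tuple

PE = Tuple[int, int]  # (row, col) position in the 2D array

def assign_masters(n: int) -> Dict[PE, PE]:
    """Map every PE of an n x n array to its request broadcasting master.

    Assumption: groups are rows, and the diagonal PE (r, r) is the
    master for row r.
    """
    return {(r, c): (r, r) for r in range(n) for c in range(n)}

def group_of(master: PE, n: int) -> List[PE]:
    """List the PEs served by a diagonal master (its whole row)."""
    r, c = master
    if r != c:
        raise ValueError("a request broadcasting master must lie on the diagonal")
    return [(r, col) for col in range(n)]

if __name__ == "__main__":
    n = 4
    masters = sorted(set(assign_masters(n).values()))
    print(masters)  # [(0, 0), (1, 1), (2, 2), (3, 3)]
    # Each master sits in a distinct row and a distinct column.
    assert len({r for r, _ in masters}) == n
    assert len({c for _, c in masters}) == n
    print(group_of((2, 2), n))  # [(2, 0), (2, 1), (2, 2), (2, 3)]
```

Grouping by columns instead would only swap the roles of `r` and `c`. The claims leave the choice open with the phrase "a same axis."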

Claims

Full claim text (20 claims). Claims 1, 17, and 20 are independent.

What is claimed is:

1. A system, comprising: a plurality of memory units; and a processor coupled to each of the plurality of memory units by a plurality of network connections, wherein the processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array, and wherein each processing element of the plurality of processing elements located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements of the plurality of processing elements located along a same axis of the two-dimensional array.

2. The system of claim 1, wherein each processing element of the plurality of processing elements includes a matrix compute engine, a network interface, and a control logic.

3. The system of claim 2, wherein the control logic is configured to provide a memory request to the request broadcasting master for the respective group of processing elements and to access data from the plurality of memory units using a dynamically programmable distribution scheme.

4. The system of claim 1, wherein the request broadcasting master for the respective group of processing elements is configured to receive a plurality of memory requests from the plurality of processing elements of the respective group.

5. The system of claim 4, wherein the request broadcasting master is configured to merge the plurality of memory requests into a compressed memory request.

6. The system of claim 5, wherein the request broadcasting master is configured to broadcast the compressed memory request to the plurality of memory units.

7. The system of claim 6, wherein the request broadcasting master is configured to receive partial memory responses in response to the broadcasted compressed memory request from the plurality of memory units.

8. The system of claim 6, wherein the broadcasted compressed memory request references data stored in each of the plurality of memory units.

9. The system of claim 6, wherein each of the plurality of memory units is configured to decompose the broadcasted compressed memory request into a corresponding plurality of partial requests.

10. The system of claim 9, wherein each of the plurality of memory units is configured to determine whether each of the corresponding plurality of partial requests corresponds to data stored in a corresponding one of a plurality of memory banks associated with the corresponding memory unit.

11. The system of claim 10, wherein each of the plurality of memory units is configured to provide a partial response associated with a different one of the corresponding plurality of partial requests.

12. The system of claim 11, wherein the partial response includes a corresponding sequence identifier that orders the partial response among a plurality of partial responses.

13. The system of claim 6, wherein the each request broadcasting master is configured to receive partial responses, combine the partial responses to generate a complete response to the broadcasted compressed memory request, and provide the complete response to a processing element of the respective group of processing elements.

14. The system of claim 6, wherein the each request broadcasting master is configured to receive partial responses, match each of the partial responses to a processing element of the respective group of processing elements, and forward each of the matched partial responses to the corresponding matched processing element.

15. The system of claim 1, wherein the each request broadcasting master located along the diagonal of the two-dimensional array is configured to provide memory requests to and receive responses from the plurality of memory units using a different network connection of the plurality of network connections.

16. The system of claim 1, wherein the plurality of memory units includes a north memory unit, an east memory unit, a south memory unit, and a west memory unit.

17. A method comprising: receiving a first memory request associated with a first processing element of a first processing element group of a plurality of processing element groups, wherein each processing element group of the plurality of processing element groups is located on a different row of a two-dimensional array of processing elements; receiving a second memory request associated with a second processing element of the first processing element group; merging the first memory request and the second memory request into a compressed memory request; broadcasting the compressed memory request to a plurality of memory units; and receiving from the plurality of memory units a plurality of partial responses associated with the compressed memory request.

18. The method of claim 17, further comprising: combining the plurality of partial responses to create a first complete response to the first memory request and a second complete response to the second memory request; providing the first complete response to the first processing element; and providing the second complete response to the second processing element.

19. The method of claim 17, further comprising: matching a first set of partial responses of the plurality of partial responses with the first memory request; matching a second set of partial responses of the plurality of partial responses with the second memory request; providing the first set of partial responses to the first processing element; and providing the second set of partial responses to the second processing element.

20. A system, comprising: a plurality of memory units, wherein at least one of the plurality of memory units is configured to decompose a broadcasted compressed memory request into a corresponding plurality of partial requests; and a processor coupled to each of the plurality of memory units by a plurality of network connections, wherein the processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array, and wherein each processing element of the plurality of processing elements located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements of the plurality of processing elements located along a same axis of the two-dimensional array.
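The request path in claims 4 to 14 and 17 to 18 can be simulated end to end. A master merges its row's requests into one message and broadcasts it to every memory unit. Each unit decomposes the message into partial requests and answers only those that fall in its own banks, tagging each piece with a sequence identifier. The master then reassembles a complete response per request. This sketch rests on several hypothetical choices that the claims shown here do not specify: the 64-byte interleave granularity, the bank count, the modulo address mapping (a stand-in for the "dynamically programmable distribution scheme" of claim 3), treating "compression" as simple duplicate removal, and every name in the code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

LINE = 64       # hypothetical interleave granularity, in bytes
NUM_UNITS = 4   # e.g. north, east, south, west memory units (claim 16)
NUM_BANKS = 8   # hypothetical number of banks per memory unit

@dataclass(frozen=True)
class Request:
    pe: Tuple[int, int]  # requesting processing element (row, col)
    addr: int
    size: int

@dataclass(frozen=True)
class PartialResponse:
    req: Request  # which original request this piece answers
    seq: int      # sequence identifier ordering the pieces (cf. claim 12)
    unit: int
    bank: int
    data: bytes

def merge(requests: List[Request]) -> Tuple[Request, ...]:
    """Master: merge a group's requests into one broadcast message.
    Assumption: 'compression' here only drops exact duplicates."""
    return tuple(dict.fromkeys(requests))

def owner(line: int) -> Tuple[int, int]:
    """Hypothetical static interleaving: line index -> (unit, bank)."""
    return line % NUM_UNITS, (line // NUM_UNITS) % NUM_BANKS

def serve(unit: int, message: Tuple[Request, ...], memory: bytes) -> List[PartialResponse]:
    """Memory unit: split each request into line-sized partial requests
    and answer only the ones stored in this unit's banks."""
    out = []
    for req in message:
        first = req.addr // LINE
        last = (req.addr + req.size - 1) // LINE
        for seq, line in enumerate(range(first, last + 1)):
            u, bank = owner(line)
            if u != unit:
                continue  # another memory unit holds this line
            lo = max(req.addr, line * LINE)
            hi = min(req.addr + req.size, (line + 1) * LINE)
            out.append(PartialResponse(req, seq, u, bank, memory[lo:hi]))
    return out

def combine(partials: List[PartialResponse]) -> Dict[Request, bytes]:
    """Master: order each request's pieces by seq and join them."""
    pieces: Dict[Request, List[PartialResponse]] = defaultdict(list)
    for p in partials:
        pieces[p.req].append(p)
    return {req: b"".join(p.data for p in sorted(ps, key=lambda p: p.seq))
            for req, ps in pieces.items()}

if __name__ == "__main__":
    memory = bytes(range(256)) * 4  # 1 KiB of sample data
    reqs = [Request((0, 1), 10, 100), Request((0, 3), 200, 30),
            Request((0, 1), 10, 100)]  # last one duplicates the first
    message = merge(reqs)             # broadcast to every unit
    partials = [p for u in range(NUM_UNITS) for p in serve(u, message, memory)]
    for req, data in combine(partials).items():
        assert data == memory[req.addr:req.addr + req.size]
        print(req.pe, len(data))  # the master would forward data to req.pe
```

Claims 14 and 19 describe the alternative of forwarding matched pieces without joining them. In this sketch, that amounts to returning the sorted lists from `combine` instead of concatenating them.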

Assignees

Meta Platforms Inc

Classifications

  • using electronic means · CPC title

  • Improving or facilitating administration, e.g. storage management · CPC title

  • Two dimensional arrays, e.g. mesh, torus · CPC title

  • Combinations of networks · CPC title

  • Two dimensional, e.g. mesh, torus · CPC title

Frequently asked questions

Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0631. Mapped technology areas include Physics.
When was this patent published?
Published December 27, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).