High bandwidth memory system with dynamically programmable distribution scheme

US11663043B2 · US · B2

Patent metadata
Publication number: US-11663043-B2
Application number: US-201916701019-A
Country: US
Kind code: B2
Filing date: Dec 2, 2019
Priority date: Dec 2, 2019
Publication date: May 30, 2023
Grant date: May 30, 2023

Abstract

A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access data from the plurality of memory units using a dynamically programmable distribution scheme.
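The abstract does not say how addresses are mapped onto the memory units. As a rough illustration only, the Python sketch below models one possible distribution scheme: a processing element's local address space is cut into blocks of a hypothetical `access_unit_size` and interleaved over the memory units in a programmable `unit_order`. The class name, its fields, and the interleaving rule are assumptions, not the patent's design; the claims only require that the scheme specify a parameter of a configurable distribution pattern and that different processing elements may use different schemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributionScheme:
    """Hypothetical, illustrative model of a per-workload distribution scheme."""
    workload_id: int        # workload identifier (cf. claim 8); PEs may share it (claim 9)
    num_units: int          # number of memory units, e.g. 4 for north/east/south/west
    access_unit_size: int   # bytes kept contiguous in one memory unit (assumed meaning)
    unit_order: tuple       # programmable order in which units are visited (assumed pattern)

    def __post_init__(self):
        # The order must visit every memory unit exactly once.
        if sorted(self.unit_order) != list(range(self.num_units)):
            raise ValueError("unit_order must be a permutation of the memory unit indices")

    def locate(self, address: int) -> tuple:
        """Map a PE-local address to (memory unit index, byte offset within that unit)."""
        block, within = divmod(address, self.access_unit_size)
        unit = self.unit_order[block % self.num_units]
        row = block // self.num_units  # full rounds over all units before this block
        return unit, row * self.access_unit_size + within


# Two workloads with different (hypothetical) schemes place the same address differently.
a = DistributionScheme(workload_id=7, num_units=4, access_unit_size=64, unit_order=(0, 1, 2, 3))
b = DistributionScheme(workload_id=8, num_units=4, access_unit_size=128, unit_order=(2, 3, 0, 1))
print(a.locate(128), b.locate(128))  # (2, 0) (3, 0)
```

Changing `access_unit_size` or `unit_order` at run time is one way a scheme could be "dynamically programmable" without touching the data path; the real hardware may use entirely different parameters.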

Claims

Claim text (preview, truncated in claim 16).

What is claimed is:

1. A system, comprising: a plurality of memory units, wherein each of the plurality of memory units includes a request processing unit and a plurality of memory banks, wherein the request processing unit of each of the plurality of memory units is configured to decompose a received broadcasted memory request into a corresponding plurality of partial requests and the request processing unit of each of the plurality of memory units is configured to provide a partial response associated with a different one of the corresponding plurality of partial requests; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine, the control logic unit is configured to access data from the plurality of memory units using a dynamically programmable distribution scheme; wherein the dynamically programmable distribution scheme specifies a parameter of a configurable distribution pattern for dynamically changing mapping scheme of memory addresses specific to a corresponding one of the processing elements to ordering among memory locations of the plurality of memory units, and the dynamically programmable distribution scheme is included in a plurality of different dynamically programmable distribution schemes utilized by different processing elements of the plurality of processing elements that allow different workloads to distribute their corresponding workload data across the plurality of memory units using different corresponding distribution schemes of different corresponding mapping schemes included in the plurality of different dynamically programmable distribution schemes desired for corresponding workloads of the different workloads.

2. The system of claim 1, wherein the broadcasted memory request references data stored in each of the plurality of memory units.

3. The system of claim 1, wherein the request processing unit of each of the plurality of memory units is configured to determine whether each of the corresponding plurality of partial requests corresponds to data stored in a corresponding one of the plurality of memory banks associated with the corresponding request processing unit.

4. The system of claim 1, wherein the control logic unit of the first processing element is configured to receive the partial responses and combine the partial responses to generate a complete response to the broadcasted memory request.

5. The system of claim 4, wherein each of the partial responses includes a corresponding sequence identifier used to order the partial responses.

6. The system of claim 4, wherein the complete response is stored in a local memory of the first processing element.

7. The system of claim 1, wherein the plurality of memory units includes a north memory unit, an east memory unit, a south memory unit, and a west memory unit.

8. The system of claim 1, wherein the dynamically programmable distribution scheme utilizes an identifier associated with a workload of the first processing element.

9. The system of claim 8, wherein two or more processing elements of the plurality of processing elements share the identifier.

10. The system of claim 1, wherein a second processing element of the plurality of processing elements is configured with a different dynamically programmable distribution scheme for accessing memory units than the first processing element.

11. The system of claim 1, wherein the control logic unit of the first processing element is further configured with an access unit size for distributing data across the plurality of memory units.

12. The system of claim 1, wherein data elements of a machine learning weight matrix are distributed across the plurality of memory units using the dynamically programmable distribution scheme.

13. A method comprising: receiving a memory configuration setting associated with a workload, wherein the workload is associated with a dynamically programmable distribution scheme; creating a memory access request that includes a workload identifier; broadcasting the memory access request to a plurality of memory units, wherein the request processing unit of each of the plurality of memory units is configured to decompose a received broadcasted memory request into a corresponding plurality of partial requests and the request processing unit of each of the plurality of memory units is configured to provide a partial response associated with a different one of the corresponding plurality of partial requests; receiving a plurality of partial responses associated with the memory access request; and combining the plurality of partial responses to create a complete response to the memory access request; wherein dynamically programmable distribution scheme specifies a parameter of a configurable distribution pattern for dynamically changing mapping scheme of memory addresses specific to a corresponding one processing element among a plurality of processing elements to ordering among memory locations of the plurality of memory units, and the dynamically programmable distribution scheme is included in a plurality of different dynamically programmable distribution schemes utilized by different processing elements of the plurality of processing elements that allow different workloads to distribute their corresponding workload data across the plurality of memory units using different corresponding distribution schemes of different corresponding mapping schemes included in the plurality of different dynamically programmable distribution schemes desired for corresponding workloads of the different workloads.

14. The method of claim 13, further comprising receiving an access unit size configuration setting.

15. The method of claim 14, wherein the memory access request has a memory request size that is a multiple of the access unit size configuration setting.

16. A method comprising: receiving a broadcasted memory request associated with a processing element workload wherein the processing element workload is associated with a dynamically programmable distribution scheme; decomposing the broadcasted memory request into a plurality of partial requests; determining for each of the plurality of partial requests whether the partial request is to be served from an associated memory bank of a plurality of memory units; discarding a first group of partial requests that is not to be served from the associated memory bank; for each partial request of a second group of partial requests that is to be served from the associated memory bank, retrieving data of the partial request; preparing one or more partial responses using the retrieved data; and providing the prepared one or more partial responses; wherein dynamically programmable distribution scheme specifies a parameter of a configurable distribution pattern for dynamically changing mapping scheme of memory addresses specific to a corresponding one processing element among a plurality of processing elements to ordering among memory locations of the plurality of memory units, and the dynamically programmable distribution scheme is included in a plurality of different dynamically programmable distribution schemes utilized by different processing elements of the plurality of processing elements that allow different workloads to distribute their corresponding workload data acro
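Claims 1, 3 to 5, 13, and 16 describe a broadcast-and-combine read path. The Python sketch below simulates it under stated assumptions: a simple round-robin interleave stands in for the distribution scheme, a dictionary stands in for each unit's memory banks, and the names `Scheme`, `MemoryUnit`, `PartialResponse`, `write`, and `read` are hypothetical. Each memory unit decomposes the broadcast request into one partial request per access unit, discards the parts its own banks do not hold, and returns the rest tagged with sequence identifiers; the requester sorts and joins them. The alignment check loosely mirrors claim 15, where the request size is a multiple of the access unit size.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scheme:
    """Hypothetical per-workload distribution scheme (illustrative parameters)."""
    workload_id: int
    num_units: int
    access_unit_size: int

    def unit_for(self, address: int) -> int:
        # Assumed round-robin interleave: consecutive access units go to consecutive units.
        return (address // self.access_unit_size) % self.num_units


@dataclass(frozen=True)
class PartialResponse:
    seq_id: int  # position of this piece within the original request
    data: bytes


@dataclass
class MemoryUnit:
    unit_id: int
    schemes: dict                                # workload_id -> Scheme
    storage: dict = field(default_factory=dict)  # address -> bytes; stands in for the banks

    def handle_broadcast(self, workload_id: int, address: int, size: int) -> list:
        scheme = self.schemes[workload_id]
        responses = []
        # Decompose the broadcast request into one partial request per access unit.
        for seq_id, offset in enumerate(range(0, size, scheme.access_unit_size)):
            part_addr = address + offset
            if scheme.unit_for(part_addr) != self.unit_id:
                continue  # not held by this unit's banks: discard
            responses.append(PartialResponse(seq_id, self.storage[part_addr]))
        return responses


def write(units, scheme, address, data):
    """Scatter data across the memory units according to the scheme."""
    step = scheme.access_unit_size
    for offset in range(0, len(data), step):
        unit = units[scheme.unit_for(address + offset)]
        unit.storage[address + offset] = data[offset:offset + step]


def read(units, scheme, address, size):
    """Broadcast a request, gather partial responses, and combine them in order."""
    if address % scheme.access_unit_size or size % scheme.access_unit_size:
        raise ValueError("address and size must be multiples of the access unit size")
    partials = []
    for unit in units:  # the broadcast: every memory unit sees the same request
        partials.extend(unit.handle_broadcast(scheme.workload_id, address, size))
    partials.sort(key=lambda p: p.seq_id)  # sequence identifiers restore the order
    return b"".join(p.data for p in partials)


scheme = Scheme(workload_id=1, num_units=4, access_unit_size=4)
units = [MemoryUnit(i, {1: scheme}) for i in range(4)]  # e.g. north, east, south, west
write(units, scheme, 0, b"ABCDEFGHIJKLMNOP")
print(read(units, scheme, 4, 8))  # b'EFGHIJKL'
```

Because every unit sees the whole request and filters locally, the requester never needs to know the mapping when it issues the read; it only needs the sequence identifiers to reassemble the result.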

Assignees

Meta Platforms Inc

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • in block erasable memory, e.g. flash memory · CPC title

  • and has means for transferring I/O instructions and statuses between control unit and main processor · CPC title

  • Performance improvement · CPC title

  • G06F9/5016 (primary): the resource being the memory · CPC title

Frequently asked questions

What does patent US11663043B2 cover?
A system in which a processor's processing elements reach a set of memory units (each with a request processing unit and several memory banks) over a communication network. A processing element's control logic unit accesses those memory units using a dynamically programmable distribution scheme, so different workloads can spread their data across the memory units with different address mappings. Memory requests are broadcast to all memory units, decomposed into partial requests, and answered with partial responses that the requester combines.
Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/5016. Mapped technology areas include Physics.
When was this patent published?
It was published on May 30, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).