What technology area does this patent fall under?

Primary CPC classification G06F9/5016. Mapped technology areas include Physics.

When was this patent published?

Publication date May 30, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

High bandwidth memory system with dynamically programmable distribution scheme

US11663043B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11663043-B2
Application number	US-201916701019-A
Country	US
Kind code	B2
Filing date	Dec 2, 2019
Priority date	Dec 2, 2019
Publication date	May 30, 2023
Grant date	May 30, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access data from the plurality of memory units using a dynamically programmable distribution scheme.
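As a rough illustration of what a "dynamically programmable distribution scheme" could look like, the sketch below maps a memory address to one of four memory units using a per-workload ordering and access unit size. The class name `DistributionScheme`, its fields, the round-robin rule, and the sizes are all assumptions made for illustration; the abstract does not specify how the mapping is computed. The unit names are borrowed from claim 7.

```python
from dataclasses import dataclass
from typing import Tuple

# Unit names follow claim 7; everything else here is hypothetical.
UNITS = ("north", "east", "south", "west")


@dataclass(frozen=True)
class DistributionScheme:
    """Hypothetical per-workload scheme: a unit ordering plus an access unit size."""
    order: Tuple[str, ...]   # permutation of UNITS chosen for this workload
    access_unit_size: int    # bytes per access unit (assumed parameter)

    def unit_for(self, address: int) -> str:
        # Split the address space into access units and assign them
        # round-robin across the memory units in this scheme's order.
        access_unit_index = address // self.access_unit_size
        return self.order[access_unit_index % len(self.order)]


# Two workloads programmed with different schemes map the same
# addresses to different memory units.
scheme_a = DistributionScheme(order=("north", "east", "south", "west"),
                              access_unit_size=128)
scheme_b = DistributionScheme(order=("south", "west", "north", "east"),
                              access_unit_size=256)

for address in range(0, 512, 128):
    print(address, scheme_a.unit_for(address), scheme_b.unit_for(address))
```

Keeping the scheme as plain data makes it cheap to reprogram per workload, which is the property the abstract emphasizes; a real design would likely encode it in hardware registers rather than objects.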

First claim

Opening claim text (preview).

What is claimed is:
1. A system, comprising: a plurality of memory units, wherein each of the plurality of memory units includes a request processing unit and a plurality of memory banks, wherein the request processing unit of each of the plurality of memory units is configured to decompose a received broadcasted memory request into a corresponding plurality of partial requests and the request processing unit of each of the plurality of memory units is configured to provide a partial response associated with a different one of the corresponding plurality of partial requests; and a processor coupled to the plurality of memory units, wherein the processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units, and wherein at least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine, the control logic unit is configured to access data from the plurality of memory units using a dynamically programmable distribution scheme; wherein the dynamically programmable distribution scheme specifies a parameter of a configurable distribution pattern for dynamically changing mapping scheme of memory addresses specific to a corresponding one of the processing elements to ordering among memory locations of the plurality of memory units, and the dynamically programmable distribution scheme is included in a plurality of different dynamically programmable distribution schemes utilized by different processing elements of the plurality of processing elements that allow different workloads to distribute their corresponding workload data across the plurality of memory units using different corresponding distribution schemes of different corresponding mapping schemes included in the plurality of different dynamically programmable distribution schemes desired for corresponding workloads of the different workloads.
2. The system of claim 1, wherein the broadcasted memory request references data stored in each of the plurality of memory units.
3. The system of claim 1, wherein the request processing unit of each of the plurality of memory units is configured to determine whether each of the corresponding plurality of partial requests corresponds to data stored in a corresponding one of the plurality of memory banks associated with the corresponding request processing unit.
4. The system of claim 1, wherein the control logic unit of the first processing element is configured to receive the partial responses and combine the partial responses to generate a complete response to the broadcasted memory request.
5. The system of claim 4, wherein each of the partial responses includes a corresponding sequence identifier used to order the partial responses.
6. The system of claim 4, wherein the complete response is stored in a local memory of the first processing element.
7. The system of claim 1, wherein the plurality of memory units includes a north memory unit, an east memory unit, a south memory unit, and a west memory unit.
8. The system of claim 1, wherein the dynamically programmable distribution scheme utilizes an identifier associated with a workload of the first processing element.
9. The system of claim 8, wherein two or more processing elements of the plurality of processing elements share the identifier.
10. The system of claim 1, wherein a second processing element of the plurality of processing elements is configured with a different dynamically programmable distribution scheme for accessing memory units than the first processing element.
11. The system of claim 1, wherein the control logic unit of the first processing element is further configured with an access unit size for distributing data across the plurality of memory units.
12. The system of claim 1, wherein data elements of a machine learning weight matrix are distributed across the plurality of memory units using the dynamically programmable distribution scheme.
13. A method comprising: receiving a memory configuration setting associated with a workload, wherein the workload is associated with a dynamically programmable distribution scheme; creating a memory access request that includes a workload identifier; broadcasting the memory access request to a plurality of memory units, wherein the request processing unit of each of the plurality of memory units is configured to decompose a received broadcasted memory request into a corresponding plurality of partial requests and the request processing unit of each of the plurality of memory units is configured to provide a partial response associated with a different one of the corresponding plurality of partial requests; receiving a plurality of partial responses associated with the memory access request; and combining the plurality of partial responses to create a complete response to the memory access request; wherein dynamically programmable distribution scheme specifies a parameter of a configurable distribution pattern for dynamically changing mapping scheme of memory addresses specific to a corresponding one processing element among a plurality of processing elements to ordering among memory locations of the plurality of memory units, and the dynamically programmable distribution scheme is included in a plurality of different dynamically programmable distribution schemes utilized by different processing elements of the plurality of processing elements that allow different workloads to distribute their corresponding workload data across the plurality of memory units using different corresponding distribution schemes of different corresponding mapping schemes included in the plurality of different dynamically programmable distribution schemes desired for corresponding workloads of the different workloads.
14. The method of claim 13, further comprising receiving an access unit size configuration setting.
15. The method of claim 14, wherein the memory access request has a memory request size that is a multiple of the access unit size configuration setting.
16. A method comprising: receiving a broadcasted memory request associated with a processing element workload wherein the processing element workload is associated with a dynamically programmable distribution scheme; decomposing the broadcasted memory request into a plurality of partial requests; determining for each of the plurality of partial requests whether the partial request is to be served from an associated memory bank of a plurality of memory units; discarding a first group of partial requests that is not to be served from the associated memory bank; for each partial request of a second group of partial requests that is to be served from the associated memory bank, retrieving data of the partial request; preparing one or more partial responses using the retrieved data; and providing the prepared one or more partial responses; wherein dynamically programmable distribution scheme specifies a parameter of a configurable distribution pattern for dynamically changing mapping scheme of memory addresses specific to a corresponding one processing element among a plurality of processing elements to ordering among memory locations of the plurality of memory units, and the dynamically programmable distribution scheme is included in a plurality of different dynamically programmable distribution schemes utilized by different processing elements of the plurality of processing elements that allow different workloads to distribute their corresponding workload data acro
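The request flow described in claims 4, 5, 13, and 16 (broadcast a request, decompose it into partial requests at each memory unit, discard the ones a unit does not own, and reassemble partial responses by sequence identifier) can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the patented hardware design: `ACCESS_UNIT`, `ORDER`, `MemoryUnit`, `build_units`, and `broadcast_read` are hypothetical names, ownership is assumed to be round-robin by access unit, and requests are assumed to be aligned to, and a multiple of, the access unit size (compare claim 15).

```python
from dataclasses import dataclass
from typing import Dict, List

ACCESS_UNIT = 4                               # hypothetical access unit size, bytes
ORDER = ("north", "east", "south", "west")    # hypothetical unit ordering


@dataclass
class PartialResponse:
    seq: int      # sequence identifier used to order responses (claim 5)
    data: bytes


class MemoryUnit:
    """Holds only the access units this unit owns, keyed by access unit index."""

    def __init__(self, name: str):
        self.name = name
        self.banks: Dict[int, bytes] = {}

    def handle(self, start: int, size: int) -> List[PartialResponse]:
        # Decompose the broadcast request into one partial request per access unit.
        first = start // ACCESS_UNIT
        count = size // ACCESS_UNIT
        responses = []
        for seq in range(count):
            index = first + seq
            if index not in self.banks:
                continue  # discard partial requests served by another unit
            responses.append(PartialResponse(seq, self.banks[index]))
        return responses


def build_units(image: bytes) -> List[MemoryUnit]:
    """Distribute an address-space image round-robin across the units."""
    units = [MemoryUnit(name) for name in ORDER]
    for index in range(len(image) // ACCESS_UNIT):
        chunk = image[index * ACCESS_UNIT:(index + 1) * ACCESS_UNIT]
        units[index % len(units)].banks[index] = chunk
    return units


def broadcast_read(units: List[MemoryUnit], start: int, size: int) -> bytes:
    # Every unit receives the same request; each answers only for its own data.
    partials = [r for unit in units for r in unit.handle(start, size)]
    partials.sort(key=lambda r: r.seq)        # reorder by sequence identifier
    return b"".join(r.data for r in partials)


image = bytes(range(32))
units = build_units(image)
assert broadcast_read(units, 8, 16) == image[8:24]
```

In this example the 16-byte read spans four access units, so each of the four units contributes exactly one partial response and the requester restores the original order from the sequence identifiers alone.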

Assignees

Meta Platforms Inc

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06F12/0246
in block erasable memory, e.g. flash memory · CPC title
G06F13/126
and has means for transferring I/O instructions and statuses between control unit and main processor · CPC title
G06F2212/1016
Performance improvement · CPC title
G06F9/5016 (Primary)
the resource being the memory · CPC title

Patent family

Related publications grouped by family.

View patent family 73059553

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11663043B2 cover?: A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element …
Who is the assignee on this patent?: Meta Platforms Inc

Related patents

Deep learning hardware

Access processor

Data distribution among multiple managed memories