Scheduling memory requests for a ganged memory device
US-2019196721-A1 · Jun 27, 2019 · US
US11537301B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11537301-B2 |
| Application number | US-202117307828-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 4, 2021 |
| Priority date | Dec 12, 2019 |
| Publication date | Dec 27, 2022 |
| Grant date | Dec 27, 2022 |
Official abstract text for this publication.
A system comprises a processor and a plurality of memory units. The processor is coupled to each of the plurality of memory units by a plurality of network connections. The processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array. Each processing element that is located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements located along a same axis of the two-dimensional array.
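The diagonal-master arrangement in the abstract can be illustrated with a few lines of index arithmetic. This is only a sketch under stated assumptions: the array is taken to be square, groups are taken to be rows (claim 17 places each group on a different row), and the function names `diagonal_masters`, `master_for`, and `group_of` are hypothetical and do not come from the patent.

```python
def diagonal_masters(n):
    """Coordinates (row, col) of the processing elements on the main diagonal
    of an n x n array; each one acts as a request broadcasting master."""
    return [(i, i) for i in range(n)]


def master_for(row, col):
    """Assumption: a group is one row, so the master for element (row, col)
    is the diagonal element of that row."""
    return (row, row)


def group_of(master, n):
    """All elements served by a given master, i.e. every element in its row."""
    row, _ = master
    return [(row, col) for col in range(n)]
```

With this layout every row has exactly one master, and no two masters share a row or a column, which is one plausible reason for choosing the diagonal.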
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a plurality of memory units; and a processor coupled to each of the plurality of memory units by a plurality of network connections, wherein the processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array, and wherein each processing element of the plurality of processing elements located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements of the plurality of processing elements located along a same axis of the two-dimensional array.
2. The system of claim 1, wherein each processing element of the plurality of processing elements includes a matrix compute engine, a network interface, and a control logic.
3. The system of claim 2, wherein the control logic is configured to provide a memory request to the request broadcasting master for the respective group of processing elements and to access data from the plurality of memory units using a dynamically programmable distribution scheme.
4. The system of claim 1, wherein the request broadcasting master for the respective group of processing elements is configured to receive a plurality of memory requests from the plurality of processing elements of the respective group.
5. The system of claim 4, wherein the request broadcasting master is configured to merge the plurality of memory requests into a compressed memory request.
6. The system of claim 5, wherein the request broadcasting master is configured to broadcast the compressed memory request to the plurality of memory units.
7. The system of claim 6, wherein the request broadcasting master is configured to receive partial memory responses in response to the broadcasted compressed memory request from the plurality of memory units.
8. The system of claim 6, wherein the broadcasted compressed memory request references data stored in each of the plurality of memory units.
9. The system of claim 6, wherein each of the plurality of memory units is configured to decompose the broadcasted compressed memory request into a corresponding plurality of partial requests.
10. The system of claim 9, wherein each of the plurality of memory units is configured to determine whether each of the corresponding plurality of partial requests corresponds to data stored in a corresponding one of a plurality of memory banks associated with the corresponding memory unit.
11. The system of claim 10, wherein each of the plurality of memory units is configured to provide a partial response associated with a different one of the corresponding plurality of partial requests.
12. The system of claim 11, wherein the partial response includes a corresponding sequence identifier that orders the partial response among a plurality of partial responses.
13. The system of claim 6, wherein the each request broadcasting master is configured to receive partial responses, combine the partial responses to generate a complete response to the broadcasted compressed memory request, and provide the complete response to a processing element of the respective group of processing elements.
14. The system of claim 6, wherein the each request broadcasting master is configured to receive partial responses, match each of the partial responses to a processing element of the respective group of processing elements, and forward each of the matched partial responses to the corresponding matched processing element.
15. The system of claim 1, wherein the each request broadcasting master located along the diagonal of the two-dimensional array is configured to provide memory requests to and receive responses from the plurality of memory units using a different network connection of the plurality of network connections.
16. The system of claim 1, wherein the plurality of memory units includes a north memory unit, an east memory unit, a south memory unit, and a west memory unit.
17. A method comprising: receiving a first memory request associated with a first processing element of a first processing element group of a plurality of processing element groups, wherein each processing element group of the plurality of processing element groups is located on a different row of a two-dimensional array of processing elements; receiving a second memory request associated with a second processing element of the first processing element group; merging the first memory request and the second memory request into a compressed memory request; broadcasting the compressed memory request to a plurality of memory units; and receiving from the plurality of memory units a plurality of partial responses associated with the compressed memory request.
18. The method of claim 17, further comprising: combining the plurality of partial responses to create a first complete response to the first memory request and a second complete response to the second memory request; providing the first complete response to the first processing element; and providing the second complete response to the second processing element.
19. The method of claim 17, further comprising: matching a first set of partial responses of the plurality of partial responses with the first memory request; matching a second set of partial responses of the plurality of partial responses with the second memory request; providing the first set of partial responses to the first processing element; and providing the second set of partial responses to the second processing element.
20. A system, comprising: a plurality of memory units, wherein at least one of the plurality of memory units is configured to decompose a broadcasted compressed memory request into a corresponding plurality of partial requests; and a processor coupled to each of the plurality of memory units by a plurality of network connections, wherein the processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array, and wherein each processing element of the plurality of processing elements located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements of the plurality of processing elements located along a same axis of the two-dimensional array.
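The request flow recited in claims 4 through 13 (merge, broadcast, decompose, respond in parts, recombine) can be sketched in software as follows. This is a toy model, not the patented hardware: the names `Request`, `Partial`, `MemoryUnit`, `merge`, and `broadcast_and_combine` are hypothetical, the "compressed" request is modeled simply as a tuple of sub-requests, data is assumed to be interleaved round-robin across memory units in fixed-size blocks, and each processing element is assumed to have at most one outstanding request. None of these details is specified in the text above.

```python
from dataclasses import dataclass

BLOCK = 4  # hypothetical access-unit size, in words


@dataclass(frozen=True)
class Request:
    pe: int      # id of the requesting processing element
    start: int   # first word address (assumed to be a multiple of BLOCK)
    length: int  # number of words (assumed to be a multiple of BLOCK)


@dataclass(frozen=True)
class Partial:
    pe: int      # processing element the data belongs to
    seq: int     # sequence identifier ordering this part within its request
    data: tuple  # the words of one block


class MemoryUnit:
    def __init__(self, index, num_units, backing):
        self.index = index
        self.num_units = num_units
        self.backing = backing  # flat word array standing in for the banks

    def owns(self, block):
        # Assumed distribution scheme: blocks interleaved across units.
        return block % self.num_units == self.index

    def serve(self, compressed):
        """Decompose the broadcast request into per-block partial requests
        and answer only those whose data is stored in this unit."""
        out = []
        for req in compressed:
            for seq in range(req.length // BLOCK):
                block = req.start // BLOCK + seq
                if self.owns(block):
                    lo = block * BLOCK
                    words = tuple(self.backing[lo:lo + BLOCK])
                    out.append(Partial(req.pe, seq, words))
        return out


def merge(requests):
    """Merge a group's requests into one message (here, just a tuple)."""
    return tuple(requests)


def broadcast_and_combine(compressed, units):
    """Send the same compressed request to every unit, then rebuild one
    complete response per processing element using the sequence ids."""
    partials = [p for unit in units for p in unit.serve(compressed)]
    complete = {}
    for req in compressed:
        mine = sorted((p for p in partials if p.pe == req.pe),
                      key=lambda p: p.seq)
        complete[req.pe] = [word for p in mine for word in p.data]
    return complete
```

For example, with four units (matching the north, east, south, and west units of claim 16) built as `[MemoryUnit(i, 4, list(range(32))) for i in range(4)]`, merging `Request(0, 0, 8)` and `Request(1, 8, 12)` and passing the result to `broadcast_and_combine` yields the words 0 to 7 for element 0 and 8 to 19 for element 1, even though each request's blocks are answered by different units. Forwarding the matched partials individually instead of concatenating them would correspond to the alternative in claim 14.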
using electronic means · CPC title
Improving or facilitating administration, e.g. storage management · CPC title
Two dimensional arrays, e.g. mesh, torus · CPC title
Combinations of networks · CPC title
Two dimensional, e.g. mesh, torus · CPC title