US9798550B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9798550-B2 |
| Application number | US-201313737290-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 9, 2013 |
| Priority date | Jan 9, 2013 |
| Publication date | Oct 24, 2017 |
| Grant date | Oct 24, 2017 |
Official abstract text for this publication.
A method and device for memory access in processors is provided. A processor, comprising a plurality of computational units, is capable of executing a single instruction on multiple pieces of data simultaneously (SIMD). A read operation is initiated to load data from memory into the plurality of computational units (CUs) arranged into a plurality of CU groups. The memory is arranged into a plurality of memory macro-blocks each associated with a respective CU group of the plurality of CU groups. For each CU group a respective first memory address is determined and for each CU group, the data in the associated memory macro-block is accessed at the respective first memory address.
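The read operation in the abstract can be sketched as a small software model. This is only an illustration under stated assumptions, not the patented hardware: the names `MacroBlock` and `simd_read`, the row-per-address layout, and the sizes (2 CU groups, 4 CUs per group, 3 rows per macro-block) are all hypothetical and chosen for readability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MacroBlock:
    """Hypothetical model of one memory macro-block.

    Each row holds one word per CU of the associated CU group, so a single
    macro-block address yields data for the whole group.
    """
    rows: List[List[int]]

    def read_row(self, address: int) -> List[int]:
        # Return a copy so callers cannot mutate the stored row.
        return list(self.rows[address])


def simd_read(blocks: List[MacroBlock], group_addresses: List[int]) -> List[List[int]]:
    """Read one row per CU group, each group using its own first address."""
    if len(blocks) != len(group_addresses):
        raise ValueError("need exactly one address per CU group")
    return [block.read_row(addr) for block, addr in zip(blocks, group_addresses)]


# Illustrative sizes only: 2 CU groups, 4 CUs per group, 3 rows per block.
# The stored value encodes (group, row, CU) as group*100 + row*10 + cu.
blocks = [
    MacroBlock(rows=[[g * 100 + r * 10 + c for c in range(4)] for r in range(3)])
    for g in range(2)
]

# Group 0 reads its row 0 while group 1 reads its row 2 in the same operation.
result = simd_read(blocks, group_addresses=[0, 2])
print(result)
```

The point of the sketch is that the address is chosen per CU group rather than once for the whole processor, so different groups can read different rows of their own macro-blocks in the same operation.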
Opening claim text (preview).
What is claimed is:

1. A device comprising: a vector memory space divided into a plurality of memory macro-blocks for storing data; a vector processor comprising a plurality of computational units (CUs) for executing instructions, the plurality of CUs arranged into a plurality of CU groups, each CU group comprising two or more CUs of the plurality of CUs, the plurality of CUs providing execution of a single instruction on multiple pieces of data (SIMD); and a plurality of memory macro-block access units, each of the plurality of memory macro-block access units: couples a respective CU group to a respective associated memory macro-block, is directly connected to each CU of the respective CU group, controls access of the CUs of the respective CU group to the associated memory macro-block, controls memory access based on an associated CU address input to the associated memory macro-block to retrieve data from, or place data to, per read/write cycle, and operates in a first mode when the associated address for each of the CUs in the respective CU group is a same address, and in a second mode when the associated addresses for at least two of the CUs in the respective CU group are different.

2. The device of claim 1, wherein each of the memory macroblock access units determines the address individually for each CU in the associated CU group in subsequent cycles.

3. The device of claim 1, wherein each of the memory macroblock access units determines the address for two or more CUs in the associated CU group in a single cycle.

4. The device of claim 1, wherein there are z CU groups, each with m CUs, each of the CUs has an n-bit interface to the associated memory macro-block, wherein each of the memory macro-blocks can provide n×m bits of data to the associated CU group in a memory access operation, wherein the n×m bits of data for a respective CU group are addressed by a single memory macro-block address.

5. The device of claim 4, wherein each of the memory macro-block access units controls data provided to, or received from, each of the CUs in the respective CU group based on a CU mask indicating a portion of the n×m bits of data from the associated memory macro-block the respective CU is to receive.

6. The device of claim 1, wherein each of the memory macro-block access units can access data from neighboring memory macro-blocks during a portion of a memory access operation.

7. The device of claim 6, wherein each of the memory macro-block access units determines the address from the respective neighboring memory macro-block for two or more CUs in the associated CU group in a single cycle.

8. The device of claim 1, wherein each of the memory macro-block access units can access data from a plurality of neighboring memory macro-blocks during a portion of a memory access operation.

9. The device of claim 8, wherein each memory macro-block access unit has a plurality of neighbors.

10. The device of claim 1 wherein the CU address consists of a base address plus a locally derived offset value.

11. The device of claim 1 wherein if two or more CU's have the same address then they can access the memory concurrently.

12. A method comprising: initiating a read operation for loading data from memory into a plurality of computational units (CUs) arranged into a plurality of CU groups, the memory arranged into a plurality of memory macro-blocks each associated with a respective CU group of the plurality of CU groups; for each CU group, determining a respective first memory address; and for each CU group, accessing the data in the associated memory macro-block at the respective first memory address comprising reading data from the respective memory macro-block or providing data to the respective memory macro-block, wherein accessing to each of the memory macro-blocks is controlled by a respective memory access unit which controls memory access based on an associated CU address input to the associated memory macro-block to retrieve data from, or place data to, per read/write cycle, each respective memory access unit is directly connected to each CU in a respective CU group, there are z CU groups, each with m CUs, each of the CUs has an n-bit interface to the associated memory macro-block, each of the memory macro-blocks can provide n×m bits of data to the associated CU group in a memory access operation, and the n×m bits of data for a respective CU group are addressed by a single memory macro-block address.

13. The method of claim 12, further comprising: reading data from the respective memory macro-block to a first CU of the respective CU group or providing data to the respective memory macro-block from the first CU of the respective CU group, wherein the first memory address is associated with the first CU; for each CU group, determining a respective second memory address associated with a respective second CU in the CU group; and reading data from the respective memory macro-block to the second CU of the respective CU group or providing data to the respective memory macro-block from the second CU of the respective CU group.

14. The method of claim 13, wherein the first and second memory addresses are individually determined for each of the first and second CUs in the associated CU group in subsequent cycles.

15. The method of claim 12, wherein the first memory address is determined for two or more CUs in the associated CU group in a single cycle.

16. The method of claim 12, further comprising controlling data provided to, or received from, each of the CUs in the respective CU group based on a CU mask indicating a portion of the n×m bits of data from the associated memory macro-block the respective CU is to receive.

17. The method of claim 12, further comprising accessing data from a respective neighboring memory macro-block during a portion of a memory access operation.

18. The method of claim 17, determining an address from the respective neighboring memory macro-block for two or more CUs in the associated CU group in a single cycle.

19. The method of claim 12, further comprising accessing data from one of a plurality of neighboring memory macro-blocks during a portion of a memory access operation.
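The two operating modes recited in claim 1, together with the base-plus-offset addressing of claim 10, can be illustrated with a short model. This is a sketch under assumptions, not the claimed circuit: the function names `access_mode` and `group_read` are hypothetical, and the second-mode policy used here (one access per distinct address, so that CUs sharing an address are served together as claim 11 permits) is just one plausible scheduling choice. The claims also cover determining addresses CU by CU in subsequent cycles.

```python
from typing import List, Tuple


def access_mode(cu_addresses: List[int]) -> str:
    """Return 'first' when all CUs in the group use the same address, else 'second'."""
    return "first" if len(set(cu_addresses)) == 1 else "second"


def group_read(rows: List[List[int]], base: int, offsets: List[int]) -> Tuple[List[int], int]:
    """Model one CU group reading from its macro-block.

    rows[a][i] is the word that CU i would receive at macro-block address a.
    Returns (word delivered to each CU, number of accesses used).
    """
    # Each CU address is a shared base plus a locally derived offset.
    addresses = [base + off for off in offsets]

    if access_mode(addresses) == "first":
        # A single macro-block address serves every CU in one access.
        return list(rows[addresses[0]]), 1

    # Assumed second-mode policy: one access per distinct address.
    data = [0] * len(addresses)
    distinct = sorted(set(addresses))
    for addr in distinct:
        for cu, cu_addr in enumerate(addresses):
            if cu_addr == addr:
                data[cu] = rows[addr][cu]
    return data, len(distinct)


# Illustrative macro-block with 3 addresses and 4 CUs in the group.
rows = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]

same = group_read(rows, base=1, offsets=[0, 0, 0, 0])
mixed = group_read(rows, base=0, offsets=[0, 2, 0, 1])
print(same)
print(mixed)
```

In this model the uniform-address case costs one access, while the mixed case costs one access per distinct address (three here, since CUs 0 and 2 share address 0).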
organised in groups of units sharing resources, e.g. clusters · CPC title
Operand accessing · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Organisation of register space, e.g. banked or distributed register file · CPC title
Details on data memory access · CPC title