Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware
US-2015268963-A1 · Sep 24, 2015 · US
US10591983B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10591983-B2 |
| Application number | US-201414212676-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 14, 2014 |
| Priority date | Mar 14, 2014 |
| Publication date | Mar 17, 2020 |
| Grant date | Mar 17, 2020 |
Official abstract text for this publication.
A specialized memory access processor is placed between a main processor and accelerator hardware to handle memory access for the accelerator hardware. The architecture of the memory access processor is designed to allow lower energy memory accesses than can be obtained by the main processor in providing data to the hardware accelerator while providing the hardware accelerator with a sufficiently high bandwidth memory channel. In some embodiments, the main processor may enter a sleep state during accelerator calculations to substantially lower energy consumption.
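The handoff described in the abstract can be illustrated with a minimal Python sketch. It is a toy software model under stated assumptions, not the patented hardware: the class names (`MainProcessor`, `MemoryAccessProcessor`, `Accelerator`), the `offload` method, and the flat list standing in for external memory are all hypothetical and do not come from the patent. The model only shows the ordering: the main processor configures the memory access processor once, enters a sleep state, and the memory access processor then moves data between memory and the accelerator on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Accelerator:
    """Hypothetical accelerator: computes on operands handed to it."""
    kernel: Callable[[int], int]

    def compute(self, operand: int) -> int:
        return self.kernel(operand)


@dataclass
class MemoryAccessProcessor:
    """Hypothetical model of the processor placed between memory and accelerator."""
    memory: List[int]
    base_in: int = 0
    base_out: int = 0
    count: int = 0

    def configure(self, base_in: int, base_out: int, count: int) -> None:
        # Performed once by the main processor before the accelerated region.
        self.base_in, self.base_out, self.count = base_in, base_out, count

    def run(self, accel: Accelerator) -> None:
        # Feeds operands to the accelerator and stores its results,
        # without involving the main processor.
        for i in range(self.count):
            value = self.memory[self.base_in + i]
            self.memory[self.base_out + i] = accel.compute(value)


@dataclass
class MainProcessor:
    """Hypothetical main processor that records its power state."""
    state: str = "active"
    log: List[str] = field(default_factory=list)

    def offload(self, mem_proc: MemoryAccessProcessor, accel: Accelerator,
                base_in: int, base_out: int, count: int) -> None:
        mem_proc.configure(base_in, base_out, count)
        self.state = "sleep"      # assumed low-power state during acceleration
        self.log.append(self.state)
        mem_proc.run(accel)       # in hardware this proceeds while the host sleeps
        self.state = "active"
        self.log.append(self.state)
```

A possible use: build `memory = [1, 2, 3, 4, 0, 0, 0, 0]`, create `MemoryAccessProcessor(memory)` and `Accelerator(lambda x: x * x)`, then call `MainProcessor().offload(mem_proc, accel, 0, 4, 4)`; the last four memory cells then hold the squares of the first four. The sequential call to `mem_proc.run` is a simplification, since real hardware would run concurrently with the sleeping host.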
Opening claim text (preview).
What we claim is:

1. A computer comprising: a first processor communicating with an external memory and including circuitry to provide execution of a first set of standard computer instructions and circuitry for an exchange of data with the external memory; a second processor communicating with the first processor including circuitry to provide execution of a second set of accelerator computer instructions providing the execution of functions at an accelerated rate compared to the execution of those functions on the first processor; a third processor communicating with the first processor and the second processor and including instruction storage circuitry and execution circuitry to provide for the execution of a set of memory access instructions held in the instruction storage circuitry, the third processor operating to: (1) be configured by the first processor to receive the set of memory access instructions from the first processor into the instruction storage circuitry, the set of memory access instructions including: (a) multiple event instructions defining events that will trigger actions needed for accessing memory, (b) multiple action instruction defining data transfer operations, (c) multiple initialization values operated on by the event instructions including base addresses, and (d) multiple calculations needed for computation of memory addresses; and (2) after configuration by the first processor to execute the memory access instructions in the instruction storage circuitry using the execution circuitry to exchange data between the second processor and external memory during operation of the second processor the exchange of data being according to the event instructions and action instructions describing data transfer operations operating on memory locations defined by the initialization values.

2. The computer of claim 1 wherein the circuitry of the third processor executing the set of memory access instructions provides for the exchange of data between the second processor and the external memory via the third processor using less power than required for the exchange of data between the second processor and the external memory via the first processor.

3. The computer of claim 2 wherein the first processor is an out-of-order processor speculatively executing instructions out of program order.

4. The computer of claim 3 wherein the third processor employs a trigger architecture for sequencing through the third set of memory access instructions without a program counter.

5. The computer of claim 4 wherein the set of memory access instructions includes a list of trigger events and responses, where the trigger events include an availability of data from the second processor or memory and the responses include moving data between the second processor and external memory.

6. The computer of claim 1 wherein the set of memory access instructions includes a data flow fabric configuration for calculating addresses in the external memory.

7. The computer of claim 1 wherein the first processor provides the second set of accelerator computer instructions to the second processor.

8. The computer of claim 1 wherein the computer shuts down the first processor during operation of the third processor.

9. The computer of claim 1 wherein the second processor does not include circuitry for the exchange of data with the external memory.

10. The computer of claim 1 wherein the first processor provides initial memory access data to the third processor.

11. The computer of claim 1 wherein the set of memory access instructions is limited to those needed to provide iterative calculation of memory addresses in a predictable pattern of offsets starting with an initial memory access data provided from the first processor.

12. The computer of claim 1 wherein the second processor is selected from the group consisting of an arithmetic coprocessor, a graphic coprocessor, a streaming processor, and a neural net processor.

13. The computer of claim 1 wherein the first processor sends the set of memory access instructions to the third processor based on compiler-generated instructions in a program executed by the first processor.

14. A method of executing a program using a computer having: a first processor communicating with an external memory and including circuitry to provide execution of a first set of standard computer instructions and circuitry for an exchange of data with the external memory; a second processor communicating with the first processor including circuitry to provide execution of a second set of accelerator computer instructions providing the execution of functions at an accelerated rate compared to the execution of those functions on the first processor; and a third processor communicating with the first processor and the second processor and including instruction storage circuitry and execution circuitry to provide for the execution of a set of memory access instructions held in the instruction storage circuitry, the third processor operating to receive the set of memory access instructions from the first processor, the set of memory access instructions including: (a) multiple event instructions defining events that will trigger actions needed for accessing memory, (b) multiple action instruction defining data transfer operations, (c) multiple initialization values operated on by the event instructions including base addresses, for programming of the third processor to exchange data between the second processor and external memory during operation of the second processor; a (d) multiple calculations needed for computation of memory addresses; and the method comprising the steps of: (a) executing a program by the first processor to a beginning of an acceleration region of the program where faster execution could be provided by the second processor; (b) providing multiple event instructions by the first processor defining events that will trigger actions needed for accessing memory, multiple action instruction defining data transfer operations, multiple initialization values operated on by the event instructions including base addresses, and multiple calculations needed for computation of memory addresses to the third processor for accessing memory for the second processor for execution of the acceleration region; and (c) after configuration of the third processor by the first processor, executing the memory access instructions in the instruction storage circuitry by the third processor using the execution circuitry to exchange data between the second processor and external memory during operation of the second processor, the exchange of data being according to the event instructions, calculations, and action instructions describing data transfer operations operating on memory locations defined by the initialization values, to execute the acceleration region by the second and third processor and not by the first processor.

15. The method of claim 14 wherein during step (c) the first processor is operated in a reduced power mode consuming less power than in step (a).

16. The method of claim 14 including the step of the first processor providing the second set of accelerator computer instructions to the second processor.

17. The method of claim 14 wherein the second processor does not access the external memory except via the third processor.

18. The method of claim 14 including the step of the first processor providing initial memory access data to the third processor.

19. The method of claim 14 wherein the firs
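
Claims 1, 4, 5, and 11 describe memory access instructions built from events, actions, initialization values (base addresses), and address calculations, sequenced by triggers rather than by a program counter. The following Python sketch is one way to picture that idea in software. It is an illustration under assumptions, not the claimed circuitry: the names `State`, `Trigger`, `TRIGGERS`, and `run`, the fixed strides, and the two queues standing in for the accelerator interface are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class State:
    memory: List[int]
    load_addr: int         # initialization value: base address for reads
    store_addr: int        # initialization value: base address for writes
    load_stride: int       # address calculation: fixed offset per read
    store_stride: int      # address calculation: fixed offset per write
    remaining_loads: int
    remaining_stores: int
    to_accel: Deque[int] = field(default_factory=deque)    # operands for the accelerator
    from_accel: Deque[int] = field(default_factory=deque)  # results from the accelerator


@dataclass
class Trigger:
    event: Callable[[State], bool]    # condition that makes the action eligible
    action: Callable[[State], None]   # data transfer performed when it fires


def do_load(s: State) -> None:
    s.to_accel.append(s.memory[s.load_addr])
    s.load_addr += s.load_stride      # next address in a predictable pattern
    s.remaining_loads -= 1


def do_store(s: State) -> None:
    s.memory[s.store_addr] = s.from_accel.popleft()
    s.store_addr += s.store_stride
    s.remaining_stores -= 1


# The table has no ordering semantics: any trigger whose event holds may fire.
TRIGGERS = [
    Trigger(event=lambda s: s.remaining_loads > 0, action=do_load),
    Trigger(event=lambda s: len(s.from_accel) > 0, action=do_store),
]


def run(state: State, triggers: List[Trigger], kernel: Callable[[int], int]) -> None:
    """Fire eligible triggers until all results are stored; no program counter."""
    while state.remaining_stores > 0:
        for trig in triggers:
            if trig.event(state):
                trig.action(state)
        # Stand-in for the accelerator consuming one operand per step.
        if state.to_accel:
            state.from_accel.append(kernel(state.to_accel.popleft()))
```

For example, with `memory = [10, 0, 20, 0, 30, 0, 0, 0, 0]`, a load base of 0 with stride 2, a store base of 6 with stride 1, and three loads and stores, calling `run` with a kernel that adds one leaves the incremented values in cells 6 through 8. Expressing the work as event and action pairs, rather than as a sequential loop, reflects the trigger-style sequencing that claim 4 refers to; the polling loop here is only a software convenience.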
by switching to a less power-consuming processor, e.g. sub-CPU · CPC title
Power saving in memory, e.g. RAM, cache · CPC title
Arrangements for communication of instructions and data · CPC title
Interprocessor communication · CPC title
using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12) · CPC title