US11262787B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11262787-B2 |
| Application number | US-202016744249-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 16, 2020 |
| Priority date | Oct 20, 2017 |
| Publication date | Mar 1, 2022 |
| Grant date | Mar 1, 2022 |
Official abstract text for this publication.
The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal a send instruction to transmit at least one data packet at a predetermined transmit time, relative to the synchronisation signal, destined for a recipient processing unit but having no destination identifier, and a local program allocated to the recipient processing unit is scheduled to execute at a predetermined switch time a switch control instruction to control the switching circuitry to connect its processing unit wire to the switching fabric to receive the data packet at a receive time.
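The timing relationship in the abstract can be illustrated with a minimal sketch. It assumes that all times are integer cycle counts measured from the synchronisation signal, and that the compiler knows two fixed latencies: one for the packet to travel from sender to receiver, and one for the switch control instruction to take effect. The names `ExchangeSchedule`, `schedule_exchange`, `wire_latency` and `switch_latency` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeSchedule:
    """Cycle counts relative to the synchronisation signal (illustrative only)."""
    send_cycle: int     # sender executes its send instruction
    switch_cycle: int   # receiver executes its switch control instruction
    receive_cycle: int  # packet arrives at the receiver's input interface


def schedule_exchange(send_cycle: int, wire_latency: int, switch_latency: int) -> ExchangeSchedule:
    """Derive the receiver's switch time from the sender's transmit time.

    Assumption: wire_latency is the fixed sender-to-receiver delay and
    switch_latency is the fixed delay for the switch control instruction
    to connect the receiver to the fabric. Both are known at compile time.
    """
    receive_cycle = send_cycle + wire_latency
    switch_cycle = receive_cycle - switch_latency
    if switch_cycle < 0:
        raise ValueError("send_cycle is too early for the receiver to switch in time")
    return ExchangeSchedule(send_cycle, switch_cycle, receive_cycle)


if __name__ == "__main__":
    print(schedule_exchange(send_cycle=10, wire_latency=7, switch_latency=3))
```

Because the receiver's switch time is fixed at compile time, the packet itself does not need to carry a destination identifier: the receiver is simply listening on the right wires at the right cycle.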
Opening claim text (preview).
What is claimed is:

1. A computer implemented method of generating multiple programs to be executed in a computer comprising a plurality of processing units, the method comprising: generating a first local program for a first processing unit of the plurality of processing units, the first local program comprising a first sequence of executable instructions; configuring the first local program to transmit a data packet having no destination identifier at a transmit time, relative to a synchronisation signal, and destined for a second processing unit; generating a second local program for the second processing unit of the plurality of processing units, the second local program comprising a second sequence of executable instructions; and scheduling the second local program to control switching circuitry to receive the data packet at a receive time.
2. The method of claim 1, wherein the first processing unit and the second processing unit have a fixed positional relationship with respect to each other, and the configuring the first local program comprises determining a fixed delay based on the positional relationship between the first processing unit and the second processing unit.
3. The method of claim 2, wherein the fixed positional relationship comprises an array of rows and columns, wherein the first processing unit has a first identifier which identifies its position in the array, and wherein the second processing unit has a second identifier which identifies its position in the array.
4. The method of claim 1, wherein the switching circuitry comprises a multiplexer having an output set of wires connected to the second processing unit and multiple sets of input wires connectable to a switching fabric, the multiplexer located on the computer at a physical location with respect to the second processing unit, and wherein the configuring the first local program comprises determining a fixed delay for a switch control instruction to reach the multiplexer and the data packet to reach an input interface of the second processing unit from the multiplexer.
5. The method of claim 1, further comprising providing in the first local program a synchronisation instruction which indicates that a compute phase at the first processing unit has completed.
6. The method of claim 5, wherein the configuring the first local program comprises determining for the first processing unit a fixed delay between a synchronisation event on a chip and receiving back at the first processing unit an acknowledgement that the synchronisation event has occurred.
7. The method of claim 1, wherein the configuring the first local program comprises accessing a look-up table holding information about delays enabling the transmit time at the first processing unit and a switching time at the second processing unit to be determined.
8. The method of claim 1, wherein the first local program and the second local program deliver a machine learning function.
9. A compiler having a processor programmed to carry out a method of generating multiple programs to deliver a computerised function, each program to be executed in a computer comprising a plurality of processing units, the method comprising: generating a first local program for a first processing unit, the first local program comprising a first sequence of executable instructions; configuring the first local program to execute with a delay relative to a synchronisation signal a send instruction to transmit a data packet having no destination identifier at a transmit time and destined for a second processing unit; and generating a second local program for the second processing unit, including scheduling the second local program to execute at a switch time a switch control instruction to connect the second processing unit to a switching fabric to receive the data packet at a receive time.
10. The compiler of claim 9, wherein the compiler is configured to receive a fixed graph structure representing the computerised function and a table holding delays enabling the transmit time and the switch time to be determined.
11. The compiler of claim 10, wherein the fixed graph structure comprises a plurality of nodes, each node being represented by a codelet in the first local program.
12. The compiler of claim 10, wherein the fixed graph structure comprises a plurality of nodes represented by a codelet in the second local program.
13. The compiler of claim 9, wherein the configuring the first local program comprises determining a fixed delay based on a positional relationship between the first processing unit and the second processing unit.
14. The compiler of claim 9, the method further comprising providing in the first local program a synchronisation instruction which indicates that a compute phase at the first processing unit has completed.
15. A computer program recorded on non transmissible media and comprising computer readable instructions which when executed by a processor of a compiler implement a method, the method comprising: generating a first local program for a first processing unit, the first local program comprising a first sequence of executable instructions; configuring the first local program to transmit a data packet having no destination identifier at a transmit time, relative to a synchronisation signal, and destined for a second processing unit; generating a second local program for the second processing unit, the second local program comprising a second sequence of executable instructions; and scheduling the second local program to control switching circuitry to receive the data packet at a receive time.
16. The computer program of claim 15, wherein the configuring the first local program comprises determining a fixed delay based on a positional relationship between the first processing unit and the second processing unit.
17. The computer program of claim 15, the method further comprising: providing in the first local program a synchronisation instruction which indicates that a compute phase at the first processing unit has completed.
18. The computer program of claim 15, wherein the configuring the first local program comprises determining for the first processing unit a fixed delay between a synchronisation event on a chip and receiving back at the first processing unit an acknowledgement that the synchronisation event has occurred.
19. The computer program of claim 15, wherein the configuring the first local program comprises accessing a look-up table holding information about delays enabling the transmit time at the first processing unit and a switching time at the second processing unit to be determined.
20. The computer program of claim 15, wherein the method further comprises receiving at the compiler a fixed graph structure representing a computerised function and a table holding delays enabling the transmit time and a switch time to be determined.
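As a rough illustration of how the look-up table of delays (claims 7, 10, 19 and 20) and the row/column identifiers (claim 3) might feed program generation, the sketch below builds a delay table for a small array and emits one instruction list per processing unit. Everything here is an assumption made for illustration: the Manhattan-distance delay model, the `base` and `cycles_per_hop` constants, the tuple instruction format, and the function names `build_delay_table` and `generate_local_programs` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_delay_table(rows, cols, cycles_per_hop=1, base=2):
    """Hypothetical look-up table of fixed delays between ordered tile pairs.

    Tiles are identified by (row, col). The delay model (a base cost plus a
    per-hop cost over Manhattan distance) is an illustrative assumption only.
    """
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    return {
        (a, b): base + cycles_per_hop * (abs(a[0] - b[0]) + abs(a[1] - b[1]))
        for a in tiles
        for b in tiles
        if a != b
    }


def generate_local_programs(transfers, delay_table, switch_latency=1):
    """Emit a per-tile list of (cycle, opcode, operand) tuples.

    transfers: iterable of (src_tile, dst_tile, send_cycle), with cycles
    counted from the synchronisation signal.
    """
    programs = defaultdict(list)
    for src, dst, send_cycle in transfers:
        receive_cycle = send_cycle + delay_table[(src, dst)]
        # The send carries no destination identifier; timing alone routes it.
        programs[src].append((send_cycle, "SEND", None))
        # The receiver selects the sender's wires just before the packet arrives.
        programs[dst].append((receive_cycle - switch_latency, "SWITCH", src))
    # Sort each local program by cycle so instructions appear in issue order.
    return {tile: sorted(prog, key=lambda ins: ins[0]) for tile, prog in programs.items()}


if __name__ == "__main__":
    table = build_delay_table(rows=2, cols=2)
    for tile, prog in generate_local_programs([((0, 0), (1, 1), 5)], table).items():
        print(tile, prog)
```

A real compiler would take its delays from measured or specified hardware characteristics rather than a distance formula; the point of the sketch is only that both local programs are derived from the same table, so sender and receiver agree on timing without any run-time handshake.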
from multiple instruction streams, e.g. multistreaming · CPC title
using a plurality of independent parallel functional units · CPC title
using switching circuits, e.g. switching matrix, connection or expansion network (G06F13/4009 takes precedence) · CPC title
Code distribution (considering CPU load at run-time G06F9/505; load rebalancing G06F9/5083) · CPC title
Synchronisation or serialisation instructions · CPC title