Processing System With Interspersed Processors With Multi-Layer Interconnection
US-2017286196-A1 · Oct 5, 2017 · US
US11550750B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11550750-B2 |
| Application number | US-202016931864-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 17, 2020 |
| Priority date | Nov 3, 2017 |
| Publication date | Jan 10, 2023 |
| Grant date | Jan 10, 2023 |
Official abstract text for this publication.
A multi-processor system with processing elements, interspersed memory, and primary and secondary interconnection networks optimized for high performance and low power dissipation is disclosed. In the secondary network multiple message routing nodes are arranged in an interspersed fashion with multiple processors. A given message routing node may receive messages from other message nodes, and relay the received messages to destination message routing nodes using relative offsets included in the messages. The relative offset may specify a number of message nodes from the message node that originated a message to a destination message node.
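The relative-offset relaying described in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration, not something the abstract specifies: the nodes sit on a 2D grid, the offset is a signed `(dx, dy)` pair of hop counts, each relaying node decrements the offset by one hop, and the x offset is consumed before the y offset. The names `Message`, `forward`, and `route` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    dx: int        # remaining hops along x; the sign gives the direction (assumed encoding)
    dy: int        # remaining hops along y
    payload: str


def forward(msg: Message) -> Optional[Tuple[Tuple[int, int], Message]]:
    """Decide one hop at a routing node using only the offset in the message.

    Returns None when the offset is zero (the message has arrived),
    otherwise the step to take and the message with its offset reduced.
    """
    if msg.dx != 0:
        s = 1 if msg.dx > 0 else -1
        return (s, 0), replace(msg, dx=msg.dx - s)
    if msg.dy != 0:
        s = 1 if msg.dy > 0 else -1
        return (0, s), replace(msg, dy=msg.dy - s)
    return None


def route(origin: Tuple[int, int], msg: Message) -> List[Tuple[int, int]]:
    """Trace the nodes a message visits, starting at the originating node."""
    x, y = origin
    path = [(x, y)]
    hop = forward(msg)
    while hop is not None:
        (sx, sy), msg = hop
        x, y = x + sx, y + sy
        path.append((x, y))
        hop = forward(msg)
    return path


# A message sent from node (1, 1) to the node 2 to the right and 1 down.
print(route((1, 1), Message(dx=2, dy=-1, payload="hello")))
```

In this sketch no node needs to know its own absolute position or the destination's: each relay decision depends only on the offset carried in the message, which is the property the abstract attributes to relative offsets.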
Opening claim text (preview).
What is claimed is:

1. An apparatus, comprising: a plurality of processors including a particular processor that includes an address generator unit; a plurality of data memory routers coupled to the plurality of processors in an interspersed arrangement, wherein a particular data memory router is configured to relay received messages to at least one other data memory router of the plurality of data memory routers; and wherein the particular processor of the plurality of processors is configured to: set a particular predicate flag of a plurality of predicate flags that includes a first set predicate flags associated with a datapath included in the particular processor, and a second set of predicate flags associated with the address generator unit; conditionally execute an instruction using the plurality of predicate flags; and set, based on timing information associated with the address generator unit, a different predicate flag included in the second set of predicate flags.

2. The apparatus of claim 1, wherein the particular processor is further configured to: in response to an execution of a test instruction, compare a first value and a second value to generate a result; and set, based on the result, the particular predicate flag.

3. The apparatus of claim 2, wherein to compare the first value and the second value, the particular processor is further configured to perform a logical operation using the first value and the second value to generate the result.

4. The apparatus of claim 1, wherein the particular processor is further configured to set, based on timing information associated with the datapath, the particular predicate flag.

5. A method, comprising: setting, by a particular processor of a plurality of processors, a particular predicate flag of a plurality of predicate flags that includes a first set of predicate flags associated with a datapath included in the particular processor, wherein the plurality of processors is coupled to a plurality of data memory routers in an interspersed arrangement, and wherein the particular processor includes an address generator unit; conditionally executing, by the particular processor, an instruction using the plurality of predicate flags; and setting, by the particular processor and based on timing information associated with the address generator unit, a different predicate flag included in a second set of predicate flags included in the plurality of predicate flags, wherein the second set of predicate flags are associated with the address generator unit.

6. The method of claim 5, further comprising: in response to executing a test instruction, comparing, by the particular processor, a first value and a second value to generate a result; and setting, by the particular processor and based on the result, the particular predicate flag.

7. The method of claim 6, wherein comparing the first value and the second value includes performing a logical operation using the first value and the second value to generate the result.

8. The method of claim 5, further comprising setting, by the particular processor and based on timing information associated with the datapath, the particular predicate flag.

9. The method of claim 5, wherein the datapath includes a plurality of slots, wherein the instruction is included in a particular slot of the plurality of slots, and wherein conditionally executing the instruction includes selecting, based on the particular predicate flag, the particular slot.

10. An apparatus, comprising: a plurality of processors including a particular processor that includes a plurality of datapaths including a particular datapath that includes a plurality of arithmetic logic circuits, wherein a particular arithmetic logic circuit of plurality of arithmetic logic circuits includes a lookup table configured to store an offset; and a plurality of data memory routers coupled to the plurality of processors in an interspersed arrangement, wherein a particular data memory router is configured to relay received messages to at least one other data memory router of the plurality of data memory routers; and wherein the particular processor of the plurality of processors is configured to: selectively activate, based on a received instruction, a subset of the plurality of arithmetic logic circuits; execute the received instruction using the subset of the plurality of arithmetic logic circuits to generate a result; and add the offset to the result to generate a final result.

11. The apparatus of claim 10, wherein to selectively activate the subset of the plurality of arithmetic logic circuits, the particular processor is further configured to: decode the received instruction to generate a decoded instruction; and selectively activate the subset of the plurality of arithmetic logic circuits using the decoded instruction.

12. The apparatus of claim 10, wherein the particular processor is further configured to route, based on the received instruction, data between given arithmetic logic circuits of the subset of the plurality of arithmetic logic circuits.

13. The apparatus of claim 12, wherein the particular datapath includes a plurality of multiplex circuits including a particular multiplex circuit coupled between a first arithmetic logic circuit of the plurality of arithmetic logic circuits and a second arithmetic logic circuit of the plurality of arithmetic logic circuits, and wherein to route the data, the particular processor is further configured to selectively change a state of the particular multiplex circuit.

14. The apparatus of claim 10, wherein a particular arithmetic logic circuit of the plurality of arithmetic logic circuits includes at least an adder circuit.

15. The apparatus of claim 10, wherein the received instruction specifies a log probability operation.
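As a reading aid for claims 1 to 9, the following is a minimal software sketch of predicated execution with two flag sets, one tied to the datapath and one tied to the address generator unit (AGU). It is only an analogy to the claimed hardware. The class and method names, the four-flag set size, the greater-than comparison used by the test instruction, and the interpretation of "timing information" as an AGU address sequence running out of steps are all assumptions, not details taken from the claims.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Processor:
    # Two separate predicate flag sets (sizes are hypothetical).
    dp_flags: List[bool] = field(default_factory=lambda: [False] * 4)   # datapath set
    agu_flags: List[bool] = field(default_factory=lambda: [False] * 4)  # AGU set
    acc: int = 0

    def test(self, flag: int, a: int, b: int) -> None:
        """Test instruction: compare two values and set a datapath flag."""
        self.dp_flags[flag] = a > b  # assumed comparison

    def agu_tick(self, flag: int, remaining: int) -> None:
        """Set an AGU flag from timing information.

        Assumption: the flag is set once the AGU's address sequence
        has no steps remaining.
        """
        self.agu_flags[flag] = remaining == 0

    def add_if(self, flag_set: str, flag: int, value: int) -> bool:
        """Conditionally execute an add, gated by the selected predicate flag."""
        flags = self.dp_flags if flag_set == "dp" else self.agu_flags
        if flags[flag]:
            self.acc += value
            return True
        return False


p = Processor()
p.test(0, 5, 3)             # 5 > 3, so datapath flag 0 is set
p.add_if("dp", 0, 10)       # executes: acc becomes 10
p.agu_tick(1, remaining=2)  # AGU still busy, flag 1 stays clear
p.add_if("agu", 1, 7)       # skipped
p.agu_tick(1, remaining=0)  # AGU sequence finished, flag 1 is set
p.add_if("agu", 1, 7)       # executes: acc becomes 17
print(p.acc)
```

Keeping the two flag sets separate in the sketch mirrors the claim language, in which flags set by datapath results and flags set by AGU timing are distinct and an instruction's execution can be conditioned on either.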
Routing techniques specific to parallel machines, e.g. wormhole, store and forward, shortest path problem congestion (routing on a LAN H04L45/00) · CPC title
Buffers; Shared memory; Pipes · CPC title
Event management; Broadcasting; Multicasting; Notifications · CPC title
Message passing systems or structures, e.g. queues · CPC title
wherein the interconnection is dynamically configurable, e.g. having loosely coupled nearest neighbor architecture (reconfigurable processors arrays G06F15/7867) · CPC title