US2016378715A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016378715-A1 |
| Application number | US-201514752047-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 26, 2015 |
| Priority date | Jun 26, 2015 |
| Publication date | Dec 29, 2016 |
| Grant date | — |
Official abstract text for this publication.
Methods and apparatuses relating to tightly-coupled heterogeneous computing are described. In one embodiment, a hardware processor includes a plurality of execution units in parallel, a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units, and an offload engine with inputs connected to outputs of the plurality of second buffers.
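The connectivity in the abstract can be sketched as a small routing model. This is an illustrative toy, not the patented implementation: the names `ALLOWED`, `route`, and `step`, the dictionary form of the control signal, and the one-word-per-bank simplification are all assumptions made for the example. Only the rules about which outputs may drive which inputs come from the abstract.

```python
from collections import deque

# Which source kinds may drive each sink kind, as stated in the abstract.
ALLOWED = {
    "exec_in": {"first_buffer", "bank"},
    "bank_in": {"first_buffer", "bank", "exec"},
    "second_buffer_in": {"first_buffer", "bank", "exec"},
}

def route(control):
    """Check a (hypothetical) control signal: {(sink_kind, i): (source_kind, j)}."""
    for (sink_kind, _), (src_kind, _) in control.items():
        if src_kind not in ALLOWED[sink_kind]:
            raise ValueError(f"{src_kind} output cannot drive {sink_kind}")
    return control

def step(control, first_buffer, banks, exec_outputs, second_buffers):
    """Move one value along each configured route; return the execution-unit inputs."""
    route(control)
    head = first_buffer.popleft() if first_buffer else None  # one FIFO read per step
    # Snapshot all outputs before any write so reads see pre-step values.
    outputs = {"first_buffer": [head], "bank": list(banks), "exec": list(exec_outputs)}
    exec_inputs = {}
    for (sink_kind, i), (src_kind, j) in control.items():
        value = outputs[src_kind][j]
        if sink_kind == "exec_in":
            exec_inputs[i] = value
        elif sink_kind == "bank_in":
            banks[i] = value               # toy bank holds a single word
        else:
            second_buffers[i].append(value)
    return exec_inputs

first_buffer = deque([10, 20])
banks = [1, 2, 3, 4]                       # four banks, one word each (assumed)
second_buffers = [deque(), deque()]
control = {
    ("exec_in", 0): ("first_buffer", 0),
    ("exec_in", 1): ("bank", 2),
    ("bank_in", 0): ("exec", 1),
    ("second_buffer_in", 1): ("exec", 0),
}
exec_inputs = step(control, first_buffer, banks, [100, 200], second_buffers)
# The offload engine's inputs are whatever the second buffers currently hold.
offload_inputs = [list(b) for b in second_buffers]
```

Snapshotting the outputs before applying any write is a design choice of this sketch; it keeps a bank that is both read and written in the same step from observing its own new value.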
Opening claim text (preview).
What is claimed is:
1. A hardware processor comprising: a plurality of execution units in parallel; a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units; and an offload engine with inputs connected to outputs of the plurality of second buffers.
2. The hardware processor of claim 1, wherein an output of the offload engine connects to an input of the first buffer.
3. The hardware processor of claim 1, further comprising data hazard resolution logic to simultaneously read from the output of the first buffer and write to the inputs of the plurality of second buffers.
4. The hardware processor of claim 3, wherein the data hazard resolution logic is to not insert a stall.
5. The hardware processor of claim 1, wherein the plurality of execution units are to execute at a first clock speed and the offload engine is to execute at a second, slower clock speed.
6. The hardware processor of claim 1, wherein the plurality of execution units each includes a shift register.
7. The hardware processor of claim 1, wherein the first buffer and the plurality of second buffers are first in first out (FIFO) buffers.
8. The hardware processor of claim 1, wherein the plurality of memory banks are four or more memory banks and each memory bank includes an input port and an output port separate from input ports and output ports of the other memory banks.
9. A method comprising: connecting inputs of a plurality of execution units in parallel of a hardware processor to outputs of a first buffer and a plurality of memory banks and connecting inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units with a switch based on a control signal; and providing data to inputs of an offload engine from outputs of the plurality of second buffers.
10. The method of claim 9, further comprising providing data from an output of the offload engine to an input of the first buffer.
11. The method of claim 9, further comprising simultaneously reading from the output of the first buffer and writing to the inputs of the plurality of second buffers.
12. The method of claim 11, further comprising not inserting a stall.
13. The method of claim 9, further comprising the plurality of execution units executing at a first clock speed and the offload engine executing at a second, slower clock speed.
14. The method of claim 9, wherein the plurality of execution units each includes a shift register.
15. The method of claim 9, wherein the plurality of memory banks are four or more memory banks and each memory bank includes an input port and an output port separate from input ports and output ports of the other memory banks.
16. The method of claim 9, wherein the first buffer and the plurality of second buffers are first in first out (FIFO) buffers.
17. A hardware processor comprising: a hardware decoder to decode an instruction; a hardware execution unit to execute the instruction to: connect inputs of a plurality of execution units in parallel of the hardware processor to outputs of a first buffer and a plurality of memory banks and connecting inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units based on a control signal; and provide data to inputs of an offload engine from outputs of the plurality of second buffers.
18. The hardware processor of claim 17, wherein an output of the offload engine connects to an input of the first buffer.
19. The hardware processor of claim 17, wherein the hardware execution unit is to execute the instruction to cause a simultaneous read from the output of the first buffer and write to the inputs of the plurality of second buffers.
20. The hardware processor of claim 19, wherein the hardware execution unit is to execute the instruction without inserting a stall.
21. The hardware processor of claim 17, wherein the plurality of execution units are to execute at a first clock speed and the offload engine is to execute at a second, slower clock speed.
22. The hardware processor of claim 17, wherein the plurality of execution units each includes a shift register.
23. The hardware processor of claim 17, wherein the first buffer and the plurality of second buffers are first in first out (FIFO) buffers.
24. The hardware processor of claim 17, wherein the plurality of memory banks are four or more memory banks and each memory bank includes an input port and an output port separate from input ports and output ports of the other memory banks.
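The dependent claims on FIFO buffers, the same-cycle read and write without a stall, and the two clock speeds can be pictured with a small cycle-level sketch. Everything concrete here is an assumption for illustration: the function name `simulate`, the clock `ratio`, the doubling operation standing in for an execution unit, and the use of a single second buffer. The claims do not specify any of these.

```python
from collections import deque

def simulate(inputs, fast_cycles, ratio=2):
    """Toy timing model.

    ratio: number of fast (execution-unit) cycles per offload-engine cycle.
    This value is assumed; the claims only say the second clock is slower.
    """
    first = deque(inputs)   # first FIFO buffer
    second = deque()        # one of the second FIFO buffers
    offloaded = []
    for cycle in range(fast_cycles):
        # Fast clock: read the first buffer and write the second buffer in
        # the same cycle; this model never spends a cycle on a stall.
        if first:
            second.append(first.popleft() * 2)  # placeholder execution-unit op
        # Slow clock: the offload engine takes one item every `ratio` cycles.
        if cycle % ratio == ratio - 1 and second:
            offloaded.append(second.popleft())
    return offloaded, list(second)

offloaded, pending = simulate([1, 2, 3, 4], fast_cycles=4, ratio=2)
```

In this run the second buffer absorbs the rate mismatch: after four fast cycles the slower offload engine has taken two results and two remain queued.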
Details on data memory access · CPC title
for access to memory bus (G06F13/28 takes precedence) · CPC title
where the synchronisation uses buffers, e.g. for speed matching between buses · CPC title
Dependency mechanisms, e.g. register scoreboarding · CPC title
Special arrangements thereof, e.g. mask or switch · CPC title