Reconfigurable virtual graphics and compute processor pipeline
US-2018114290-A1 · Apr 26, 2018 · US
US11422822B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11422822-B2 |
| Application number | US-202016870330-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 8, 2020 |
| Priority date | May 8, 2020 |
| Publication date | Aug 23, 2022 |
| Grant date | Aug 23, 2022 |
Official abstract text for this publication.
Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry may include shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle. The arbitration circuitry may select an instruction based on one or more of: stall counts, whether available instructions are being speculatively executed, whether ones of available instructions target a particular portion of the shared execution circuitry, numbers of execution cycles, and SIMD group ages.
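The arrangement in the abstract (two front ends feeding one shared execution pipeline through an arbiter) can be illustrated with a small cycle-by-cycle model. This is only a sketch under stated assumptions, not the patented circuit: `Instr`, `simulate`, the three-stage depth, and the tie-break toward the first front end are hypothetical, and the arbiter here uses just one of the listed inputs (the smaller stall count).

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    group: str        # SIMD group label, e.g. "A" or "B" (hypothetical)
    name: str
    stall_count: int  # illustrative stall count attached by the front end


def simulate(front_a, front_b, num_stages=3):
    """Issue at most one instruction per cycle into a shared pipeline.

    Returns one snapshot per cycle: the SIMD group label held by each
    shared stage ("-" marks an empty stage).
    """
    queues = [deque(front_a), deque(front_b)]
    stages: list[Optional[Instr]] = [None] * num_stages
    history = []
    # Run until both front ends are drained and the last instruction
    # has reached the final stage.
    while any(queues) or any(s is not None for s in stages[:-1]):
        heads = [q[0] for q in queues if q]
        # Assumed policy: smaller stall count wins; ties go to the first front end.
        chosen = min(heads, key=lambda i: i.stall_count) if heads else None
        if chosen is not None:
            for q in queues:
                if q and q[0] is chosen:
                    q.popleft()
                    break
        # Advance the shared pipeline by one stage.
        stages = [chosen] + stages[:-1]
        history.append([s.group if s else "-" for s in stages])
    return history


if __name__ == "__main__":
    a = [Instr("A", "a0", 0), Instr("A", "a1", 2)]
    b = [Instr("B", "b0", 1), Instr("B", "b1", 0)]
    for cycle, row in enumerate(simulate(a, b), start=1):
        print(cycle, row)
```

With the sample inputs, the third snapshot is `['B', 'B', 'A']`: instructions from both SIMD groups sit in different stages of the shared pipeline in the same cycle, which is the sharing behavior the abstract describes.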
Opening claim text (preview).
What is claimed is:
1. An apparatus, comprising: pipeline circuitry configured to perform operations specified by instructions of first and second single-instruction multiple-data (SIMD) groups assigned to the pipeline circuitry, wherein the pipeline circuitry includes: first front-end circuitry configured to decode instructions of the first assigned SIMD group; second front-end circuitry configured to decode instructions of the second assigned SIMD group, wherein the first and second front-end circuitry are configured to decode an instruction of the first SIMD group and an instruction of the second SIMD group in parallel in a given cycle; shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups; and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle, such that instructions from both the first SIMD group and the second SIMD group occupy different stages of the shared execution circuitry at a given time, wherein the selection is based on at least the following inputs: stall counts for instructions from the first and second front-end circuitry; whether execution units targeted by instructions from the first and second front-end circuitry are saturated; and whether instructions from the first and second front-end circuitry are speculatively executed.
2. The apparatus of claim 1, wherein the arbitration circuitry is further configured to select an instruction based on one or more of the following types of information: numbers of execution cycles for available instructions; and ages of available instructions.
3. The apparatus of claim 1, wherein the arbitration circuitry includes first control circuitry configured to select, from among at least a first instruction from the first front-end circuitry and a second instruction from the second front-end circuitry, an instruction that has a smaller stall count.
4. The apparatus of claim 3, wherein the arbitration circuitry includes second control circuitry configured to, in the absence of a selection by the first control circuitry: based on a determination that the first instruction targets a particular execution unit and the second instruction does not target the particular execution unit, select from among the first and second instructions based on whether the particular execution unit is saturated.
5. The apparatus of claim 4, wherein the arbitration circuitry includes third control circuitry configured to, in the absence of a selection by the first and second control circuitry: select the first instruction based on a determination that the first instruction is not a speculative instruction and the second instruction is a speculative instruction.
6. The apparatus of claim 5, wherein the arbitration circuitry is configured to, in the absence of a selection by the first, second, and third control circuitry, select an instruction from an older SIMD group from among the first and second instructions.
7. The apparatus of claim 1, wherein the first and second front-end circuitry include respective hazard detection stages configured to generate stall counts based on detected hazards.
8. The apparatus of claim 1, wherein the first and second front-end circuitry include respective operand cache allocation stages; wherein the shared execution circuitry includes an operand cache load stage; and wherein the shared execution circuitry includes an issue stage and a plurality of execution stages.
9. The apparatus of claim 1, wherein the shared execution circuitry includes a plurality of execution pipelines configured to execute different respective sets of instruction types.
10. The apparatus of claim 1, further comprising: a central processing unit; a graphics processor; and network interface circuitry; wherein the pipeline circuitry is included in at least one of the central processing unit or the graphics processor.
11. A non-transitory computer readable storage medium having stored thereon design information that specifies a design of at least a portion of a hardware integrated circuit in a format recognized by a semiconductor fabrication system that is configured to use the design information to produce the circuit according to the design, wherein the design information specifies that the circuit includes: pipeline circuitry configured to perform operations specified by instructions of first and second single-instruction multiple-data (SIMD) groups assigned to the pipeline circuitry, wherein the pipeline circuitry includes: first front-end circuitry configured to decode instructions of the first assigned SIMD group; second front-end circuitry configured to decode instructions of the second assigned SIMD group, wherein the first and second front-end circuitry are configured to decode an instruction of the first SIMD group and an instruction of the second SIMD group in parallel in a given cycle; shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups; and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle, such that instructions from both the first SIMD group and the second SIMD group occupy different stages of the shared execution circuitry at a given time, wherein the selection is based on at least the following inputs: stall counts for instructions from the first and second front-end circuitry; whether execution units targeted by instructions from the first and second front-end circuitry are saturated; and whether instructions from the first and second front-end circuitry are speculatively executed.
12. The non-transitory computer readable storage medium of claim 11, wherein the arbitration circuitry is further configured to select an instruction based on: whether ones of available instructions target a particular portion of the shared execution circuitry; numbers of execution cycles for available instructions; and ages of available instructions.
13. The non-transitory computer readable storage medium of claim 11, wherein the arbitration circuitry includes first control circuitry configured to select, from among at least a first instruction from the first front-end circuitry and a second instruction from the second front-end circuitry, an instruction that has a smaller stall count; and wherein the arbitration circuitry includes second control circuitry configured to, in the absence of a selection by the first control circuitry and based on a determination that the first instruction targets a particular execution unit and the second instruction does not target the particular execution unit, select from among the first and second instructions based on whether the particular execution unit is saturated.
14. The non-transitory computer readable storage medium of claim 13, wherein the arbitration circuitry includes third control circuitry configured to, in the absence of a selection by the first and second control circuitry, select the first instruction based on a determination that the first instruction is not a speculative instruction and the second instruction is a speculative instruction; and wherein the arbitration circuitry is configured to, in the absence of a selection by the first, second, and third control circuitry, select an instruction from an older SIMD group from among the first and second instructions.
15. The non-transitory computer
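Claims 3 through 6 describe a priority cascade: each control stage either makes a selection or defers to the next one. The following is one possible reading, sketched in software for illustration only. All names (`Candidate`, `arbitrate`, the `saturated_units` set, the `group_age` encoding) are hypothetical. The saturation rule is also an assumption: the claims say the choice is "based on whether the particular execution unit is saturated" without fixing the outcome, so this sketch picks the instruction that avoids a saturated unit and otherwise defers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    stall_count: int
    target_unit: str   # hypothetical label for the targeted execution unit
    speculative: bool
    group_age: int     # assumed encoding: larger value = older SIMD group


def by_stall_count(a: Candidate, b: Candidate) -> Optional[Candidate]:
    """First control stage: smaller stall count wins; equal counts defer."""
    if a.stall_count == b.stall_count:
        return None
    return a if a.stall_count < b.stall_count else b


def by_saturation(a: Candidate, b: Candidate, saturated_units) -> Optional[Candidate]:
    """Second control stage (assumed rule): avoid a saturated unit if only one targets it."""
    if a.target_unit == b.target_unit:
        return None
    a_sat = a.target_unit in saturated_units
    b_sat = b.target_unit in saturated_units
    if a_sat and not b_sat:
        return b
    if b_sat and not a_sat:
        return a
    return None


def by_speculation(a: Candidate, b: Candidate) -> Optional[Candidate]:
    """Third control stage: prefer the non-speculative instruction."""
    if a.speculative == b.speculative:
        return None
    return b if a.speculative else a


def by_age(a: Candidate, b: Candidate) -> Candidate:
    """Final fallback: instruction from the older SIMD group."""
    return a if a.group_age >= b.group_age else b


def arbitrate(a: Candidate, b: Candidate, saturated_units=frozenset()) -> Candidate:
    """Apply the stages in order; the first stage that selects decides."""
    for choice in (
        by_stall_count(a, b),
        by_saturation(a, b, saturated_units),
        by_speculation(a, b),
    ):
        if choice is not None:
            return choice
    return by_age(a, b)
```

For example, two candidates with equal stall counts, where only the first targets a unit listed in `saturated_units`, resolve to the second candidate at the second stage. If no unit is saturated, the decision falls through to the speculation check and then to SIMD group age.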
Parallel decoding, e.g. parallel decode units · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking · CPC title
Buffers; Shared memory; Pipes · CPC title
single instruction multiple data [SIMD] multiprocessors · CPC title