US11126439B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11126439-B2 |
| Application number | US-201916686060-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 15, 2019 |
| Priority date | Nov 15, 2019 |
| Publication date | Sep 21, 2021 |
| Grant date | Sep 21, 2021 |
Abstract
Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
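The shift and fill behavior mentioned in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the function name `shift_and_fill`, the shift direction, and the use of Python lists to stand in for per-thread register storage are all hypothetical choices made for illustration. Each list index represents one SIMD lane (one thread's copy of an architectural register).

```python
def shift_and_fill(reg_a, reg_b, shift):
    """Software model of a shift-and-fill across SIMD lanes (hypothetical semantics).

    reg_a, reg_b: per-lane values of two architectural registers,
                  one list entry per thread/pipeline.
    shift:        number of lanes to shift by, 0 <= shift <= lane count.

    Lane i receives reg_a[i + shift] while that index is in range,
    otherwise it is filled from reg_b[i + shift - n].
    """
    n = len(reg_a)
    if len(reg_b) != n:
        raise ValueError("both registers must have the same number of lanes")
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and the lane count")

    out = []
    for lane in range(n):
        src = lane + shift
        if src < n:
            # First contiguous subset of lanes: value comes from the first
            # register, read from another lane's thread-specific storage.
            out.append(reg_a[src])
        else:
            # Second contiguous subset of lanes: value comes from the
            # second register, again from another lane's storage.
            out.append(reg_b[src - n])
    return out


# Eight lanes, shift by three: lanes 0-4 read the first register,
# lanes 5-7 are filled from the second register.
result = shift_and_fill(list(range(8)), list(range(8, 16)), 3)
print(result)  # [3, 4, 5, 6, 7, 8, 9, 10]
```

In this model the per-lane `if`/`else` plays the role of a per-thread selection between the two registers, which is one way to picture the multiplexer level described in the claims.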
Claims (opening text, preview)
What is claimed is:

1. An apparatus, comprising: a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers, wherein respective hardware pipelines include: execution circuitry configured to perform operations, using one or more pipeline stages of the pipeline, on an operand for at least a first input operand position; and routing circuitry configured to select, based on the instruction, a first input operand for the first input operand position of the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline; and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline; wherein, for the first input operand position for the instruction, the routing circuitry is configured to select a value from the first architectural register for a first hardware pipeline and the second architectural register for a second hardware pipeline.

2. The apparatus of claim 1, wherein, for a shift and fill instruction, the routing circuitry is configured to select a value from the first architectural register for a first contiguous subset of threads of a SIMD group and select a value from the second architectural register for a second contiguous subset of threads of the SIMD group.

3. The apparatus of claim 1, wherein the routing circuitry includes a multiplexer level configured to select, for each thread, from between a value from the first architectural register and a value from the second architectural register.

4. The apparatus of claim 1, wherein the apparatus is configured to execute a graphics program to: store pixel data for a first portion of a graphics frame in the first architectural register and store pixel data for a second portion of the graphics frame in the second architectural register; and perform one or more shift and fill instructions to store part of the first portion of the graphics frame and part of the second portion of the graphics frame in a third architectural register.

5. The apparatus of claim 4, wherein the apparatus is further configured to execute the graphics program to perform one or more image filtering operations based on pixel data stored in the third architectural register.

6. The apparatus of claim 1, wherein the apparatus is configured to perform a convolution operation by moving a window in a graphics frame by executing multiple shift and fill instructions.

7. The apparatus of claim 1, wherein the routing circuitry is included in a pipeline stage between operand read circuitry and the execution circuitry.

8. The apparatus of claim 1, further comprising conversion circuitry configured to convert the first input operand from a first format having a first precision to a second format having a different precision.

9. A non-transitory computer readable storage medium having stored thereon design information that specifies a design of at least a portion of a hardware integrated circuit in a format recognized by a semiconductor fabrication system that is configured to use the design information to produce the circuit according to the design, wherein the design information specifies that the circuit includes: a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers, wherein respective hardware pipelines include: execution circuitry configured to perform operations, using one or more pipeline stages of the pipeline, on an operand for least a first input operand position; and routing circuitry configured to select, based on the instruction, a first input operand for the first input operand position of the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline; and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline; wherein, for the first input operand position for the instruction, the routing circuitry is configured to select a value from the first architectural register for a first hardware pipeline and the second architectural register for a second hardware pipeline.

10. The non-transitory computer readable storage medium of claim 9, wherein, for a shift and fill instruction, the routing circuitry is configured to select a value from the first architectural register for a first contiguous subset of threads of a SIMD group and select a value from the second architectural register for a second contiguous subset of threads of the SIMD group.

11. The non-transitory computer readable storage medium of claim 9, wherein the routing circuitry includes a multiplexer level configured to select, for each thread, from between a value from the first architectural register and a value from the second architectural register.

12. The non-transitory computer readable storage medium of claim 9, wherein the circuit is configured to execute a graphics program to: store pixel data for a first portion of a graphics frame in the first architectural register and store pixel data for a second portion of the graphics frame in the second architectural register; and perform one or more shift and fill instructions to store part of the first portion of the graphics frame and part of the second portion of the graphics frame in a third architectural register.

13. The non-transitory computer readable storage medium of claim 12, wherein the circuit is further configured to execute the graphics program to perform one or more neighborhood filtering operations based on pixel data stored in the third architectural register.

14. The non-transitory computer readable storage medium of claim 9, wherein the routing circuitry is further configured to select the first input operand for the execution circuitry from: a value from the first architectural register from thread-specific storage for the pipeline.

15. A method, comprising: executing, using a set of multiple hardware pipelines, a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers; selecting, based on the instruction, a first input operand for a first input operand position of execution circuitry of one or more of the pipelines from among: a value from the first architectural register from thread-specific storage for another pipeline; and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline wherein, for the first input operand position for the instruction, the selecting selects a value from the first architectural register for a first hardware pipeline and selects the second architectural register for a second hardware pipeline.

16. The method of claim 15, further comprising: in response to a shift and fill instruction, selecting a value from the first architectural register for a first contiguous subset of threads of a SIMD group and select a value from the second architectural register for a second contiguous subset of threads of the SIMD group.

17. The method of claim 15, further comprising: selecting, by multiplexer circuitry for each thread, from between a value from the first architectural register and a value from the second architectural register.
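Claims 4 to 6 describe using repeated shift and fill instructions to move a window across pixel data held in two registers and then filtering the result. The following is a rough software sketch of that idea, not the claimed hardware. It assumes a one-dimensional row of pixels split across two equal-length "registers" modeled as Python lists, a hypothetical helper `shift_and_fill`, and a simple 3-tap box filter chosen purely for illustration.

```python
def shift_and_fill(reg_a, reg_b, shift):
    """Hypothetical model: lane i gets element (i + shift) of reg_a followed by reg_b."""
    n = len(reg_a)
    if len(reg_b) != n or not 0 <= shift <= n:
        raise ValueError("registers must match in length and 0 <= shift <= lanes")
    combined = list(reg_a) + list(reg_b)
    return combined[shift:shift + n]


def box_filter_3(reg_a, reg_b):
    """Illustrative 3-tap box filter built from three shifted views.

    Each shifted view acts like a "third register" holding part of the
    first portion and part of the second portion of the pixel row.
    Lane i averages pixels i, i + 1 and i + 2 of the combined row.
    """
    taps = [shift_and_fill(reg_a, reg_b, s) for s in range(3)]
    return [sum(vals) / 3 for vals in zip(*taps)]


# Four lanes per register; the row is 0, 3, 6, ..., 21.
first_portion = [0, 3, 6, 9]
second_portion = [12, 15, 18, 21]
print(box_filter_3(first_portion, second_portion))  # [3.0, 6.0, 9.0, 12.0]
```

The point of the sketch is that every lane obtains its neighbors' pixels through the shifted views rather than through extra memory loads, which is the kind of benefit the claims attribute to routing operands between pipelines.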
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Memory management · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title