US12008377B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12008377-B2 |
| Application number | US-202318299452-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 12, 2023 |
| Priority date | Nov 15, 2019 |
| Publication date | Jun 11, 2024 |
| Grant date | Jun 11, 2024 |
Official abstract text for this publication.
Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
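The shift and fill behavior mentioned in the abstract can be illustrated with a small software model. The sketch below is only an assumption about the lane-level semantics, not the patented hardware: each list element stands for one thread's copy of a register, and the function name `shift_and_fill`, its parameters, and the `"left"`/`"right"` direction strings are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def shift_and_fill(first: Sequence[int], second: Sequence[int],
                   shift: int, direction: str = "left") -> List[int]:
    """Model a shift-and-fill across SIMD lanes (assumed semantics).

    Each destination lane reads from another lane of `first`; lanes whose
    source would fall outside `first` are filled from `second` instead.
    """
    n = len(first)
    if len(second) != n:
        raise ValueError("both input registers must have one value per lane")
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and the lane count")
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")

    dest = []
    for lane in range(n):
        if direction == "left":
            src = lane + shift
            # Past the last lane: fill from the start of the second register.
            dest.append(first[src] if src < n else second[src - n])
        else:
            src = lane - shift
            # Before lane 0: fill from the end of the second register.
            dest.append(first[src] if src >= 0 else second[src + n])
    return dest


if __name__ == "__main__":
    r0 = [0, 1, 2, 3]        # thread-specific values of a first register
    r1 = [10, 11, 12, 13]    # thread-specific values of a second register
    print(shift_and_fill(r0, r1, 1, "left"))   # [1, 2, 3, 10]
    print(shift_and_fill(r0, r1, 1, "right"))  # [13, 0, 1, 2]
```

In this model, any shift strictly between 0 and the lane count places a proper subset of each input register in the destination, which is how a window that straddles two registers' worth of frame data could end up in a single register.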
Opening claim text (preview).
What is claimed is:

1. An apparatus, comprising: a graphics processor that includes: first storage circuitry configured to store multiple thread-specific versions of a first register for a threads of a single-instruction multiple-data (SIMD) group; second storage circuitry configured to store multiple thread-specific versions of a second register for the threads of the SIMD group; third storage circuitry configured to store multiple thread-specific versions of a third register for threads of the SIMD group; and a set of multiple hardware pipelines configured to: execute, using routing circuitry, an instruction to: store a proper subset of the thread-specific versions of the first register from the first storage circuitry in thread-specific versions of the third register in the third storage circuitry; and store a proper subset of the thread-specific versions of the second register from the second storage circuitry in thread-specific versions of the third register in the third storage circuitry; wherein the storage of the proper subsets stores values that are based on thread-specific versions of the first and second registers to other threads' versions of the third register for the SIMD group.

2. The apparatus of claim 1, wherein the proper subset of the thread-specific versions of the first register stores a first portion of a graphics frame and the proper subset of the thread-specific versions of the first register stores a second portion of a graphics frame.

3. The apparatus of claim 2, wherein the graphics processor is further configured to execute multiple instructions having the same format as the instruction to update contents of the third register to perform a convolution operation.

4. The apparatus of claim 1, wherein the instruction specifies a shift amount that indicates a difference between thread positions in the SIMD group of a thread that provides a thread-specific version of the first register and a thread that receives the thread-specific version of the first register.

5. The apparatus of claim 1, further comprising execution circuitry configured to perform an arithmetic operation specified by the instruction on values from thread-specific versions of the first and second registers and store respective thread-specific outputs of the operation to the third register.

6. A method, comprising: executing, by a graphics processor, a single-instruction multiple-data (SIMD) instruction that indicates: a shift amount and direction; multiple input registers; and a destination register; wherein the executing includes generating values that are based on proper subsets of thread-specific versions of the input registers in thread-specific versions of the destination register, including storing a value that is based on a thread-specific version of an input register of the multiple input registers to another thread's version of the destination register based on the shift amount and direction.

7. The method of claim 6, wherein the executing includes performing an arithmetic operation specified by the instruction on values from thread-specific versions of first and second input registers and storing respective thread-specific outputs of the operation to the destination register.

8. The method of claim 6, wherein the proper subsets are of contiguous groups of threads in a SIMD group on which the SIMD instruction operates.

9. The method of claim 6, wherein the shift amount indicates a difference in thread position in a SIMD group on which the SIMD instruction operates, between a thread that provides the thread-specific version of the input register and a thread with the destination register.

10. The method of claim 6, further comprising executing multiple instructions having the same instruction format as the SIMD instruction to perform a convolution operation.

11. The method of claim 6, further comprising executing multiple instructions having the same instruction format as the SIMD instruction to perform an image filtering operation.

12. The method of claim 6, wherein the executing includes performing a conversion operation to convert a thread-specific version of the input register to a different format before storing the different format in the other thread's version of the destination register.

13. The method of claim 6, wherein the executing includes storing a value that is based on a thread-specific version of another input register of the multiple input registers to a different thread's version of the destination register.

14. A non-transitory computer-readable medium having instructions stored thereon that are executable by a computing device to perform operations comprising: storing proper subsets of thread-specific versions of multiple input registers in thread-specific versions of a destination register, including storing a value from a thread-specific version of an input register to another thread's version of the destination register based on a shift amount and a direction; wherein the storing is performed based on execution of a first single-instruction multiple-data (SIMD) instruction of the instructions, wherein the first SIMD instruction specifies: the shift amount, the direction, the multiple input registers, and the destination register.

15. The non-transitory computer-readable medium of claim 14, wherein execution of the first SIMD instruction includes performing an arithmetic operation specified by the instruction on values from thread-specific versions of first and second input registers and storing respective thread-specific outputs of the operation to the destination register.

16. The non-transitory computer-readable medium of claim 14, wherein the proper subsets are of contiguous groups of threads in a SIMD group on which the first SIMD instruction operates.

17. The non-transitory computer-readable medium of claim 14, wherein the shift amount indicates a difference in thread position in a SIMD group on which the first SIMD instruction operates, between a thread that provides the thread-specific version of the input register and a thread with the destination register.

18. The non-transitory computer-readable medium of claim 14, wherein the operations further comprise: executing multiple instructions having the same instruction format as the first SIMD instruction to perform a convolution operation.

19. The non-transitory computer-readable medium of claim 14, wherein the operations further comprise: executing multiple instructions having the same instruction format as the first SIMD instruction to perform an image filtering operation.

20. The non-transitory computer-readable medium of claim 14, wherein execution of the first SIMD instruction includes performing a conversion operation to convert a thread-specific version of the input register to a different format before storing the different format in the other thread's version of the destination register.
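Several claims recite executing multiple instructions of the same format to perform a convolution or image filtering operation. The following is a minimal software sketch of how repeated shift-and-fill steps could accumulate a sliding-window weighted sum over a row of pixels held in two registers. It is an illustration under assumed semantics rather than the claimed implementation; `shift_fill_left`, `filter_row`, and the sample data are hypothetical, and the weights are applied without flipping (correlation form).

```python
from typing import List, Sequence


def shift_fill_left(first: Sequence[int], second: Sequence[int],
                    shift: int) -> List[int]:
    """Assumed lane model: lane i receives first[i + shift], filled from
    `second` once the source index runs past the last lane."""
    n = len(first)
    if len(second) != n or not 0 <= shift <= n:
        raise ValueError("registers must match in size and 0 <= shift <= lanes")
    combined = list(first) + list(second)
    return combined[shift:shift + n]


def filter_row(cur: Sequence[int], nxt: Sequence[int],
               weights: Sequence[int]) -> List[int]:
    """Accumulate a 1-D filter, one shift-and-fill step per filter tap.

    Lane i ends up with sum(weights[k] * row[i + k]), where row is `cur`
    followed by `nxt`.
    """
    n = len(cur)
    if len(weights) > n + 1:
        raise ValueError("filter is wider than two registers can supply")
    acc = [0] * n
    for k, w in enumerate(weights):
        shifted = shift_fill_left(cur, nxt, k)   # one "instruction" per tap
        acc = [a + w * v for a, v in zip(acc, shifted)]
    return acc


if __name__ == "__main__":
    cur = [1, 2, 3, 4]   # first four pixels of a row, one per thread
    nxt = [5, 6, 7, 8]   # next four pixels, held in a second register
    print(filter_row(cur, nxt, [1, 1, 1]))    # [6, 9, 12, 15]
    print(filter_row(cur, nxt, [1, 0, -1]))   # [-2, -2, -2, -2]
```

Each loop iteration uses the same operation with a different shift amount, so every lane sees its neighbors' values without any lane reading memory again. Some claims also recite an arithmetic operation specified by the instruction itself; here the multiply-accumulate is kept as a separate step for clarity.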
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Memory management · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title