SIMD operand permutation with selection from among multiple registers

US12008377B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12008377-B2
Application number	US-202318299452-A
Country	US
Kind code	B2
Filing date	Apr 12, 2023
Priority date	Nov 15, 2019
Publication date	Jun 11, 2024
Grant date	Jun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
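The routing behavior summarized in the abstract can be pictured with a small software model. The sketch below is an illustration only, not the patented circuitry: the function name `shift_and_fill`, the list-per-register representation (one element per thread), and the exact fill rule (lanes that run off the end of the first register are filled from the second) are assumptions made for this example.

```python
from typing import List, Sequence


def shift_and_fill(first: Sequence[int], second: Sequence[int],
                   shift: int, direction: str = "left") -> List[int]:
    """Model a shift-and-fill across the lanes of one SIMD group.

    `first` and `second` hold the thread-specific values of two input
    registers, one element per thread. Each destination lane receives the
    value of `first` from the lane `shift` positions away; lanes with no
    such source lane are filled from `second` (hypothetical semantics).
    """
    n = len(first)
    if len(second) != n:
        raise ValueError("both registers must have one value per thread")
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and the group size")
    if direction == "left":
        # Lane i reads first[i + shift], falling through into `second`.
        window = list(first) + list(second)
        start = shift
    elif direction == "right":
        # Lane i reads first[i - shift], falling back into `second`.
        window = list(second) + list(first)
        start = n - shift
    else:
        raise ValueError("direction must be 'left' or 'right'")
    return window[start:start + n]


if __name__ == "__main__":
    a = list(range(8))          # register 1, threads 0..7
    b = list(range(100, 108))   # register 2, threads 0..7
    print(shift_and_fill(a, b, 3, "left"))
    print(shift_and_fill(a, b, 3, "right"))
```

With a shift strictly between 0 and the group size, the destination holds a proper subset of each input register, which is how two registers holding adjacent pieces of a frame can yield an arbitrary window of that frame.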

First claim

Opening claim text (preview).

What is claimed is:
1. An apparatus, comprising: a graphics processor that includes: first storage circuitry configured to store multiple thread-specific versions of a first register for a threads of a single-instruction multiple-data (SIMD) group; second storage circuitry configured to store multiple thread-specific versions of a second register for the threads of the SIMD group; third storage circuitry configured to store multiple thread-specific versions of a third register for threads of the SIMD group; and a set of multiple hardware pipelines configured to: execute, using routing circuitry, an instruction to: store a proper subset of the thread-specific versions of the first register from the first storage circuitry in thread-specific versions of the third register in the third storage circuitry; and store a proper subset of the thread-specific versions of the second register from the second storage circuitry in thread-specific versions of the third register in the third storage circuitry; wherein the storage of the proper subsets stores values that are based on thread-specific versions of the first and second registers to other threads' versions of the third register for the SIMD group.
2. The apparatus of claim 1, wherein the proper subset of the thread-specific versions of the first register stores a first portion of a graphics frame and the proper subset of the thread-specific versions of the first register stores a second portion of a graphics frame.
3. The apparatus of claim 2, wherein the graphics processor is further configured to execute multiple instructions having the same format as the instruction to update contents of the third register to perform a convolution operation.
4. The apparatus of claim 1, wherein the instruction specifies a shift amount that indicates a difference between thread positions in the SIMD group of a thread that provides a thread-specific version of the first register and a thread that receives the thread-specific version of the first register.
5. The apparatus of claim 1, further comprising execution circuitry configured to perform an arithmetic operation specified by the instruction on values from thread-specific versions of the first and second registers and store respective thread-specific outputs of the operation to the third register.
6. A method, comprising: executing, by a graphics processor, a single-instruction multiple-data (SIMD) instruction that indicates: a shift amount and direction; multiple input registers; and a destination register; wherein the executing includes generating values that are based on proper subsets of thread-specific versions of the input registers in thread-specific versions of the destination register, including storing a value that is based on a thread-specific version of an input register of the multiple input registers to another thread's version of the destination register based on the shift amount and direction.
7. The method of claim 6, wherein the executing includes performing an arithmetic operation specified by the instruction on values from thread-specific versions of first and second input registers and storing respective thread-specific outputs of the operation to the destination register.
8. The method of claim 6, wherein the proper subsets are of contiguous groups of threads in a SIMD group on which the SIMD instruction operates.
9. The method of claim 6, wherein the shift amount indicates a difference in thread position in a SIMD group on which the SIMD instruction operates, between a thread that provides the thread-specific version of the input register and a thread with the destination register.
10. The method of claim 6, further comprising executing multiple instructions having the same instruction format as the SIMD instruction to perform a convolution operation.
11. The method of claim 6, further comprising executing multiple instructions having the same instruction format as the SIMD instruction to perform an image filtering operation.
12. The method of claim 6, wherein the executing includes performing a conversion operation to convert a thread-specific version of the input register to a different format before storing the different format in the other thread's version of the destination register.
13. The method of claim 6, wherein the executing includes storing a value that is based on a thread-specific version of another input register of the multiple input registers to a different thread's version of the destination register.
14. A non-transitory computer-readable medium having instructions stored thereon that are executable by a computing device to perform operations comprising: storing proper subsets of thread-specific versions of multiple input registers in thread-specific versions of a destination register, including storing a value from a thread-specific version of an input register to another thread's version of the destination register based on a shift amount and a direction; wherein the storing is performed based on execution of a first single-instruction multiple-data (SIMD) instruction of the instructions, wherein the first SIMD instruction specifies: the shift amount, the direction, the multiple input registers, and the destination register.
15. The non-transitory computer-readable medium of claim 14, wherein execution of the first SIMD instruction includes performing an arithmetic operation specified by the instruction on values from thread-specific versions of first and second input registers and storing respective thread-specific outputs of the operation to the destination register.
16. The non-transitory computer-readable medium of claim 14, wherein the proper subsets are of contiguous groups of threads in a SIMD group on which the first SIMD instruction operates.
17. The non-transitory computer-readable medium of claim 14, wherein the shift amount indicates a difference in thread position in a SIMD group on which the first SIMD instruction operates, between a thread that provides the thread-specific version of the input register and a thread with the destination register.
18. The non-transitory computer-readable medium of claim 14, wherein the operations further comprise: executing multiple instructions having the same instruction format as the first SIMD instruction to perform a convolution operation.
19. The non-transitory computer-readable medium of claim 14, wherein the operations further comprise: executing multiple instructions having the same instruction format as the first SIMD instruction to perform an image filtering operation.
20. The non-transitory computer-readable medium of claim 14, wherein execution of the first SIMD instruction includes performing a conversion operation to convert a thread-specific version of the input register to a different format before storing the different format in the other thread's version of the destination register.
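Claims 3, 10, and 18 mention issuing several instructions of the same format to perform a convolution. A rough way to picture this, assuming each thread holds one pixel of an image row and a second register holds the next pixels of the same row, is a one-dimensional filter built from one lane shift per kernel tap. The names `lane_shift` and `convolve_row`, the left-shift fill rule, and the sample data are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence


def lane_shift(first: Sequence[int], second: Sequence[int], shift: int) -> List[int]:
    """Hypothetical lane shift: lane i reads first[i + shift], and lanes
    past the end of `first` are filled from the start of `second`."""
    n = len(first)
    return (list(first) + list(second))[shift:shift + n]


def convolve_row(current: Sequence[int], following: Sequence[int],
                 kernel: Sequence[int]) -> List[int]:
    """Apply a 1-D filter with one shift-and-accumulate step per tap.

    `current` holds one pixel per thread; `following` holds the pixels that
    come next in the row. Each tap is modeled as one shifted read followed
    by a per-lane multiply-add into the accumulator.
    """
    if len(kernel) - 1 > len(current):
        raise ValueError("kernel reaches beyond the two input registers")
    acc = [0] * len(current)
    for tap, weight in enumerate(kernel):
        shifted = lane_shift(current, following, tap)
        acc = [total + weight * value for total, value in zip(acc, shifted)]
    return acc


if __name__ == "__main__":
    row_lo = [1, 2, 3, 4]   # pixels held by threads 0..3
    row_hi = [5, 6, 7, 8]   # next pixels in the same row
    print(convolve_row(row_lo, row_hi, [1, 1, 1]))
    print(convolve_row(row_lo, row_hi, [1, 0, -1]))
```

The point of the sketch is that every thread obtains its neighbors' pixels through register-to-register shifts rather than through separate memory reads per tap.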

Assignees

Apple Inc

Inventors

Classifications

G06F9/3888
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
G06F9/3851 · Primary
from multiple instruction streams, e.g. multistreaming · CPC title
G06F9/3887 · Primary
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
G06T1/60
Memory management · CPC title
G06T1/20
Processor architectures; Processor configuration, e.g. pipelining · CPC title

Patent family

Related publications grouped by family.

View patent family 73793800

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12008377B2 cover?: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution cir…
Who is the assignee on this patent?: Apple Inc
What technology area does this patent fall under?: Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 11, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).