SIMD operand permutation with selection from among multiple registers

US11126439B2 · US · B2

Patent metadata
Publication number: US-11126439-B2
Application number: US-201916686060-A
Country: US
Kind code: B2
Filing date: Nov 15, 2019
Priority date: Nov 15, 2019
Publication date: Sep 21, 2021
Grant date: Sep 21, 2021

Abstract

Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
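The routing described in the abstract can be modeled in software as a per-lane multiplexer that picks one of two registers and a source lane. The following Python sketch is a toy model, not the patented circuit. The 8-lane group width, the list-per-register representation, the downward shift direction and the helper names `route_operand` and `shift_and_fill` are all assumptions made for illustration. The patent text on this page does not specify lane ordering or shift direction.

```python
"""Toy model of cross-lane operand routing with a two-register select.

Assumptions (illustrative only, not from the patent): registers are Python
lists indexed by lane, and route_operand()/shift_and_fill() are
hypothetical helpers.
"""
from typing import Callable, List, Tuple

Register = List[int]  # one value per lane (per-thread storage)


def route_operand(
    reg_a: Register,
    reg_b: Register,
    select: Callable[[int], Tuple[str, int]],
) -> Register:
    """For each lane, read the value chosen by select(lane).

    select(lane) returns ("a" or "b", source_lane). The first element plays
    the role of the per-thread multiplexer between the two architectural
    registers; the second models routing from another lane's storage.
    """
    out: Register = []
    for lane in range(len(reg_a)):
        which, src = select(lane)
        bank = reg_a if which == "a" else reg_b
        out.append(bank[src])
    return out


def shift_and_fill(reg_a: Register, reg_b: Register, shift: int) -> Register:
    """Shift reg_a down by `shift` lanes and fill the top lanes from reg_b.

    Lanes [0, n - shift) read reg_a; lanes [n - shift, n) read reg_b, so two
    contiguous subsets of threads are sourced from different registers.
    """
    n = len(reg_a)
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and the SIMD width")

    def select(lane: int) -> Tuple[str, int]:
        src = lane + shift
        return ("a", src) if src < n else ("b", src - n)

    return route_operand(reg_a, reg_b, select)


if __name__ == "__main__":
    r1 = list(range(0, 8))      # hypothetical 8-lane register holding 0..7
    r2 = list(range(100, 108))  # second register holding 100..107
    print(shift_and_fill(r1, r2, 3))  # [3, 4, 5, 6, 7, 100, 101, 102]
```

Keeping the register choice (`"a"`/`"b"`) separate from the source lane mirrors the abstract's split between selecting an architectural register and reading another pipeline's thread-specific storage. A shift of 0 returns the first register unchanged; a shift equal to the group width returns the second.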

Claims

What is claimed is:

1. An apparatus, comprising: a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers, wherein respective hardware pipelines include: execution circuitry configured to perform operations, using one or more pipeline stages of the pipeline, on an operand for at least a first input operand position; and routing circuitry configured to select, based on the instruction, a first input operand for the first input operand position of the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline; and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline; wherein, for the first input operand position for the instruction, the routing circuitry is configured to select a value from the first architectural register for a first hardware pipeline and the second architectural register for a second hardware pipeline.

2. The apparatus of claim 1, wherein, for a shift and fill instruction, the routing circuitry is configured to select a value from the first architectural register for a first contiguous subset of threads of a SIMD group and select a value from the second architectural register for a second contiguous subset of threads of the SIMD group.

3. The apparatus of claim 1, wherein the routing circuitry includes a multiplexer level configured to select, for each thread, from between a value from the first architectural register and a value from the second architectural register.

4. The apparatus of claim 1, wherein the apparatus is configured to execute a graphics program to: store pixel data for a first portion of a graphics frame in the first architectural register and store pixel data for a second portion of the graphics frame in the second architectural register; and perform one or more shift and fill instructions to store part of the first portion of the graphics frame and part of the second portion of the graphics frame in a third architectural register.

5. The apparatus of claim 4, wherein the apparatus is further configured to execute the graphics program to perform one or more image filtering operations based on pixel data stored in the third architectural register.

6. The apparatus of claim 1, wherein the apparatus is configured to perform a convolution operation by moving a window in a graphics frame by executing multiple shift and fill instructions.

7. The apparatus of claim 1, wherein the routing circuitry is included in a pipeline stage between operand read circuitry and the execution circuitry.

8. The apparatus of claim 1, further comprising conversion circuitry configured to convert the first input operand from a first format having a first precision to a second format having a different precision.

9. A non-transitory computer readable storage medium having stored thereon design information that specifies a design of at least a portion of a hardware integrated circuit in a format recognized by a semiconductor fabrication system that is configured to use the design information to produce the circuit according to the design, wherein the design information specifies that the circuit includes: a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers, wherein respective hardware pipelines include: execution circuitry configured to perform operations, using one or more pipeline stages of the pipeline, on an operand for least a first input operand position; and routing circuitry configured to select, based on the instruction, a first input operand for the first input operand position of the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline; and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline; wherein, for the first input operand position for the instruction, the routing circuitry is configured to select a value from the first architectural register for a first hardware pipeline and the second architectural register for a second hardware pipeline.

10. The non-transitory computer readable storage medium of claim 9, wherein, for a shift and fill instruction, the routing circuitry is configured to select a value from the first architectural register for a first contiguous subset of threads of a SIMD group and select a value from the second architectural register for a second contiguous subset of threads of the SIMD group.

11. The non-transitory computer readable storage medium of claim 9, wherein the routing circuitry includes a multiplexer level configured to select, for each thread, from between a value from the first architectural register and a value from the second architectural register.

12. The non-transitory computer readable storage medium of claim 9, wherein the circuit is configured to execute a graphics program to: store pixel data for a first portion of a graphics frame in the first architectural register and store pixel data for a second portion of the graphics frame in the second architectural register; and perform one or more shift and fill instructions to store part of the first portion of the graphics frame and part of the second portion of the graphics frame in a third architectural register.

13. The non-transitory computer readable storage medium of claim 12, wherein the circuit is further configured to execute the graphics program to perform one or more neighborhood filtering operations based on pixel data stored in the third architectural register.

14. The non-transitory computer readable storage medium of claim 9, wherein the routing circuitry is further configured to select the first input operand for the execution circuitry from: a value from the first architectural register from thread-specific storage for the pipeline.

15. A method, comprising: executing, using a set of multiple hardware pipelines, a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers; selecting, based on the instruction, a first input operand for a first input operand position of execution circuitry of one or more of the pipelines from among: a value from the first architectural register from thread-specific storage for another pipeline; and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline wherein, for the first input operand position for the instruction, the selecting selects a value from the first architectural register for a first hardware pipeline and selects the second architectural register for a second hardware pipeline.

16. The method of claim 15, further comprising: in response to a shift and fill instruction, selecting a value from the first architectural register for a first contiguous subset of threads of a SIMD group and select a value from the second architectural register for a second contiguous subset of threads of the SIMD group.

17. The method of claim 15, further comprising: selecting, by multiplexer circuitry for each thread, from between a value from the first architectural register and a value from the second architectural register.
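Claims 4 to 6 describe placing two adjacent portions of a frame in two registers and using repeated shift and fill operations to slide a window across the boundary for filtering or convolution. The following Python sketch is a rough software analogue under stated assumptions. It uses one integer pixel per lane, an 8-lane register and a small integer kernel. The helper names `shift_and_fill` and `convolve_row` are hypothetical and are not taken from the patent. The kernel is applied without flipping (cross-correlation), as is common in image filtering.

```python
"""Sketch: sliding-window filtering over a frame row via shift-and-fill.

Assumptions (illustrative only): one integer pixel per lane, registers as
Python lists, and the hypothetical helpers below.
"""
from typing import List

Register = List[int]


def shift_and_fill(reg_a: Register, reg_b: Register, shift: int) -> Register:
    # Lanes [0, n - shift) take reg_a[lane + shift]; the rest take
    # reg_b[lane + shift - n]. Equivalent to slicing and concatenating.
    n = len(reg_a)
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and the SIMD width")
    return reg_a[shift:] + reg_b[:shift]


def convolve_row(r1: Register, r2: Register, kernel: List[int]) -> Register:
    """Apply `kernel` to pixels starting at each lane of r1.

    r1 holds pixels [x, x + n) of a row and r2 holds [x + n, x + 2n).
    Each shift-and-fill moves the window by one pixel, so lane i sees
    pixel i + s even when that pixel lives in r2.
    """
    n = len(r1)
    if len(kernel) > n + 1:
        raise ValueError("kernel wider than the two-register window")
    acc = [0] * n
    for s, weight in enumerate(kernel):
        window = shift_and_fill(r1, r2, s)  # "third register" for tap s
        for lane in range(n):
            acc[lane] += weight * window[lane]
    return acc


if __name__ == "__main__":
    row = [10 * k for k in range(1, 17)]  # 16 pixels: 10, 20, ..., 160
    r1, r2 = row[:8], row[8:]             # two adjacent portions of the row
    print(convolve_row(r1, r2, [1, 2, 1]))
    # [80, 120, 160, 200, 240, 280, 320, 360]
```

In this model, the frame data is loaded once into two registers, and each filter tap needs only another shift and fill rather than a fresh memory read. That is one plausible reading of why claim 6 frames convolution as moving a window through repeated shift and fill instructions.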

Assignee

Apple Inc

Classifications

  • controlled by a single instruction for multiple threads [SIMT] in parallel

  • G06F9/3851 (primary): from multiple instruction streams, e.g. multistreaming

  • G06F9/3887 (primary): controlled by a single instruction for multiple data lanes [SIMD]

  • Memory management

  • Processor architectures; Processor configuration, e.g. pipelining

Frequently asked questions

Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?
Published September 21, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.