Control flow mechanism for execution of graphics processor instructions using active channel packing

US10990409B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10990409-B2
Application number  US-201715493442-A
Country             US
Kind code           B2
Filing date         Apr 21, 2017
Priority date       Apr 21, 2017
Publication date    Apr 27, 2021
Grant date          Apr 27, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate control flow in a graphics processing system is disclosed. The apparatus includes logic a plurality of execution units to execute single instruction, multiple data (SIMD) and flow control logic to detect a diverging control flow in a plurality of SIMD channels and reduce the execution of the control flow to a subset of the SIMD channels.
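
As a rough software illustration of the detection step the abstract describes, the sketch below models SIMD channels as positions in a Python list and the execution mask as an integer bit mask. It is not the patented hardware logic: the function names, the 16-channel width, and the 50% threshold are assumptions chosen for the example.

```python
def execution_mask(predicates):
    """Build an integer mask where bit i is set if channel i takes the branch."""
    mask = 0
    for channel, taken in enumerate(predicates):
        if taken:
            mask |= 1 << channel
    return mask


def is_divergent(mask, width):
    """Control flow diverges when some, but not all, channels are active."""
    full = (1 << width) - 1
    return mask != 0 and mask != full


def should_reduce(mask, width, threshold=0.5):
    """Assumed policy: reduce to a subset of channels only when the branch
    diverges and the active fraction is below the threshold."""
    active = bin(mask).count("1")
    return is_divergent(mask, width) and active / width < threshold


# Hypothetical SIMD16 branch in which only channels 1, 6, 9 and 14 are active.
predicates = [channel in (1, 6, 9, 14) for channel in range(16)]
mask = execution_mask(predicates)
print(is_divergent(mask, 16), should_reduce(mask, 16))
```

Here 4 of 16 channels are active (25%), which is below the assumed threshold, so this branch would be a candidate for execution on a reduced set of channels.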

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a graphics processor, including: a plurality of processing units to execute single instruction, multiple data (SIMD) instructions; and wherein the graphics processor is to: detect a diverging control flow in a plurality of SIMD channels of a first SIMD instruction for an application, the first SIMD instruction comprising a plurality of sections of SIMD channels, each section comprising multiple SIMD channels; detect which SIMD channels of the plurality of SIMD channels are active SIMD channels and determine indices of the active SIMD channels, wherein the indices comprise set bits corresponding to the active SIMD channels in an input register of the graphics processor; determine whether a number of the active SIMD channels is below a predetermined threshold percentage of the plurality of SIMD channels; and upon determining the number is below the threshold percentage, further to: identify a code region of the application impacted by the diverging control flow; duplicate the identified code region into a duplicated code region that executes using a subset of the plurality of SIMD channels, the subset comprising one or more of the plurality of sections of SIMD channels; pack input to the active SIMD channels into the subset of the plurality of SIMD channels by applying an execution mask that utilizes the set bits of the determined indices of the active SIMD channels to identify the active SIMD channels for packing into the subset of the plurality of SIMD channels; execute packed instructions for the subset of the plurality SIMD channels to generate computed data; and unpack, at an output register of the graphics processor, any output from the subset of the plurality of SIMD channels that is consumed outside of the identified code region into original locations for the active SIMD channels in the plurality of SIMD channels.

2. The apparatus of claim 1, wherein the graphics processor is to detect whether the active SIMD channels are spread over two or more of the plurality of SIMD sections.

3. The apparatus of claim 2, wherein the graphics processor is to prevent packing input to the active SIMD channels into a subset of the SIMD channels upon detecting that the active SIMD channels are not spread over two or more of the plurality of SIMD sections.

4. The apparatus of claim 1, wherein the subset of the SIMD channels comprises half of the plurality of SIMD channels.

5. The apparatus of claim 1, further comprising a register file including: one or more bit registers to store the set bits; and one or more byte registers including a plurality of registers.

6. The apparatus of claim 5, wherein the graphics processor is further to find the indices of register bits that are set in a bit register of the one or more bit registers and write the indices to a bytes register of the one or more byte registers.

7. The apparatus of claim 1, wherein the graphics processor is further to perform consecutive channel execution upon detecting a diverging control flow wherein the number of active channels is below a predetermined number.

8. The apparatus of claim 1, wherein the graphics processor is further to: detect shader branch instructions associated with the SIMD instructions; and reconfigure hardware resources of the graphics processor upon detection of the shader branch instructions.

9. The apparatus of claim 8, wherein the graphics processor is further to inject SIMD instructions into profile branch directions such that the SIMD instructions are randomly injected based on statistical sampling.

10. The apparatus of claim 1, wherein each section of the plurality of sections has a width of a second SIMD instruction, the second SIMD instruction being smaller than the first SIMD instruction.

11. The apparatus of claim 1, wherein the graphics processor is further to: shut down one or more inactive sections of SIMD channels during execution of the active SIMD channels.

12. The apparatus of claim 1, wherein the code region is identified based at least in part on a determination as to where active channel packing can be applied in the application.

13. The apparatus of claim 1, wherein the code region is identified based at least in part on profile data collected from previous runs of the application.

14. A method, comprising: detecting a diverging control flow in a plurality of single instruction, multiple data (SIMD) channels of a first SIMD instruction for an application; detecting which SIMD channels of the plurality of SIMD channels are active SIMD channels and determining indices of the active SIMD channels, wherein the indices comprise set bits corresponding to the active SIMD channels in an input register of the graphics processor, and wherein the first SIMD instruction comprising a plurality of sections of SIMD channels, each section comprising multiple SIMD channels; determining whether a number of the active SIMD channels among the plurality of SIMD channels is below a predetermined threshold percentage of the plurality of SIMD channels; and upon determining the number is below the threshold percentage, further: identifying a code region of the application impacted by the diverging control flow; duplicating the identified code region into a duplicated code region that executes using a subset of the plurality of SIMD channels, the subset comprising one or more of the plurality of sections of SIMD channels; packing input to the active SIMD channels into the subset of the plurality of SIMD channels by applying an execution mask that utilizes the set bits of the determined indices of the active SIMD channels to identify the active SIMD channels for packing into the subset of the plurality of SIMD channels; executing packed instructions for the subset of the plurality of SIMD channels to generate computed data; and unpacking, at an output register of the graphics processor, any output from the subset of the plurality of SIMD channels that is consumed outside of the identified code region into original locations for the active SIMD channels in the plurality of SIMD channels.

15. The method of claim 14, further comprising: detecting whether the active SIMD channels are spread over two or more of the plurality of SIMD sections; and preventing packing input to the active SIMD channels into the subset of the plurality of SIMD channels upon detecting that the active SIMD channels are not spread over two or more of the plurality of SIMD sections.

16. The method of claim 14, further comprising: identifying the code region based at least in part on a determination as to where active channel packing can be applied in the application.

17. The method of claim 14, further comprising: identifying the code region based at least in part on profile data collected from previous runs of the application.

18. A non-transitory computer readable medium having instructions, which when executed by one or more processors, cause the processors to: detect a diverging control flow in a plurality of single instruction, multiple data (SIMD) channels of a first SIMD instruction for an application; detect which SIMD channels of the plurality of SIMD channels are active SIMD channels and determine indices of the active SIMD channels, wherein the indices comprise set bits corresponding to the active SIMD channels in an input register of the graphics processor, and wherein the first SIMD instruction comprising a plurality of sections of SIMD channels, each section comprising multiple SIMD channels; determine
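
The pack, execute, and unpack sequence recited in the independent claims can be modeled in software roughly as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: channels are Python list slots, `kernel` stands in for the duplicated code region, and every function name, the SIMD16 width, the 8-channel section width, and the 50% threshold are hypothetical choices for illustration.

```python
def active_indices(mask, width):
    """Indices of the set bits in the mask, i.e. the active channels."""
    return [i for i in range(width) if (mask >> i) & 1]


def spread_over_sections(indices, section_width):
    """True if the active channels fall in two or more sections."""
    return len({i // section_width for i in indices}) >= 2


def pack(values, indices, subset_width, fill=0):
    """Gather the inputs of the active channels into the low slots."""
    packed = [values[i] for i in indices]
    return packed + [fill] * (subset_width - len(packed))


def unpack(original, packed_out, indices):
    """Scatter packed results back to the channels they came from."""
    result = list(original)
    for slot, channel in enumerate(indices):
        result[channel] = packed_out[slot]
    return result


def run_region(values, mask, kernel, width=16, section_width=8, threshold=0.5):
    """Run `kernel` on the active channels, packing them first when the
    active fraction is below the (assumed) threshold."""
    indices = active_indices(mask, width)
    count = len(indices)
    if (count == 0 or count / width >= threshold
            or not spread_over_sections(indices, section_width)):
        # Fallback: ordinary masked execution across the full width.
        return [kernel(v) if (mask >> i) & 1 else v
                for i, v in enumerate(values)]
    # Smallest whole number of sections that holds every active channel.
    sections_needed = -(-count // section_width)
    subset_width = sections_needed * section_width
    packed = pack(values, indices, subset_width)
    packed_mask = (1 << count) - 1  # only the filled slots execute
    packed_out = [kernel(v) if (packed_mask >> s) & 1 else v
                  for s, v in enumerate(packed)]
    return unpack(values, packed_out, indices)


values = list(range(16))
mask = (1 << 1) | (1 << 6) | (1 << 9) | (1 << 14)
print(run_region(values, mask, lambda v: v * 10))
```

In this example the four active channels span both 8-channel sections, so their inputs are gathered into a single section, the kernel runs on that narrower width, and the results are scattered back to channels 1, 6, 9 and 14. When the active channels already sit in one section, the sketch skips packing, mirroring the condition in claims 3 and 15.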

Assignees

Intel Corp

Inventors

Classifications

  • G06T1/20 (Primary)

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • G06F9/3887 (Primary)

    controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • Conditional branch instructions · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • for indirect branch instructions · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10990409B2 cover?
It covers an apparatus that facilitates control flow in a graphics processing system. The apparatus includes a plurality of execution units to execute single instruction, multiple data (SIMD) instructions, and flow control logic that detects a diverging control flow in a plurality of SIMD channels and reduces execution of that control flow to a subset of the SIMD channels.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 27, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).