Runtime mechanism to optimize shader execution flow

US12229864B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12229864-B2
Application number  US-202217817815-A
Country             US
Kind code           B2
Filing date         Aug 5, 2022
Priority date       Aug 5, 2022
Publication date    Feb 18, 2025
Grant date          Feb 18, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.
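The two-iteration flow in the abstract can be illustrated with a small sketch. This is not the patented implementation: the `LightingBatch` type, the function names, and the choice of condition (every light factor in the batch is zero) are hypothetical, picked only to show a predication bit being configured in one pass and used to skip work in the next. It also assumes the graphics data does not change between the two iterations and that color values are finite.

```python
from dataclasses import dataclass


@dataclass
class LightingBatch:
    """Hypothetical graphics data for one batch: a color and a light factor per pixel."""
    base_color: list
    light_factor: list


def configure_predication(batch):
    """First iteration: derive a 1-bit predication value from the graphics data.

    Assumption: the 'condition' is that every light factor is zero, so the
    lighting multiplies cannot contribute to the output.
    """
    return 1 if all(f == 0.0 for f in batch.light_factor) else 0


def shade(batch, predication=0):
    """Run the lighting multiplies, or skip them when the predication bit is set.

    Returns (pixels, multiplies_executed).
    """
    if predication == 1:
        # Adjusted flow: refrain from executing the multiplies.
        return [0.0] * len(batch.base_color), 0
    pixels = [c * f for c, f in zip(batch.base_color, batch.light_factor)]
    return pixels, len(pixels)


def run_two_iterations(batch):
    """Iteration 1 runs the full flow and configures the bit; iteration 2 uses it."""
    first, mults_first = shade(batch)
    bit = configure_predication(batch)
    second, mults_second = shade(batch, predication=bit)
    return first, second, mults_first, mults_second, bit
```

For an unlit batch the second iteration produces the same pixels with zero multiplies; for a lit batch the bit stays 0 and both iterations do the same work. The output is identical either way, which mirrors the abstract's point that only the execution flow changes.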

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for graphics processing, comprising: a memory; and at least one processor coupled to the memory and configured to: obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations; configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations, the at least one predication value indicating a likelihood of an occurrence of a condition for the graphics workload, the condition for the graphics workload being associated with a streaming processor of a graphics processing unit (GPU), wherein the streaming processor comprises a feedback controller block configured to enable the at least one predication value to be configured, and wherein the feedback controller block comprises a set of context registers per shader slot associated with the set of shader operations and a set of counters; adjust, at a second iteration subsequent to the first iteration and without altering a function or a result of the graphics workload, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations; and execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.

2. The apparatus of claim 1, wherein the graphics data for the set of shader operations includes at least one of: lighting data, illumination data, depth data, shadow data, or radius data.

3. The apparatus of claim 1, wherein the instruction execution data is obtained based on a floating multiply (FMUL) unit of the GPU, and the at least one predication value configured at the first iteration indicates the likelihood of the occurrence of the condition for the graphics workload for the second iteration.

4. The apparatus of claim 1, wherein the set of shader operations includes a set of multiplication operations for the graphics workload.

5. The apparatus of claim 1, wherein the at least one predication value is a 1-bit value.

6. The apparatus of claim 1, wherein to configure the at least one predication value, the at least one processor is further configured to: generate the at least one predication value based on at least one of a shader preamble, a feedback shader, or a batch of data for the graphics workload.

7. The apparatus of claim 1, wherein to adjust the execution flow of the graphics workload, the at least one processor is further configured to: load the set of shader operations for a plurality of shader programs; and combine each of the plurality of shader programs based on the loaded set of shader operations.

8. The apparatus of claim 7, wherein the execution flow of the graphics workload corresponds to a shader sequence for the plurality of shader programs.

9. The apparatus of claim 1, wherein to execute each of the set of shader operations, the at least one processor is further configured to: perform one or more shader operations of the set of shader operations based on the adjusted execution flow of the graphics workload.

10. The apparatus of claim 1, wherein to refrain from executing each of the set of shader operations, the at least one processor is further configured to: skip at least one shader operation of the set of shader operations based on the adjusted execution flow of the graphics workload.

11. The apparatus of claim 1, the at least one processor being further configured to: update or maintain a configuration of a shader processor or the streaming processor at the GPU based on executing or refraining from executing each of the set of shader operations.

12. The apparatus of claim 1, the at least one processor being further configured to: store data associated with each of the set of shader operations upon executing or refraining from executing each of the set of shader operations.

13. The apparatus of claim 12, wherein the data is stored in at least one of: a system memory, a double data rate (DDR) random access memory (RAM), a constant memory, or an on-chip memory.

14. The apparatus of claim 1, wherein the apparatus is a wireless communication device.

15. A method of graphics processing, comprising: obtaining instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations; configuring, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations the at least one predication value indicating a likelihood of an occurrence of a condition for the graphics workload, the condition for the graphics workload being associated with a streaming processor of a graphics processing unit (GPU), wherein the streaming processor comprises a feedback controller block configured to enable the at least one predication value to be configured, and wherein the feedback controller block comprises a set of context registers per shader slot associated with the set of shader operations and a set of counters; adjusting, at a second iteration subsequent to the first iteration and without altering a function or a result of the graphics workload, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations; and executing or refraining from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.

16. The method of claim 15, wherein the graphics data for the set of shader operations includes at least one of: lighting data, illumination data, depth data, shadow data, or radius data.

17. The method of claim 15, wherein the instruction execution data is obtained based on a floating multiply (FMUL) unit of the GPU, and the at least one predication value configured at the first iteration indicates the likelihood of the occurrence of the condition for the graphics workload for the second iteration.

18. The method of claim 15, wherein the set of shader operations includes a set of multiplication operations for the graphics workload.

19. The method of claim 15, wherein the at least one predication value is a 1-bit value.

20. The method of claim 15, wherein configuring the at least one predication value further comprises generating the at least one predication value based on at least one of a shader preamble, a feedback shader, or a batch of data for the graphics workload.

21. The method of claim 15, wherein adjusting the execution flow of the graphics workload further comprises: loading the set of shader operations for a plurality of shader programs; and combining each of the plurality of shader programs based on the loaded set of shader operations.

22. The method of claim 21, wherein the execution flow of the graphics workload corresponds to a shader sequence for the plurality of shader programs.

23. The method of claim 15, wherein executing each of the set of shader operations further comprises performing one or more shader operations of the set of shader operations based on the adjusted execution flow of the graphics workload.

24. The method of claim 15, wherein re
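Claim 1 describes a feedback controller block holding context registers per shader slot and a set of counters, with the predication value indicating the likelihood of a condition. A minimal sketch of how such a block might be modeled in software follows. The class name, the two-counter layout, the 0.9 threshold, and the zero-operand condition are all assumptions made for illustration; the claims do not specify them.

```python
class FeedbackController:
    """Hypothetical software model: counters plus a 1-bit context register per shader slot."""

    def __init__(self, num_slots, threshold=0.9):
        self.samples = [0] * num_slots   # counter: operands observed per slot
        self.hits = [0] * num_slots      # counter: times the condition held per slot
        self.context = [0] * num_slots   # 1-bit predication value per slot
        self.threshold = threshold       # assumed likelihood cut-off

    def observe(self, slot, operand):
        """Record one multiply operand; the assumed condition is operand == 0."""
        self.samples[slot] += 1
        if operand == 0.0:
            self.hits[slot] += 1

    def configure(self, slot):
        """Set the slot's predication bit when the condition was observed often enough."""
        n = self.samples[slot]
        likely = n > 0 and self.hits[slot] / n >= self.threshold
        self.context[slot] = 1 if likely else 0
        return self.context[slot]


def predicated_multiply(a, b, predication):
    """Multiply a by b, skipping the multiply on the predicated path when b is zero.

    The zero check keeps the result identical even when the likelihood
    estimate turns out to be wrong for a particular operand.
    """
    if predication == 1 and b == 0.0:
        return 0.0  # refrain from executing the multiply
    return a * b
```

Because the predication bit only expresses a likelihood, the sketch guards the skipped path with an exact check, one possible way to honor the claim's requirement that the result of the workload is not altered.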

Assignees

  • Qualcomm Inc

Inventors

Classifications

  • Shading · CPC title

  • Runtime code conversion or optimisation · CPC title

  • G06T15/005 · Primary

    General purpose rendering architectures · CPC title

  • G06F9/4881 · Primary

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12229864B2 cover?
This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, …
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G06T15/005. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 18, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).