Thread group scheduling for graphics processing
US-2020293380-A1 · Sep 17, 2020 · US
US12504989B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12504989-B2 |
| Application number | US-202217699992-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 21, 2022 |
| Priority date | Mar 21, 2022 |
| Publication date | Dec 23, 2025 |
| Grant date | Dec 23, 2025 |
Official abstract text for this publication.
Bank aware thread scheduling and early dependency clearing techniques are described herein. In one example, bank aware thread scheduling involves arbitrating and scheduling threads based on the cache bank that is to be accessed by the instructions to avoid bank conflicts. Early dependency clearing involves clearing dependencies for cache loads in a scoreboard before the data is loaded. In early dependency clearing for loads, delays in operation can be reduced by clearing dependencies before data is required from the cache.
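The bank aware selection described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware logic: the `ReadyThread` type, the `select_threads` function, the `max_issue` parameter, and the first-come arbitration order are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadyThread:
    """Hypothetical record for a thread that is available for scheduling."""
    thread_id: int
    bank: int  # cache bank the thread's next instruction is to access


def select_threads(ready, max_issue):
    """Pick up to max_issue threads whose next accesses target distinct banks.

    `ready` is assumed to already be in arbitration order (for example, oldest
    first); that ordering policy is an assumption of this sketch.
    """
    selected = []
    used_banks = set()
    for thread in ready:
        if thread.bank in used_banks:
            # Bank conflict with an already selected thread: skip this thread
            # and consider another one that accesses a different bank.
            continue
        selected.append(thread)
        used_banks.add(thread.bank)
        if len(selected) == max_issue:
            break
    return selected


ready = [
    ReadyThread(0, bank=2),
    ReadyThread(1, bank=2),  # conflicts with thread 0 on bank 2
    ReadyThread(2, bank=0),
    ReadyThread(3, bank=1),
]
picked = select_threads(ready, max_issue=3)
print([t.thread_id for t in picked])  # prints [0, 2, 3]
```

In this model, thread 1 is passed over because thread 0 already claims bank 2, so the three selected threads each touch a different bank.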
Opening claim text (preview).
What is claimed is:
1. A graphics processing unit (GPU) comprising: a cache including multiple banks; and hardware logic to schedule threads to access the cache, including to: determine which banks of the cache are to be accessed by threads available for scheduling, select a plurality of the threads available for scheduling based on the banks to be accessed by the threads, including to select threads that are to access different banks of the cache, and schedule the selected threads for execution.
2. The GPU of claim 1, wherein the hardware logic is to: determine if two or more of the selected threads have a bank conflict with instructions to access a same bank of the cache; and in response to the bank conflict, select another thread that is to access a different bank.
3. The GPU of claim 1, wherein the hardware logic to select the plurality of the threads is to: select the plurality of the threads based on which pipeline is to be used by the threads, including to select the threads that are to access different pipelines.
4. The GPU of claim 3, wherein the hardware logic is to: determine if two or more of the selected threads have a pipeline conflict with instructions to be sent to a same pipeline; and in response to the pipeline conflict, select another thread to replace the thread with the pipeline conflict.
5. The GPU of claim 3, wherein: the pipelines include: an integer pipeline, a floating point pipeline, and an extended math pipeline.
6. The GPU of claim 3, wherein: a number of threads to be selected for scheduling is equal to a number of pipelines.
7. The GPU of claim 1, wherein the hardware logic to select a plurality of the threads is to: select instructions from the threads for scheduling only from instructions that do not have dependencies within threads or across threads.
8. The GPU of claim 1, wherein the hardware logic is to: schedule a first instruction for execution; and clear a dependency for a second instruction that is dependent on the first instruction in response to scheduling the first instruction before data is loaded for the first instruction.
9. The GPU of claim 8, wherein the hardware logic to clear the dependency is to: clear the dependency in a scoreboard in response to scheduling the first instruction before the data is loaded.
10. A system comprising: a memory device; and graphics processing unit (GPU) coupled with the memory device, the GPU including: a cache including multiple banks; and hardware logic to schedule threads to access the cache, including to: determine which banks of the cache are to be accessed by threads available for scheduling, select a plurality of the threads available for scheduling based on the banks to be accessed by the threads, including to select threads that are to access different banks of the cache, and schedule the selected threads for execution.
11. The system of claim 10, wherein the hardware logic is to: determine if two or more of the selected threads have a bank conflict with instructions to access a same bank of the cache; and in response to the bank conflict, select another thread that is to access a different bank.
12. The system of claim 10, wherein the hardware logic to select the plurality of the threads is to: select the plurality of the threads based on which pipeline is to be used by the threads, including to select the threads that are to access different pipelines.
13. The system of claim 12, wherein the hardware logic is to: determine if two or more of the selected threads have a pipeline conflict with instructions to be sent to a same pipeline; and in response to the pipeline conflict, select another thread to replace the thread with the pipeline conflict.
14. The system of claim 12, wherein: the pipelines include: an integer pipeline, a floating point pipeline, and an extended math pipeline.
15. The system of claim 12, wherein: a number of threads to be selected for scheduling is equal to a number of pipelines.
16. The system of claim 10, wherein the hardware logic to select a plurality of the threads is to: select instructions from the threads for scheduling only from instructions that do not have dependencies within threads or across threads.
17. The system of claim 10, wherein the hardware logic is to: schedule a first instruction for execution; and clear a dependency for a second instruction that is dependent on the first instruction in response to scheduling the first instruction before data is loaded for the first instruction.
18. The system of claim 17, wherein the hardware logic to clear the dependency is to: clear the dependency in a scoreboard in response to scheduling the first instruction before the data is loaded.
19. A method comprising: determining which banks of a cache are to be accessed by threads available for scheduling; selecting a plurality of the threads available for scheduling based on the banks to be accessed by the threads, including selecting threads that are to access different banks of the cache; and scheduling the selected threads for execution.
20. The method of claim 19, further comprising: determining if two or more of the selected threads have a bank conflict with instructions to access a same bank of the cache; and in response to the bank conflict, selecting another thread that is to access a different bank.
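The early dependency clearing recited in claims 8, 9, 17, and 18 can be pictured with a toy cycle-level model. This is a rough sketch under stated assumptions, not the claimed hardware: the `Scoreboard` class, the `first_eligible_cycle` function, the register name `r1`, and the fixed `load_latency` are hypothetical. The model also does not capture whatever timing guarantees real hardware would need so that the dependent instruction does not read the register before the data arrives.

```python
class Scoreboard:
    """Hypothetical scoreboard: tracks registers with an outstanding dependency."""

    def __init__(self):
        self.busy = set()

    def set_dependency(self, reg):
        self.busy.add(reg)

    def clear_dependency(self, reg):
        self.busy.discard(reg)

    def is_ready(self, src_regs):
        # An instruction is eligible only if none of its sources are busy.
        return not any(reg in self.busy for reg in src_regs)


def first_eligible_cycle(early_clear, load_latency=4):
    """Return the first cycle at which an instruction reading r1 is eligible.

    A load writing r1 is scheduled at cycle 0 and its data arrives at cycle
    `load_latency` (an assumed fixed latency). With early_clear=True the
    dependency is cleared when the load is scheduled; otherwise it is cleared
    only when the data is loaded.
    """
    scoreboard = Scoreboard()
    scoreboard.set_dependency("r1")  # second instruction depends on the load
    schedule_cycle = 0
    data_cycle = schedule_cycle + load_latency
    for cycle in range(data_cycle + 1):
        if cycle == schedule_cycle and early_clear:
            scoreboard.clear_dependency("r1")  # cleared before data is loaded
        if cycle == data_cycle:
            scoreboard.clear_dependency("r1")  # conventional clear on data return
        if scoreboard.is_ready(["r1"]):
            return cycle
    return None


print(first_eligible_cycle(early_clear=True))   # prints 0
print(first_eligible_cycle(early_clear=False))  # prints 4
```

In this toy model the dependent instruction becomes eligible for scheduling four cycles sooner when the scoreboard entry is cleared at schedule time, which is the kind of delay reduction the abstract attributes to early dependency clearing.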
to service a request · CPC title
Task transfer initiation or dispatching · CPC title
Program initiating; Program switching, e.g. by interrupt · CPC title
Partitioning or combining of resources · CPC title
Allocation of resources, e.g. of the central processing unit [CPU] · CPC title