Multiple register allocation sizes for GPU hardware threads

US2025147762A1 · US · A1

Patent metadata

Publication number: US-2025147762-A1
Application number: US-202318504407-A
Country: US
Kind code: A1
Filing date: Nov 8, 2023
Priority date: Nov 8, 2023
Publication date: May 8, 2025
Grant date: (not listed)

Abstract

Official abstract text for this publication.

Described herein is a graphics processor having processing resources with configurable thread and register configurations. Program code can configure a number of registers and accumulators that will be used by hardware threads during execution of the program code by the graphics processor. Processing resources within the graphics processor can be configured to assign different numbers of registers and accumulators to hardware threads based on the configuration requested by program code to be executed by the processing resource.
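
To make the trade-off in the abstract concrete, the following is a minimal sketch of how a per-thread register configuration could limit the number of hardware threads that fit on one processing resource. The class name, field names, function name, and all sizes are hypothetical; the publication text shown here does not give register file sizes or thread limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterConfig:
    """Per-thread register request (hypothetical field names)."""
    general_purpose: int  # general-purpose registers per hardware thread
    accumulators: int     # accumulator registers per hardware thread


def threads_that_fit(config: RegisterConfig,
                     total_general_purpose: int,
                     total_accumulators: int,
                     max_hw_threads: int) -> int:
    """Return how many threads with this configuration fit at once.

    Assumes a fixed-size register file that is divided evenly among
    threads; this is an illustrative model, not the patented circuitry.
    """
    if config.general_purpose <= 0 or config.accumulators <= 0:
        raise ValueError("register counts must be positive")
    by_gpr = total_general_purpose // config.general_purpose
    by_acc = total_accumulators // config.accumulators
    # The scarcest resource, or the hardware thread limit, decides.
    return min(by_gpr, by_acc, max_hw_threads)


# Assumed sizes: 1024 general-purpose registers, 32 accumulators, 8 threads.
small = RegisterConfig(general_purpose=128, accumulators=4)
large = RegisterConfig(general_purpose=256, accumulators=8)
print(threads_that_fit(small, 1024, 32, 8))
print(threads_that_fit(large, 1024, 32, 8))
```

Under these assumed sizes, a larger per-thread configuration leaves room for fewer concurrent threads, which is why the configuration is chosen per program.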

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processor comprising: a memory interface; a processing cluster coupled with the memory interface, the processing cluster including a plurality of graphics cores coupled via a data interconnect; and circuitry to dispatch workloads for execution by processing resources within a graphics core of the plurality of graphics cores, the circuitry configured to: receive a request to dispatch program code for execution, the program code associated with a register configuration selected from a plurality of register configurations; select a processing resource of a plurality of processing resources within the graphics core, the processing resource determined to have sufficient available resources to satisfy the register configuration for the program code; assign a number of registers to a hardware thread of the processing resource based on the register configuration selected for the program code; and execute an instruction of the program code via the hardware thread.

2. The graphics processor of claim 1, wherein each of the plurality of register configurations specify a number of registers to assign to the hardware thread of the processing resource.

3. The graphics processor of claim 2, wherein the number of registers to assign to the hardware thread of the processing resource include a first number of general-purpose registers to assign to the hardware thread of the processing resource and a second number of accumulator registers to assign to the hardware thread of the processing resource.

4. The graphics processor of claim 3, wherein the register configuration is selected for the program code based on a shader type associated with the program code.

5. The graphics processor of claim 1, wherein the circuitry is configured to track a number of registers of the processing resource that are allocated to active hardware threads within the processing resource.

6. The graphics processor of claim 5, wherein the circuitry is configured to track a number of free registers in each respective processing resource of the plurality of processing resources within the graphics core.

7. The graphics processor of claim 5, wherein the circuitry is configured to track registers of the processing resource at register block granularity, wherein a register block includes multiple contiguous registers.

8. The graphics processor of claim 7, wherein a register block includes 32 registers.

9. The graphics processor of claim 8, wherein the circuitry is configured to select the processing resource of the plurality of processing resources within the graphics core via a round-robin scheduler.

10. The graphics processor of claim 9, wherein the circuitry is configured to: determine, based on the register configuration associated with the program code, whether sufficient contiguous register blocks are available in the processing resource; bypass the processing resource in response to a determination that sufficient contiguous register blocks are not available; and select a next available processing resource in the plurality of processing resources.

11. A method for comprising: receiving a request to dispatch program code for execution to a processing resource within a graphics core of a graphics processor; determining whether variable registers per thread (VRT) is enabled for the program code; statically configuring the processing resource within the graphics core that was selected to execute the program code with a default number of registers per thread in response to a determination that VRT is not enabled for the program code; dynamically allocating registers to a hardware thread of the processing resource according to a register configuration associated with the program code in response to a determination that VRT is enabled for the program code; and executing an instruction of the program code via the hardware thread.

12. The method of claim 11, wherein the register configuration associated with the program code is selected from a plurality of register configurations, each of the plurality of register configurations specify a number of registers to assign to the hardware thread of the processing resource.

13. The method of claim 12, further comprising selecting a register configuration from the plurality of register configurations based on a shader type associated with the program code.

14. The method of claim 11, further comprising: determining whether the program code is an asynchronous compute program; determining an asynchronous compute throttle limit configured for the processing resource; and dispatching the hardware thread of the program code in response to a determination that the asynchronous compute throttle limit has not been reached.

15. The method of claim 14, further comprising, in response to the determination that VRT is enabled for the program code: scaling the asynchronous compute throttle limit based on the register configuration; and stalling dispatch of the hardware thread to the processing resource based on a scaled asynchronous compute throttle limit.

16. A graphics processing system comprising: a memory interface; a processing cluster coupled with the memory interface, the processing cluster including a plurality of graphics cores coupled via a data interconnect; and circuitry to dispatch workloads for execution by processing resources within a graphics core of the plurality of graphics cores, the circuitry configured to: receive a request to dispatch program code for execution, the program code associated with a register configuration selected from a plurality of register configurations; select a processing resource of a plurality of processing resources within the graphics core, the processing resource determined to have sufficient available resources to satisfy the register configuration for the program code; assign a number of registers to a hardware thread of the processing resource based on the register configuration selected for the program code; and execute an instruction of the program code via the hardware thread.

17. The graphics processing system of claim 16, wherein each of the plurality of register configurations specify a number of registers to assign to the hardware thread of the processing resource.

18. The graphics processing system of claim 17, wherein the number of registers to assign to the hardware thread of the processing resource include a first number of general-purpose registers to assign to the hardware thread of the processing resource and a second number of accumulator registers to assign to the hardware thread of the processing resource.

19. The graphics processing system of claim 18, wherein the register configuration is selected for the program code based on a shader type associated with the program code.

20. The graphics processing system of claim 16, wherein the circuitry is configured to track a number of registers of the processing resource that are allocated to active hardware threads within the processing resource.
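
The dispatch behavior recited in claims 5 through 10 (tracking registers in 32-register blocks, round-robin selection, and bypassing a processing resource that lacks enough contiguous free blocks) can be pictured with a small software model. This is a sketch under stated assumptions, not the claimed circuitry: the class and method names, the first-fit search, the per-resource block counts, and the thread identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BLOCK_SIZE = 32  # registers per block, as recited in claim 8


@dataclass
class ProcessingResource:
    """Hypothetical model of one processing resource's register file."""
    num_blocks: int  # assumed register file size, in blocks
    owner: List[Optional[int]] = field(init=False)

    def __post_init__(self) -> None:
        # None marks a free block; otherwise the owning thread id.
        self.owner = [None] * self.num_blocks

    def free_registers(self) -> int:
        return self.owner.count(None) * BLOCK_SIZE

    def find_contiguous(self, blocks_needed: int) -> Optional[int]:
        """First-fit search (an assumption) for a run of free blocks."""
        run = 0
        for i, holder in enumerate(self.owner):
            run = run + 1 if holder is None else 0
            if run == blocks_needed:
                return i - blocks_needed + 1
        return None

    def allocate(self, thread_id: int, start: int, blocks: int) -> None:
        for i in range(start, start + blocks):
            self.owner[i] = thread_id

    def release(self, thread_id: int) -> None:
        self.owner = [None if h == thread_id else h for h in self.owner]


class Dispatcher:
    """Round-robin selection that bypasses resources without room."""

    def __init__(self, resources: List[ProcessingResource]) -> None:
        self.resources = resources
        self.next_index = 0  # where the next round-robin scan starts

    def dispatch(self, thread_id: int,
                 registers_per_thread: int) -> Optional[Tuple[int, int]]:
        """Return (resource index, first register) or None if none fits."""
        if registers_per_thread <= 0:
            raise ValueError("registers_per_thread must be positive")
        # Round the request up to whole blocks.
        blocks = (registers_per_thread + BLOCK_SIZE - 1) // BLOCK_SIZE
        count = len(self.resources)
        for step in range(count):
            idx = (self.next_index + step) % count
            start = self.resources[idx].find_contiguous(blocks)
            if start is None:
                continue  # bypass: not enough contiguous free blocks
            self.resources[idx].allocate(thread_id, start, blocks)
            self.next_index = (idx + 1) % count
            return idx, start * BLOCK_SIZE
        return None


# Two resources with an assumed 4 blocks (128 registers) each.
resources = [ProcessingResource(num_blocks=4), ProcessingResource(num_blocks=4)]
dispatcher = Dispatcher(resources)
print(dispatcher.dispatch(thread_id=0, registers_per_thread=128))
print(dispatcher.dispatch(thread_id=1, registers_per_thread=64))
print(dispatcher.dispatch(thread_id=2, registers_per_thread=64))
```

In this model the third request skips the first resource, which is full, and lands on the second. A request can also fail when enough registers are free in total but not contiguously, which is the situation the bypass step in claim 10 addresses.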

Assignees

Intel Corp

Inventors

Classifications

  • General purpose rendering architectures · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • the resource being the memory · CPC title

  • Register stacks; shift registers · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025147762A1 cover?
Described herein is a graphics processor having processing resources with configurable thread and register configurations. Program code can configure a number of registers and accumulators that will be used by hardware threads during execution of the program code by the graphics processor. Processing resources within the graphics processor can be configured to assign different numbers of regist…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30134. Mapped technology areas include Physics.
When was this patent published?
Publication date May 8, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).