Graphics processor register file including a low energy portion and a high capacity portion

US12579072B2 · US · B2

Patent metadata
Publication number: US-12579072-B2
Application number: US-202418405933-A
Country: US
Kind code: B2
Filing date: Jan 5, 2024
Priority date: Apr 1, 2017
Publication date: Mar 17, 2026
Grant date: Mar 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.
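The abstract describes a round trip. Compute data spanning several cache lines is compressed as one unit before it is written out over the memory interface, and it is decompressed on the way back before the processing resource sees it. The sketch below models that round trip under stated assumptions:

- Cache lines are 64 bytes and are grouped four at a time.
- Python's `zlib` stands in for whatever hardware codec the circuitry would use. The abstract names no algorithm.
- `CompressingMemoryInterface`, `write_block`, and `read_block` are hypothetical names.

```python
import zlib

CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines
LINES_PER_BLOCK = 4    # assumption: one compression unit spans four cache lines


class CompressingMemoryInterface:
    """Hypothetical model of compression at multiple-cache-line granularity.

    zlib stands in for an unspecified hardware codec, and a dict stands in
    for memory reached over the memory bus.
    """

    def __init__(self):
        self.backing = {}  # block base address -> compressed bytes

    def write_block(self, base_addr, lines):
        """Compress a group of cache lines as one unit, then store it."""
        if len(lines) != LINES_PER_BLOCK or any(
            len(line) != CACHE_LINE_BYTES for line in lines
        ):
            raise ValueError(
                f"expected {LINES_PER_BLOCK} lines of {CACHE_LINE_BYTES} bytes"
            )
        compressed = zlib.compress(b"".join(lines))
        self.backing[base_addr] = compressed
        return len(compressed)  # bytes that would cross the memory bus

    def read_block(self, base_addr):
        """Fetch a compressed block, decompress it, and split it back into lines."""
        raw = zlib.decompress(self.backing[base_addr])
        return [
            raw[i:i + CACHE_LINE_BYTES]
            for i in range(0, len(raw), CACHE_LINE_BYTES)
        ]


# Usage: four zero-filled lines (256 bytes) compress well and round-trip exactly.
mem = CompressingMemoryInterface()
lines = [bytes(CACHE_LINE_BYTES) for _ in range(LINES_PER_BLOCK)]
written = mem.write_block(0x1000, lines)
restored = mem.read_block(0x1000)
```

Grouping lines gives the codec more redundancy to exploit than a single 64-byte line would. A real design would also need metadata recording each block's compressed size, which this sketch keeps implicitly in the dict.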

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processor comprising: a memory interface coupled with a memory bus; and processing resource coupled with the memory interface, the processing resource comprising: a plurality of single-instruction, multiple-thread (SIMT) processing lanes, including a first plurality of SIMT lanes associated with a low energy arithmetic logic unit having a first operational voltage and a second plurality of SIMT lanes associated with a high capacity arithmetic logic unit having a second operational voltage that is higher than the first operational voltage; and a register file including a low energy portion and a high capacity portion, the processing resource configured to allocate a first number of register entries to the low energy portion and a second number of register entries to the high capacity portion for each thread, the low energy portion having a lower energy consumption relative to the high capacity portion and the high capacity portion having a larger number of registers relative to the low energy portion.

2. The graphics processor of claim 1, wherein the processing resource is configured to determine the first number of register entries and the second number of register entries at runtime.

3. The graphics processor of claim 2, wherein the processing resource is configured to determine the first number of register entries and the second number of register entries at runtime based on an adjustment of a compile-time register allocation.

4. The graphics processor of claim 3, wherein the first number of register entries is less than or equal to the second number of register entries for a first plurality of threads and the first number of register entries is greater than or equal to the second number of register entries for a second plurality of threads.

5. The graphics processor of claim 4, wherein the first number of register entries and the second number of register entries are based on percentages of register addresses mapped for each of the first plurality of threads and the second plurality of threads.

6. The graphics processor of claim 5, wherein the processing resource is configured to map all register addresses for the first plurality of threads to the low energy portion and all register addresses for the second plurality of threads to the high capacity portion.

7. The graphics processor of claim 1, wherein the processing resource is configured to use a single logical namespace for the low energy portion and the high capacity portion.

8. The graphics processor of claim 7, wherein the processing resource is further configured to dynamically adjust the first number of register entries and the second number of register entries based on runtime information.

9. The graphics processor of claim 8, wherein the runtime information includes a number of currently executing thread groups.

10. The graphics processor of claim 1, wherein the first plurality of SIMT lanes is associated with a low energy texture unit and the second plurality of SIMT lanes is associated with a high capacity texture unit.

11. A non-transitory machine-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, causes the one or more processors to perform operations comprising: compiling program code for execution by a graphics processor having a plurality of single-instruction, multiple-thread (SIMT) processing lanes including a first plurality of SIMT lanes associated with a low energy arithmetic logic unit having a first operational voltage and a second plurality of SIMT lanes associated with a high capacity arithmetic logic unit having a second operational voltage that is higher than the first operational voltage, and a register file having a low energy portion and a high capacity portion, the low energy portion having a lower energy consumption relative to the high capacity portion and the high capacity portion having a larger number of registers relative to the low energy portion; and during compilation of the program code, allocating register live ranges to register addresses via a register allocation mechanism, the register live ranges to enable runtime determination of a number of entries per thread to allocate to the low energy portion of the register file and the high capacity portion of the register file.

12. The non-transitory machine-readable medium of claim 11, wherein the register allocation mechanism is configured to reduce a number of registers used per thread.

13. The non-transitory machine-readable medium of claim 11, further comprising allocating multiple non-overlapping live ranges to a register address.

14. A graphics processing system comprising: a memory device; and a graphics processor coupled with the memory device via a memory bus, the graphics processor comprising a processing resource including: a plurality of single-instruction, multiple-thread (SIMT) processing lanes, a first plurality of SIMT lanes associated with a low energy arithmetic logic unit having a first operational voltage and a low energy texture unit having a second operational voltage and a second plurality of SIMT lanes is associated with a high capacity arithmetic logic unit having a third operational voltage that is higher than the first operational voltage and a high capacity texture unit having a fourth operational voltage that is higher than the second operational voltage; and a register file including a low energy portion and a high capacity portion, the processing resource configured to allocate a first number of register entries to the low energy portion and a second number of register entries to the high capacity portion for each thread, the low energy portion having a lower energy consumption relative to the high capacity portion and the high capacity portion having a larger number of registers relative to the low energy portion.

15. The graphics processing system of claim 14, wherein the processing resource is configured to determine the first number of register entries and the second number of register entries at runtime.

16. The graphics processing system of claim 15, wherein the processing resource is configured to determine the first number of register entries and the second number of register entries at runtime based on an adjustment of a compile-time register allocation.

17. The graphics processing system of claim 16, wherein the first number of register entries is less than or equal to the second number of register entries for a first plurality of threads and the first number of register entries is greater than or equal to the second number of register entries for a second plurality of threads.

18. The graphics processing system of claim 17, wherein the first number of register entries and the second number of register entries are based on percentages of register addresses mapped for each of the first plurality of threads and the second plurality of threads.

19. The graphics processing system of claim 18, wherein the processing resource is configured to map all register addresses for the first plurality of threads to the low energy portion and all register addresses for the second plurality of threads to the high capacity portion.

20. The graphics processing system of claim 14, wherein the processing resource is configured to use a single logical namespace for the low energy portion and the high capacity portion and is further configured to dynamically adjust the first number of register
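Claims 1 through 9 describe splitting each thread's registers between a low energy (LE) portion and a high capacity (HC) portion under a single logical namespace. The split is chosen at runtime and may be adjusted using runtime information such as the number of currently executing thread groups. The sketch below shows one way such a policy could look. The claims do not specify a policy, so the following are all assumptions for illustration:

- the equal-share rule;
- the prefix mapping, where low logical register numbers go to the LE portion;
- the LE capacity of 1024 entries;
- the names `RegisterSplit`, `choose_split`, and `map_register`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterSplit:
    le_entries: int  # per-thread entries in the low energy portion
    hc_entries: int  # per-thread entries in the high capacity portion


def choose_split(regs_per_thread, threads_per_group, active_groups, le_capacity):
    """Hypothetical runtime policy: share the LE portion equally among active threads.

    regs_per_thread stands for the compile-time allocation; whatever does not
    fit in a thread's LE share is placed in the HC portion.
    """
    active_threads = threads_per_group * active_groups
    if active_threads <= 0:
        raise ValueError("need at least one active thread")
    le = min(regs_per_thread, le_capacity // active_threads)
    return RegisterSplit(le, regs_per_thread - le)


def map_register(logical_reg, thread_id, split):
    """Translate a logical register (single namespace) to (portion, physical index)."""
    total = split.le_entries + split.hc_entries
    if not 0 <= logical_reg < total:
        raise IndexError(f"register {logical_reg} outside 0..{total - 1}")
    if logical_reg < split.le_entries:
        return ("LE", thread_id * split.le_entries + logical_reg)
    return ("HC", thread_id * split.hc_entries + logical_reg - split.le_entries)


# Usage: 32 registers per thread, 16 threads per group, 1024 LE entries (assumed).
for groups in (1, 4, 8):
    print(groups, choose_split(32, 16, groups, 1024))
# 1 RegisterSplit(le_entries=32, hc_entries=0)
# 4 RegisterSplit(le_entries=16, hc_entries=16)
# 8 RegisterSplit(le_entries=8, hc_entries=24)
```

In this sketch, each thread's LE share shrinks as more thread groups become active, so more of its registers land in the larger HC portion. That is one way to trade the LE portion's lower energy against the HC portion's capacity.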
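Claims 11 through 13 give part of the work to the compiler. Register live ranges are assigned to register addresses so that the runtime can later decide how many entries per thread go to each portion. The allocator tries to reduce the registers used per thread, and non-overlapping live ranges may share one address.

The sketch below shows that packing step using greedy interval allocation in the style of linear-scan register allocation. This technique is chosen here for illustration; the claims do not name it. Live ranges are assumed to be half-open `(start, end)` instruction intervals, and `allocate_live_ranges` is a hypothetical name.

```python
import heapq


def allocate_live_ranges(live_ranges):
    """Greedy interval allocation of live ranges to register addresses.

    live_ranges: dict mapping name -> (start, end), end exclusive.
    Returns (assignment of name -> address, number of addresses used).
    """
    order = sorted(live_ranges.items(), key=lambda kv: kv[1][0])
    active = []     # min-heap of (end, address) for ranges still live
    free = []       # min-heap of addresses released by finished ranges
    next_addr = 0   # first never-used address
    assignment = {}
    for name, (start, end) in order:
        # Ranges ending at or before this start cannot overlap it; free their addresses.
        while active and active[0][0] <= start:
            _, addr = heapq.heappop(active)
            heapq.heappush(free, addr)
        if free:
            addr = heapq.heappop(free)  # reuse: several disjoint ranges share one address
        else:
            addr = next_addr
            next_addr += 1
        assignment[name] = addr
        heapq.heappush(active, (end, addr))
    return assignment, next_addr


# Usage with five assumed live ranges.
ranges = {"a": (0, 4), "b": (1, 3), "c": (3, 6), "d": (4, 8), "e": (6, 9)}
assignment, used = allocate_live_ranges(ranges)
# assignment == {"a": 0, "b": 1, "c": 1, "d": 0, "e": 1}; used == 2
```

Because ranges are processed in start order, this greedy pass never needs more addresses than the maximum number of simultaneously live ranges. In the example, `b`, `c`, and `e` share address 1 and `a` and `d` share address 0, an instance of non-overlapping ranges on one address. A smaller per-thread address count leaves the runtime more room when deciding how many addresses fit in the low energy portion.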

Assignees

Intel Corp

Classifications (CPC titles)

  • with special data handling, e.g. priority of data or instructions, handling errors or pinning

  • Memory management

  • Data transfer between cache memory and other subsystems, e.g. storage devices or host systems

  • Partitioned cache, e.g. separate instruction and operand caches

  • Multiuser, multiprocessor or multiprocessing cache systems

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12579072B2 cover?
One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memor…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/0877. Mapped technology areas include Physics.
When was this patent published?
Published Mar 17, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page: citations found in our corpus, or other publications that share the same primary CPC classification.