Sparse convolutional neural network accelerator
US-10891538-B2 · Jan 12, 2021 · US
US12579072B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12579072-B2 |
| Application number | US-202418405933-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 5, 2024 |
| Priority date | Apr 1, 2017 |
| Publication date | Mar 17, 2026 |
| Grant date | Mar 17, 2026 |
Official abstract text for this publication.
One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.
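As a rough software analogy for the data flow in the abstract, the sketch below compresses a block spanning several cache lines before it is written out and decompresses it again on read. The abstract describes hardware circuitry, so everything concrete here is an assumption made for illustration: the 64-byte line size, the four-line block, the `write_block` and `read_block` names, the dictionary standing in for memory, and the use of zlib as the compressor.

```python
import zlib

CACHE_LINE_BYTES = 64   # assumed cache line size (not stated in the abstract)
LINES_PER_BLOCK = 4     # assumed "multiple cache line" granularity


def write_block(memory, addr, lines):
    """Compress several cache lines as one unit, then store them at addr.

    `memory` is a plain dict standing in for the memory behind the bus.
    Returns the number of bytes actually stored.
    """
    if len(lines) != LINES_PER_BLOCK:
        raise ValueError("expected a full multi-line block")
    if any(len(line) != CACHE_LINE_BYTES for line in lines):
        raise ValueError("every cache line must be CACHE_LINE_BYTES long")
    raw = b"".join(lines)
    memory[addr] = zlib.compress(raw)  # compress before the write
    return len(memory[addr])


def read_block(memory, addr):
    """Read a stored block, decompress it, and split it back into lines."""
    raw = zlib.decompress(memory[addr])  # decompress in association with the read
    return [raw[i:i + CACHE_LINE_BYTES]
            for i in range(0, len(raw), CACHE_LINE_BYTES)]
```

Compressing across several lines at once gives the compressor more redundancy to work with than a single 64-byte line would, which is the intuition behind working at multi-line granularity.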
Opening claim text (preview).
What is claimed is:
1. A graphics processor comprising: a memory interface coupled with a memory bus; and processing resource coupled with the memory interface, the processing resource comprising: a plurality of single-instruction, multiple-thread (SIMT) processing lanes, including a first plurality of SIMT lanes associated with a low energy arithmetic logic unit having a first operational voltage and a second plurality of SIMT lanes associated with a high capacity arithmetic logic unit having a second operational voltage that is higher than the first operational voltage; and a register file including a low energy portion and a high capacity portion, the processing resource configured to allocate a first number of register entries to the low energy portion and a second number of register entries to the high capacity portion for each thread, the low energy portion having a lower energy consumption relative to the high capacity portion and the high capacity portion having a larger number of registers relative to the low energy portion.
2. The graphics processor of claim 1, wherein the processing resource is configured to determine the first number of register entries and the second number of register entries at runtime.
3. The graphics processor of claim 2, wherein the processing resource is configured to determine the first number of register entries and the second number of register entries at runtime based on an adjustment of a compile-time register allocation.
4. The graphics processor of claim 3, wherein the first number of register entries is less than or equal to the second number of register entries for a first plurality of threads and the first number of register entries is greater than or equal to the second number of register entries for a second plurality of threads.
5. The graphics processor of claim 4, wherein the first number of register entries and the second number of register entries are based on percentages of register addresses mapped for each of the first plurality of threads and the second plurality of threads.
6. The graphics processor of claim 5, wherein the processing resource is configured to map all register addresses for the first plurality of threads to the low energy portion and all register addresses for the second plurality of threads to the high capacity portion.
7. The graphics processor of claim 1, wherein the processing resource is configured to use a single logical namespace for the low energy portion and the high capacity portion.
8. The graphics processor of claim 7, wherein the processing resource is further configured to dynamically adjust the first number of register entries and the second number of register entries based on runtime information.
9. The graphics processor of claim 8, wherein the runtime information includes a number of currently executing thread groups.
10. The graphics processor of claim 1, wherein the first plurality of SIMT lanes is associated with a low energy texture unit and the second plurality of SIMT lanes is associated with a high capacity texture unit.
11. A non-transitory machine-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, causes the one or more processors to perform operations comprising: compiling program code for execution by a graphics processor having a plurality of single-instruction, multiple-thread (SIMT) processing lanes including a first plurality of SIMT lanes associated with a low energy arithmetic logic unit having a first operational voltage and a second plurality of SIMT lanes associated with a high capacity arithmetic logic unit having a second operational voltage that is higher than the first operational voltage, and a register file having a low energy portion and a high capacity portion, the low energy portion having a lower energy consumption relative to the high capacity portion and the high capacity portion having a larger number of registers relative to the low energy portion; and during compilation of the program code, allocating register live ranges to register addresses via a register allocation mechanism, the register live ranges to enable runtime determination of a number of entries per thread to allocate to the low energy portion of the register file and the high capacity portion of the register file.
12. The non-transitory machine-readable medium of claim 11, wherein the register allocation mechanism is configured to reduce a number of registers used per thread.
13. The non-transitory machine-readable medium of claim 11, further comprising allocating multiple non-overlapping live ranges to a register address.
14. A graphics processing system comprising: a memory device; and a graphics processor coupled with the memory device via a memory bus, the graphics processor comprising a processing resource including: a plurality of single-instruction, multiple-thread (SIMT) processing lanes, a first plurality of SIMT lanes associated with a low energy arithmetic logic unit having a first operational voltage and a low energy texture unit having a second operational voltage and a second plurality of SIMT lanes is associated with a high capacity arithmetic logic unit having a third operational voltage that is higher than the first operational voltage and a high capacity texture unit having a fourth operational voltage that is higher than the second operational voltage; and a register file including a low energy portion and a high capacity portion, the processing resource configured to allocate a first number of register entries to the low energy portion and a second number of register entries to the high capacity portion for each thread, the low energy portion having a lower energy consumption relative to the high capacity portion and the high capacity portion having a larger number of registers relative to the low energy portion.
15. The graphics processing system of claim 14, wherein the processing resource is configured to determine the first number of register entries and the second number of register entries at runtime.
16. The graphics processing system of claim 15, wherein the processing resource is configured to determine the first number of register entries and the second number of register entries at runtime based on an adjustment of a compile-time register allocation.
17. The graphics processing system of claim 16, wherein the first number of register entries is less than or equal to the second number of register entries for a first plurality of threads and the first number of register entries is greater than or equal to the second number of register entries for a second plurality of threads.
18. The graphics processing system of claim 17, wherein the first number of register entries and the second number of register entries are based on percentages of register addresses mapped for each of the first plurality of threads and the second plurality of threads.
19. The graphics processing system of claim 18, wherein the processing resource is configured to map all register addresses for the first plurality of threads to the low energy portion and all register addresses for the second plurality of threads to the high capacity portion.
20. The graphics processing system of claim 14, wherein the processing resource is configured to use a single logical namespace for the low energy portion and the high capacity portion and is further configured to dynamically adjust the first number of register
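The register-file claims can be pictured with a small software model. The sketch below first packs non-overlapping live ranges onto shared register addresses (the idea in claim 13), then splits a thread's register addresses between a low energy portion and a high capacity portion using a fraction chosen at runtime (loosely following claims 2 to 6). Everything concrete is hypothetical and not taken from the patent: the `LiveRange` type, the greedy packing order, the `low_energy_fraction` parameter, and the function names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRange:
    name: str
    start: int  # first instruction index where the value is live
    end: int    # last instruction index where the value is live (inclusive)


def pack_live_ranges(ranges):
    """Greedily assign each live range to the lowest register address
    whose previously placed ranges have all ended before it starts.

    Returns (assignment, registers_used), where assignment maps a
    range name to its register address.
    """
    last_end = []    # last_end[addr] = end of the latest range at addr
    assignment = {}
    for lr in sorted(ranges, key=lambda r: (r.start, r.end)):
        for addr, end in enumerate(last_end):
            if end < lr.start:          # no overlap: reuse this address
                last_end[addr] = lr.end
                assignment[lr.name] = addr
                break
        else:                           # every address is busy: open a new one
            last_end.append(lr.end)
            assignment[lr.name] = len(last_end) - 1
    return assignment, len(last_end)


def split_registers(num_registers, low_energy_fraction):
    """Split a thread's register addresses into
    (low_energy_count, high_capacity_count) for a runtime-chosen fraction."""
    if not 0.0 <= low_energy_fraction <= 1.0:
        raise ValueError("low_energy_fraction must be between 0 and 1")
    low = int(num_registers * low_energy_fraction)
    return low, num_registers - low
```

Reusing one address for several non-overlapping ranges lowers the register count per thread, which in this model leaves more room to place a thread entirely in the smaller low energy portion (a fraction of 1.0) or entirely in the high capacity portion (a fraction of 0.0).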
with special data handling, e.g. priority of data or instructions, handling errors or pinning · CPC title
Memory management · CPC title
Data transfer between cache memory and other subsystems, e.g. storage devices or host systems · CPC title
Partitioned cache, e.g. separate instruction and operand caches · CPC title
Multiuser, multiprocessor or multiprocessing cache systems · CPC title