GPU-assisted lossless data compression
US-2017214930-A1 · Jul 27, 2017 · US
US11620256B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620256-B2 |
| Application number | US-202217732308-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 28, 2022 |
| Priority date | Mar 15, 2019 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Official abstract text for this publication.
Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
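The decision described in the abstract, whether default settings or an instruction control a given cache operation, can be sketched as a small resolver. This is a minimal illustration only: `CachePolicy`, `DEFAULT_POLICY`, `resolve_policy`, and the numeric priority values are hypothetical names and assumptions, not terms or encodings taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachePolicy:
    """Hypothetical cache setting; field names are illustrative only."""
    name: str
    keep_priority: int  # assumption: a lower value means the line is evicted sooner


# Assumed default used when an instruction carries no cache hint.
DEFAULT_POLICY = CachePolicy(name="default", keep_priority=1)


def resolve_policy(instruction_hint: Optional[CachePolicy]) -> CachePolicy:
    """Return the instruction's policy if it supplies one, else the default."""
    if instruction_hint is not None:
        return instruction_hint
    return DEFAULT_POLICY
```

In this sketch an instruction-supplied hint always takes precedence; a real controller could apply other rules, which this excerpt does not specify.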
Opening claim text (preview).
What is claimed is:
1. A graphics processing unit (GPU) comprising: a plurality of groups of cores, each group of cores including: a plurality of cores of a first type; and a plurality of cores of a second type, wherein the plurality of cores of the second type are tensor cores; a plurality of combined level 1 (L1) cache and shared memory units, each corresponding to a different group of cores of the plurality of groups of cores; a level 2 (L2) cache to be shared by the plurality of groups of cores; a plurality of memory controllers to couple the GPU to a memory; and cache controller circuitry associated with the L2 cache in response to a load instruction from a first core of the plurality of groups of cores, to: select a cache control, based on the load instruction, out of a plurality of stored cache controls, wherein at least some of the cache controls have different cache eviction priorities; and apply the selected cache control to data allocated into the L2 cache.
2. The GPU of claim 1, wherein the plurality of cache controls are to be stored in a data structure.
3. The GPU of claim 1, wherein the selected cache control has a streaming cache eviction priority.
4. The GPU of claim 1, wherein the selected cache control is a streaming cache control.
5. The GPU of claim 1, wherein the load instruction is to indicate the data is for a global address space.
6. The GPU of claim 1, further comprising: scheduler/dispatcher circuitry to schedule and dispatch graphics threads for execution on the plurality of groups of cores; and a plurality of groups of texture units, each corresponding to a different group of cores of the plurality of groups of cores.
7. The GPU of claim 6, further comprising input/output (I/O) circuitry to couple the GPU to one or more I/O devices.
8. The GPU of claim 1, wherein each group of cores of the plurality of groups of cores includes a ray tracing core.
9. A method, performed by a graphics processing unit (GPU), the method comprising: processing data with a plurality of groups of cores, including: processing graphics data with a plurality of cores of a first type in each of the groups; performing matrix operations with a plurality of cores of a second type in each of the groups, wherein the plurality of cores of the second type are tensor cores; and storing data in a plurality of combined level 1 (L1) cache and shared memory units, each corresponding to a different group of cores of the plurality of groups of cores; sharing a level 2 (L2) cache by the plurality of groups of cores; accessing data from a memory by a plurality of memory controllers of the GPU; and performing a load instruction received from a first core of the plurality of groups of cores, including: selecting a cache control, based on the load instruction, out of a plurality of stored cache controls, wherein at least some of the cache controls have different cache eviction priorities; and applying the selected cache control to data allocated into the L2 cache.
10. The method of claim 9, wherein the plurality of cache controls are stored in a data structure.
11. The method of claim 9, wherein selecting the cache control comprises selecting a cache control having a streaming cache eviction priority.
12. The method of claim 9, wherein selecting the cache control comprises selecting a streaming cache control.
13. The method of claim 9, further comprising scheduling and dispatching graphics threads for execution on the plurality of groups of cores.
14. The method of claim 9, further comprising performing ray tracing with a ray tracing core in each of the groups of cores.
15. A system comprising: a memory; and a graphics processing unit (GPU) coupled with the memory, the GPU comprising: a plurality of groups of cores, each group of cores including: a plurality of cores of a first type; and a plurality of cores of a second type, wherein the plurality of cores of the second type are tensor cores; a plurality of combined level 1 (L1) cache and shared memory units, each corresponding to a different group of cores of the plurality of groups of cores; a level 2 (L2) cache to be shared by the plurality of groups of cores; a plurality of memory controllers to couple the GPU to a memory; and cache controller circuitry associated with the L2 cache in response to a load instruction from a first core of the plurality of groups of cores, to: select a cache control, based on the load instruction, out of a plurality of stored cache controls, wherein at least some of the cache controls have different cache eviction priorities; and apply the selected cache control to data allocated into the L2 cache.
16. The system of claim 15, further comprising a data storage device coupled with the GPU and the memory, and wherein the plurality of cache controls are to be stored in a data structure.
17. The system of claim 15, further comprising a network controller coupled with the memory, and wherein the selected cache control has a streaming cache eviction priority.
18. The system of claim 15, further comprising a touch sensor coupled with the GPU, and wherein the selected cache control is a streaming cache control.
19. The system of claim 15, further comprising a data storage device coupled with the GPU and the memory, and wherein the load instruction is to indicate the data is for a global address space.
20. The system of claim 15, further comprising a network controller coupled with the memory, and wherein the GPU further comprises: scheduler/dispatcher circuitry to schedule and dispatch graphics threads for execution on the plurality of groups of cores; and a plurality of groups of texture units, each corresponding to a different group of cores of the plurality of groups of cores.
21. The system of claim 20, further comprising at least one I/O device, and wherein the GPU further comprises input/output (I/O) circuitry to couple the GPU to the at least one I/O device.
22. The system of claim 15, wherein each group of cores of the plurality of groups of cores includes a ray tracing core.
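The select-and-apply behavior recited in the independent claims can be illustrated with a toy software model: a load carries an index into a table of stored cache controls, the selected control is attached to the line allocated into the L2 cache, and eviction removes the line with the lowest priority first. Everything here is an assumption made for illustration: the table contents, the `control_index` field, the `L2Model` class, and the tie-breaking rule do not come from the patent, and real hardware would implement this in circuitry rather than software.

```python
from dataclasses import dataclass

# Hypothetical table of stored cache controls: index -> (name, keep priority).
# Assumption: a lower keep priority means the line is evicted sooner.
CACHE_CONTROLS = {
    0: ("normal", 1),
    1: ("streaming", 0),   # read-once data, evicted first
    2: ("persistent", 2),  # data expected to be reused
}


@dataclass(frozen=True)
class Load:
    """Toy load instruction; control_index is an assumed encoding."""
    address: int
    control_index: int


class L2Model:
    """Toy fully associative cache that tracks one control per line."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = {}  # address -> (control name, keep priority, last-use tick)
        self.tick = 0

    def load(self, ld: Load) -> str:
        # Select a cache control based on the load instruction.
        name, keep = CACHE_CONTROLS[ld.control_index]
        self.tick += 1
        hit = ld.address in self.lines
        if not hit and len(self.lines) >= self.capacity:
            # Evict the lowest keep priority; break ties by oldest use.
            victim = min(self.lines, key=lambda a: self.lines[a][1:])
            del self.lines[victim]
        # Apply the selected control to the allocated (or refreshed) line.
        self.lines[ld.address] = (name, keep, self.tick)
        return "hit" if hit else "miss"
```

With a capacity of two lines, loading one normal line, one streaming line, and then one persistent line evicts the streaming line, since it has the lowest keep priority in this sketch.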
Page size control · CPC title
Details relating to cache mapping · CPC title
Prefetching based on hints or prefetch instructions · CPC title
Prefetching based on access pattern detection, e.g. stride based prefetch · CPC title
Reconfiguration of cache memory · CPC title