Systems and methods for improving cache efficiency and utilization

US11620256B2 · US · B2

Patent metadata
Publication number: US-11620256-B2
Application number: US-202217732308-A
Country: US
Kind code: B2
Filing date: Apr 28, 2022
Priority date: Mar 15, 2019
Publication date: Apr 4, 2023
Grant date: Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
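
The choice the abstract describes, between default settings and an instruction-supplied setting, can be sketched in a few lines. This is only an illustrative sketch and not the patented implementation: the names `CachePolicy`, `Instruction`, `policy_override`, and `resolve_policy`, and the numeric priority scale, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachePolicy:
    """Hypothetical cache setting; a larger eviction_priority means 'evict sooner'."""
    name: str
    eviction_priority: int


# Assumed default setting used when an instruction carries no override.
DEFAULT_POLICY = CachePolicy("default", 1)


@dataclass(frozen=True)
class Instruction:
    """Hypothetical instruction that may or may not carry its own cache setting."""
    opcode: str
    policy_override: Optional[CachePolicy] = None


def resolve_policy(instr: Instruction,
                   default: CachePolicy = DEFAULT_POLICY) -> CachePolicy:
    """Return the instruction's setting if present, otherwise the default."""
    if instr.policy_override is not None:
        return instr.policy_override
    return default
```

With these assumed types, `resolve_policy(Instruction("load"))` returns the default setting, while an instruction constructed with an explicit `policy_override` returns that override instead.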

First claim

Opening claim text (preview).

What is claimed is:
1. A graphics processing unit (GPU) comprising: a plurality of groups of cores, each group of cores including: a plurality of cores of a first type; and a plurality of cores of a second type, wherein the plurality of cores of the second type are tensor cores; a plurality of combined level 1 (L1) cache and shared memory units, each corresponding to a different group of cores of the plurality of groups of cores; a level 2 (L2) cache to be shared by the plurality of groups of cores; a plurality of memory controllers to couple the GPU to a memory; and cache controller circuitry associated with the L2 cache in response to a load instruction from a first core of the plurality of groups of cores, to: select a cache control, based on the load instruction, out of a plurality of stored cache controls, wherein at least some of the cache controls have different cache eviction priorities; and apply the selected cache control to data allocated into the L2 cache.
2. The GPU of claim 1, wherein the plurality of cache controls are to be stored in a data structure.
3. The GPU of claim 1, wherein the selected cache control has a streaming cache eviction priority.
4. The GPU of claim 1, wherein the selected cache control is a streaming cache control.
5. The GPU of claim 1, wherein the load instruction is to indicate the data is for a global address space.
6. The GPU of claim 1, further comprising: scheduler/dispatcher circuitry to schedule and dispatch graphics threads for execution on the plurality of groups of cores; and a plurality of groups of texture units, each corresponding to a different group of cores of the plurality of groups of cores.
7. The GPU of claim 6, further comprising input/output (I/O) circuitry to couple the GPU to one or more I/O devices.
8. The GPU of claim 1, wherein each group of cores of the plurality of groups of cores includes a ray tracing core.
9. A method, performed by a graphics processing unit (GPU), the method comprising: processing data with a plurality of groups of cores, including: processing graphics data with a plurality of cores of a first type in each of the groups; performing matrix operations with a plurality of cores of a second type in each of the groups, wherein the plurality of cores of the second type are tensor cores; and storing data in a plurality of combined level 1 (L1) cache and shared memory units, each corresponding to a different group of cores of the plurality of groups of cores; sharing a level 2 (L2) cache by the plurality of groups of cores; accessing data from a memory by a plurality of memory controllers of the GPU; and performing a load instruction received from a first core of the plurality of groups of cores, including: selecting a cache control, based on the load instruction, out of a plurality of stored cache controls, wherein at least some of the cache controls have different cache eviction priorities; and applying the selected cache control to data allocated into the L2 cache.
10. The method of claim 9, wherein the plurality of cache controls are stored in a data structure.
11. The method of claim 9, wherein selecting the cache control comprises selecting a cache control having a streaming cache eviction priority.
12. The method of claim 9, wherein selecting the cache control comprises selecting a streaming cache control.
13. The method of claim 9, further comprising scheduling and dispatching graphics threads for execution on the plurality of groups of cores.
14. The method of claim 9, further comprising performing ray tracing with a ray tracing core in each of the groups of cores.
15. A system comprising: a memory; and a graphics processing unit (GPU) coupled with the memory, the GPU comprising: a plurality of groups of cores, each group of cores including: a plurality of cores of a first type; and a plurality of cores of a second type, wherein the plurality of cores of the second type are tensor cores; a plurality of combined level 1 (L1) cache and shared memory units, each corresponding to a different group of cores of the plurality of groups of cores; a level 2 (L2) cache to be shared by the plurality of groups of cores; a plurality of memory controllers to couple the GPU to a memory; and cache controller circuitry associated with the L2 cache in response to a load instruction from a first core of the plurality of groups of cores, to: select a cache control, based on the load instruction, out of a plurality of stored cache controls, wherein at least some of the cache controls have different cache eviction priorities; and apply the selected cache control to data allocated into the L2 cache.
16. The system of claim 15, further comprising a data storage device coupled with the GPU and the memory, and wherein the plurality of cache controls are to be stored in a data structure.
17. The system of claim 15, further comprising a network controller coupled with the memory, and wherein the selected cache control has a streaming cache eviction priority.
18. The system of claim 15, further comprising a touch sensor coupled with the GPU, and wherein the selected cache control is a streaming cache control.
19. The system of claim 15, further comprising a data storage device coupled with the GPU and the memory, and wherein the load instruction is to indicate the data is for a global address space.
20. The system of claim 15, further comprising a network controller coupled with the memory, and wherein the GPU further comprises: scheduler/dispatcher circuitry to schedule and dispatch graphics threads for execution on the plurality of groups of cores; and a plurality of groups of texture units, each corresponding to a different group of cores of the plurality of groups of cores.
21. The system of claim 20, further comprising at least one I/O device, and wherein the GPU further comprises input/output (I/O) circuitry to couple the GPU to the at least one I/O device.
22. The system of claim 15, wherein each group of cores of the plurality of groups of cores includes a ray tracing core.
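
The mechanism recited in the independent claims (a load instruction selects one of several stored cache controls, and the selected control is applied to data allocated into the L2 cache) can be illustrated with a toy software model. This is a hedged sketch only: the claims describe hardware circuitry, and the table contents, the `control_id` field, the numeric priority scale, and the tie-breaking rule below are all assumptions chosen for the example, not details taken from the patent.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class CacheControl:
    """Hypothetical cache control; larger eviction_priority means 'evict sooner'."""
    name: str
    eviction_priority: int


# Assumed table of stored cache controls (the "data structure" of claims 2, 10, 16).
CONTROL_TABLE = {
    0: CacheControl("default", 1),
    1: CacheControl("streaming", 3),  # streaming data is assumed to be evicted first
    2: CacheControl("resident", 0),
}


@dataclass(frozen=True)
class LoadInstruction:
    """Hypothetical load carrying an index into the control table."""
    address: int
    control_id: int = 0


class L2Cache:
    """Toy fully associative cache that tags each line with its selected control."""

    def __init__(self, capacity, controls=CONTROL_TABLE):
        self.capacity = capacity
        self.controls = controls
        self.lines = {}       # address -> (CacheControl, last-use tick)
        self._tick = count()  # monotonically increasing use counter

    def load(self, instr):
        """Return True on a hit; on a miss, allocate the line and return False."""
        now = next(self._tick)
        if instr.address in self.lines:
            control, _ = self.lines[instr.address]
            self.lines[instr.address] = (control, now)  # refresh recency only
            return True
        # Select a control based on the load instruction, then apply it
        # to the newly allocated line.
        control = self.controls[instr.control_id]
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[instr.address] = (control, now)
        return False

    def _evict(self):
        """Evict the line with the highest eviction priority; break ties by LRU."""
        victim = max(
            self.lines,
            key=lambda a: (self.lines[a][0].eviction_priority, -self.lines[a][1]),
        )
        del self.lines[victim]
```

In this model, a two-line cache that receives a "resident" load, then a "streaming" load, then a "default" load evicts the streaming line first, even though the resident line is older. Storing the controls in a table and passing only an index in the instruction keeps the instruction encoding small, which is one plausible reason for such a design; the patent page does not state the rationale.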

Assignees

Intel Corp

Inventors

Classifications

  • Page size control

  • Details relating to cache mapping

  • Prefetching based on hints or prefetch instructions

  • Prefetching based on access pattern detection, e.g. stride based prefetch

  • Reconfiguration of cache memory

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620256B2 cover?
Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache opera…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 4, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).