Data prefetching for graphics data processing

US11409658B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11409658-B2
Application number: US-202117161465-A
Country: US
Kind code: B2
Filing date: Jan 28, 2021
Priority date: Mar 15, 2019
Publication date: Aug 9, 2022
Grant date: Aug 9, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
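
The threshold policy described in the abstract can be sketched as follows. This is a minimal software model, not the patented hardware: the names `HitRateSampler` and `prefetch_targets` and the 0.75 threshold in the usage note are hypothetical, and the below-threshold case fills both caches as worded in claim 1 (the abstract only says the prefetch to the L1 cache is allowed).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HitRateSampler:
    """Counts L1 hits and accesses over one sampling period (hypothetical helper)."""
    hits: int = 0
    accesses: int = 0

    def record(self, hit: bool) -> None:
        # Every access counts toward the period; only hits raise the hit count.
        self.accesses += 1
        if hit:
            self.hits += 1

    def hit_rate(self) -> float:
        # An empty period is treated as a 0.0 hit rate (an assumption).
        return self.hits / self.accesses if self.accesses else 0.0

    def reset(self) -> None:
        # Start a new sampling period.
        self.hits = 0
        self.accesses = 0


def prefetch_targets(l1_hit_rate: float, threshold: float) -> Tuple[str, ...]:
    """Return the cache levels a prefetch may fill under the threshold policy."""
    if l1_hit_rate >= threshold:
        # L1 is already serving requests well: limit the prefetch to L3.
        return ("L3",)
    # L1 is missing often: allow the prefetch into L1 as well as L3.
    return ("L1", "L3")
```

For example, with an assumed threshold of 0.75, a sampling period with three hits in four accesses limits prefetches to L3, while a period with a 0.5 hit rate lets them reach L1 as well.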

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least a lower level cache and a higher level cache; and wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including: measuring a hit rate for the lower level cache over a sampling period, comparing the hit rate for the lower level cache to a threshold value number of hits; upon determining that the hit rate for the lower level cache is equal to or greater than the threshold value, limiting a prefetch of data to storage in the higher level cache, and upon determining that the hit rate for the lower level cache is less than the threshold value, allowing the prefetch of data to both the lower level cache and the higher level cache; and wherein, upon a compute operation operating out of the higher level cache, the apparatus is further to utilize a memory link during the operation of the higher level cache to maintain activity of memory bandwidth.

2. The apparatus of claim 1, wherein the one or more processors are further to determine higher level cache and memory activity at least in part utilizing the memory bandwidth.

3. The apparatus of claim 2, wherein the one or more processors are further to trigger prefetching and memory scrubbing activities based at least in part on the determined higher level cache and memory activity.

4. The apparatus of claim 1, wherein the apparatus further includes an interface to receive prefetch instructions from prefetchers of the one or more GPUs, and wherein the apparatus is to detect and eliminate unnecessary prefetches, including: upon the apparatus detecting two or more prefetches having a duplicate address, the apparatus is to eliminate one or more of the prefetches having the duplicate address; or upon the apparatus detecting a prefetch that relates to data that is uncacheable, the apparatus is to eliminate the prefetch.

5. The apparatus of claim 1, further comprising an execution unit of the one or more GPUs, the execution unit including a hardware preprocessor, the hardware preprocessor to have access to a table of IP addresses that a kernel is using, wherein the hardware preprocessor is to commence prefetching of IP addresses from the table of IP addresses ahead of execution of a thread.

6. The apparatus of claim 1, wherein a prefetcher of a GPU of the one or more GPUs is to prefetch an instruction directly into an instruction cache (I-cache), and wherein the prefetch of the instruction directly into the I-cache is to occur upon an application driver being aware of a next kernel, and the prefetch being issued for the next kernel when starting execution of a current kernel.

7. One or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: measuring a hit rate for an lower level cache over a sampling period for a first graphics processing unit (GPU) of one or more GPUs of a computing system, the computing system further including a higher level cache; receiving a prefetch of data for the first GPU; comparing the hit rate for the lower level cache to a threshold value number of hits; upon determining that the hit rate for the lower level cache is equal to or greater than the threshold value, limiting the prefetch of the data to storage in the higher level cache; upon determining that the hit rate for the lower level cache is less than the threshold value, allowing the prefetch of the data to both the lower level cache and the higher level cache; and upon a compute operation operating out of the higher level cache, utilizing a memory link during the operation of the higher level cache to maintain activity of memory bandwidth.

8. The one or more computer-readable storage mediums of claim 7, further comprising instructions for determining higher level cache and memory activity at least in part utilizing the memory bandwidth.

9. The one or more computer-readable storage mediums of claim 8, further comprising instructions for triggering prefetching and memory scrubbing activities based at least in part on the determined higher level cache and memory activity.

10. The one or more computer-readable storage mediums of claim 7, further comprising instructions for detecting and eliminating unnecessary prefetches, including: upon detecting two or more prefetches having a duplicate address, eliminating one or more of the prefetches having the duplicate address; or upon detecting a prefetch that relates to data that is uncacheable, eliminating the prefetch.

11. The one or more computer-readable storage mediums of claim 7, further comprising instructions for commencing prefetching of IP addresses ahead of execution of a thread from a table of IP addresses that a kernel is using, wherein an execution unit of the one or more GPUs includes a hardware preprocessor, the hardware preprocessor having access to the table of IP addresses.

12. The one or more computer-readable storage mediums of claim 7, further comprising instructions for prefetching an instruction directly into an instruction cache (I-cache), wherein the prefetch of the instruction directly into the I-cache is to occur upon an application driver being aware of a next kernel, and wherein the prefetch being issued for the next kernel when starting execution of a current kernel.

13. A method comprising: measuring a hit rate for an lower level cache over a sampling period for a first graphics processing unit (GPU) of one or more GPUs of a computing system, the computing system further including a higher level cache; receiving a prefetch of data for the first GPU; comparing the hit rate for the lower level cache to a threshold value number of hits; upon determining that the hit rate for the lower level cache is equal to or greater than the threshold value, limiting the prefetch of the data to storage in the higher level cache; upon determining that the hit rate for the lower level cache is less than the threshold value, allowing the prefetch of the data to both the lower level cache and the higher level cache; and upon a compute operation operating out of the higher level cache, utilizing a memory link during the operation of the higher level cache to maintain activity of memory bandwidth.

14. The method of claim 13, further comprising determining higher level cache and memory activity at least in part utilizing the memory bandwidth.

15. The method of claim 14, further comprising triggering prefetching and memory scrubbing activities based at least in part on the determined higher level cache and memory activity.

16. The method of claim 13, further comprising detecting and eliminating unnecessary prefetches, including: upon detecting two or more prefetches having a duplicate address, eliminating one or more of the prefetches having the duplicate address; or upon detecting a prefetch that relates to data that is uncacheable, eliminating the prefetch.

17. The method of claim 13, further comprising commencing prefetching of IP addresses ahead of execution of a thread from a table of IP addresses that a kernel is using, wherein an execution unit of the one or more GPUs includes a hardware preprocessor, the hardware preprocessor having access to the table of IP addresses.

18
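
The elimination of unnecessary prefetches recited in claims 4, 10, and 16 can be illustrated with a short sketch. It is an assumption-laden software model rather than the claimed interface: `PrefetchRequest`, its `cacheable` flag, and `filter_prefetches` are hypothetical names, and keeping the first request for each address is one possible reading of "eliminate one or more of the prefetches having the duplicate address".

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class PrefetchRequest:
    """A prefetch instruction as seen by the filtering interface (hypothetical)."""
    address: int
    cacheable: bool = True


def filter_prefetches(requests: Iterable[PrefetchRequest]) -> List[PrefetchRequest]:
    """Drop prefetches for uncacheable data and repeated addresses."""
    seen: Set[int] = set()
    kept: List[PrefetchRequest] = []
    for req in requests:
        if not req.cacheable:
            # Uncacheable data cannot benefit from a prefetch: eliminate it.
            continue
        if req.address in seen:
            # Duplicate address: keep only the first request (an assumption).
            continue
        seen.add(req.address)
        kept.append(req)
    return kept
```

Given requests for addresses 0x100, 0x100, 0x200 (uncacheable), and 0x300, the filter keeps one request for 0x100 and the request for 0x300.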

Assignees

Intel Corp

Inventors

Classifications

  • with two or more cache hierarchy levels (with multilevel cache hierarchies G06F12/0811)

  • using selective caching, e.g. bypass

  • Details relating to cache prefetching

  • Details relating to cache mapping

  • with prefetch

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11409658B2 cover?
Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefet…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/0862. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 9, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).