Memory prefetching in multiple GPU environment

US11861759B2 · US · B2

Patent metadata
Field                Value
Publication number   US-11861759-B2
Application number   US-202217580352-A
Country              US
Kind code            B2
Filing date          Jan 20, 2022
Priority date        Mar 15, 2019
Publication date     Jan 2, 2024
Grant date           Jan 2, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are generally directed to memory prefetching in multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.
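
The ownership rule in the abstract can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the patented mechanism: the page size, the `page_owner` table, the `HOST` label, and the `Prefetcher` class are all hypothetical names introduced here. The only behavior taken from the abstract is that a GPU's prefetcher may not prefetch from a page unless that page is owned by the GPU itself or by the host processor.

```python
from dataclasses import dataclass
from typing import Dict

PAGE_SIZE = 4096   # assumed page size; the abstract does not specify one
HOST = "host"      # hypothetical label for the host processor


@dataclass
class Prefetcher:
    """Hypothetical model of one GPU's prefetcher."""
    gpu_id: str
    page_owner: Dict[int, str]  # page number -> owning processor (assumed table)

    def may_prefetch(self, address: int) -> bool:
        # Look up which processor owns the page containing this address.
        owner = self.page_owner.get(address // PAGE_SIZE)
        # Allowed only if the page is owned by this GPU or by the host.
        # Pages with no recorded owner are treated as not prefetchable
        # (a conservative choice made for this sketch).
        return owner in (self.gpu_id, HOST)


owners = {0: "gpu0", 1: "gpu1", 2: HOST}
gpu0 = Prefetcher("gpu0", owners)
allowed = [gpu0.may_prefetch(a) for a in (0x0010, 0x1000, 0x2000)]
```

Here `gpu0` may prefetch from page 0 (its own) and page 2 (host-owned), but not from page 1, which belongs to `gpu1`.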

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a plurality of processors including a host processor and a plurality of graphics processing units (GPUs) to process data including at least a first graphics processing unit (GPU), each of the plurality of GPUs including a prefetcher and one or more caches; and a memory for storage of data; wherein the prefetcher of each of the plurality of GPUs is to prefetch data from the memory to a cache of the respective GPU; wherein a prefetch operation by the first GPU includes the prefetcher of the first GPU issuing a gather/scatter prefetch message including a plurality of prefetch addresses; and wherein the plurality of processors are to parse the gather/scatter prefetch message and issue a prefetch message for each of the plurality of prefetch addresses.

2. The apparatus of claim 1, wherein the gather/scatter prefetch message includes an entry for each of the plurality of prefetch addresses, the entry to indicate a cache level for prefetching.

3. The apparatus of claim 2, wherein the gather/scatter prefetch message includes a plurality of different cache levels within the gather/scatter prefetch message.

4. The apparatus of claim 1, wherein the plurality of prefetch addresses includes noncontiguous addresses.

5. The apparatus of claim 1, wherein the prefetcher of the first GPU is to send a notification to a thread in a core of the first GPU when a prefetch for the thread is complete.

6. The apparatus of claim 1, wherein the prefetchers of the plurality of GPUs are to prefetch data from the memory to a cache of each respective GPU in execution of a multi-GPU workload.

7. One or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: processing a workload in a computing system including a plurality of processors, the plurality of processors including a host processor and a plurality of graphics processing units (GPUs), each of the plurality of GPUs including a prefetcher and one or more caches; prefetching data by a first graphics processing unit (GPU) of the plurality of GPUs from a memory of the computing system to a cache of the first GPU wherein prefetching data by the first GPU includes the prefetcher of the first GPU issuing a gather/scatter prefetch message including a plurality of prefetch addresses; and parsing the gather/scatter prefetch message and issuing a prefetch message for each of the plurality of prefetch addresses.

8. The one or more computer-readable storage mediums of claim 7, wherein the gather/scatter prefetch message includes an entry for each of the plurality of prefetch addresses, the entry to indicate a cache level for prefetching.

9. The one or more computer-readable storage mediums of claim 8, wherein the gather/scatter prefetch message includes a plurality of different cache levels within the gather/scatter prefetch message.

10. The one or more computer-readable storage mediums of claim 7, wherein the plurality of prefetch addresses includes noncontiguous addresses.

11. The one or more computer-readable storage mediums of claim 7, wherein the instructions further include instructions for: sending a notification to a thread in a core of the first GPU when a prefetch for the thread is complete.

12. The one or more computer-readable storage mediums of claim 7, wherein the workload is a multi-GPU workload, and wherein the prefetchers of the plurality of GPUs are to prefetch data from the memory to a cache of each respective GPU in execution of the multi-GPU workload.

13. A method comprising: processing a workload in a computing system including a plurality of processors, the plurality of processors including a host processor and a plurality of graphics processing units (GPUs), each of the plurality of GPUs including a prefetcher and one or more caches; prefetching data by a first graphics processing unit (GPU) of the plurality of GPUs from a memory of the computing system to a cache of the first GPU wherein prefetching data by the first GPU includes the prefetcher of the first GPU issuing a gather/scatter prefetch message including a plurality of prefetch addresses; and parsing the gather/scatter prefetch message and issuing a prefetch message for each of the plurality of prefetch addresses.

14. The method of claim 13, wherein the gather/scatter prefetch message includes an entry for each of the plurality of prefetch addresses, the entry to indicate a cache level for prefetching.

15. The method of claim 14, wherein the gather/scatter prefetch message includes a plurality of different cache levels within the gather/scatter prefetch message.

16. The method of claim 13, wherein the plurality of prefetch addresses includes noncontiguous addresses.

17. The method of claim 13, further comprising: sending a notification to a thread in a core of the first GPU when a prefetch for the thread is complete.

18. The method of claim 13, wherein the workload is a multi-GPU workload, and wherein the prefetchers of the plurality of GPUs are to prefetch data from the memory to a cache of each respective GPU in execution of the multi-GPU workload.
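
The gather/scatter prefetch message recited in the independent claims can be sketched as a simple data structure. This is a minimal illustration under stated assumptions: the class names, field names, and the integer encoding of cache levels are hypothetical and do not come from the patent. What the sketch takes from the claims is that one message carries several prefetch addresses (which may be noncontiguous), each entry may indicate its own cache level, and parsing the message yields one prefetch message per address.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrefetchEntry:
    """One entry of the gather/scatter message (hypothetical layout)."""
    address: int
    cache_level: int  # assumed encoding: 1 = L1, 2 = L2, ...


@dataclass(frozen=True)
class GatherScatterPrefetch:
    """A single message carrying several prefetch addresses."""
    entries: List[PrefetchEntry]


@dataclass(frozen=True)
class PrefetchMessage:
    """An individual prefetch request for one address."""
    address: int
    cache_level: int


def parse_gather_scatter(msg: GatherScatterPrefetch) -> List[PrefetchMessage]:
    # Expand the combined message into one prefetch message per address,
    # carrying each entry's cache level through unchanged.
    return [PrefetchMessage(e.address, e.cache_level) for e in msg.entries]


# Noncontiguous addresses with different target cache levels in one message.
message = GatherScatterPrefetch([
    PrefetchEntry(0x1000, 1),
    PrefetchEntry(0x9000, 2),
    PrefetchEntry(0x4040, 1),
])
issued = parse_gather_scatter(message)
```

Batching the addresses into one message and expanding it at the receiver is the pattern the claims describe. Completion notification to the requesting thread (claims 5, 11, and 17) is left out of this sketch.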

Assignees

Intel Corp

Inventors

Classifications

  • G06T1/20 (Primary)

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Instruction prefetching · CPC title

  • using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12) · CPC title

  • Memory management · CPC title

  • General purpose rendering architectures · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11861759B2 cover?
Embodiments are generally directed to memory prefetching in multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 2, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).