Unified memory systems and methods
US-2019266695-A1 · Aug 29, 2019 · US
US11861759B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11861759-B2 |
| Application number | US-202217580352-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 20, 2022 |
| Priority date | Mar 15, 2019 |
| Publication date | Jan 2, 2024 |
| Grant date | Jan 2, 2024 |
Official abstract text for this publication.
Embodiments are generally directed to memory prefetching in multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.
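The ownership restriction in the abstract can be illustrated with a minimal sketch. It assumes a page-granular ownership table keyed by page number; the names `Prefetcher`, `page_owner`, `HOST`, and the 4096-byte page size are hypothetical and do not come from the patent. The sketch only models the stated rule that a GPU's prefetcher may not prefetch from a page unless that page is owned by the GPU itself or by the host processor.

```python
from dataclasses import dataclass, field

HOST = "host"  # hypothetical owner id for the host processor


@dataclass
class Prefetcher:
    gpu_id: str
    page_owner: dict          # page number -> owner id (assumed ownership table)
    page_size: int = 4096     # assumed page size, not specified in the abstract
    cache: set = field(default_factory=set)  # addresses currently prefetched

    def prefetch(self, address: int) -> bool:
        """Prefetch `address` into this GPU's cache if ownership allows it."""
        page = address // self.page_size
        owner = self.page_owner.get(page)
        # Allowed only when the page is owned by this GPU or by the host.
        # Pages with no recorded owner are rejected in this sketch.
        if owner not in (self.gpu_id, HOST):
            return False
        self.cache.add(address)
        return True


# Usage: gpu0 may prefetch from its own page and a host page, not gpu1's page.
owners = {0: "gpu0", 1: "gpu1", 2: HOST}
gpu0 = Prefetcher("gpu0", owners)
results = [gpu0.prefetch(100), gpu0.prefetch(4096 + 8), gpu0.prefetch(2 * 4096)]
```

Rejecting pages with no recorded owner is a design choice of this sketch; the abstract does not say how unowned pages are treated.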
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a plurality of processors including a host processor and a plurality of graphics processing units (GPUs) to process data including at least a first graphics processing unit (GPU), each of the plurality of GPUs including a prefetcher and one or more caches; and a memory for storage of data; wherein the prefetcher of each of the plurality of GPUs is to prefetch data from the memory to a cache of the respective GPU; wherein a prefetch operation by the first GPU includes the prefetcher of the first GPU issuing a gather/scatter prefetch message including a plurality of prefetch addresses; and wherein the plurality of processors are to parse the gather/scatter prefetch message and issue a prefetch message for each of the plurality of prefetch addresses.
2. The apparatus of claim 1, wherein the gather/scatter prefetch message includes an entry for each of the plurality of prefetch addresses, the entry to indicate a cache level for prefetching.
3. The apparatus of claim 2, wherein the gather/scatter prefetch message includes a plurality of different cache levels within the gather/scatter prefetch message.
4. The apparatus of claim 1, wherein the plurality of prefetch addresses includes noncontiguous addresses.
5. The apparatus of claim 1, wherein the prefetcher of the first GPU is to send a notification to a thread in a core of the first GPU when a prefetch for the thread is complete.
6. The apparatus of claim 1, wherein the prefetchers of the plurality of GPUs are to prefetch data from the memory to a cache of each respective GPU in execution of a multi-GPU workload.
7. One or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: processing a workload in a computing system including a plurality of processors, the plurality of processors including a host processor and a plurality of graphics processing units (GPUs), each of the plurality of GPUs including a prefetcher and one or more caches; prefetching data by a first graphics processing unit (GPU) of the plurality of GPUs from a memory of the computing system to a cache of the first GPU wherein prefetching data by the first GPU includes the prefetcher of the first GPU issuing a gather/scatter prefetch message including a plurality of prefetch addresses; and parsing the gather/scatter prefetch message and issuing a prefetch message for each of the plurality of prefetch addresses.
8. The one or more computer-readable storage mediums of claim 7, wherein the gather/scatter prefetch message includes an entry for each of the plurality of prefetch addresses, the entry to indicate a cache level for prefetching.
9. The one or more computer-readable storage mediums of claim 8, wherein the gather/scatter prefetch message includes a plurality of different cache levels within the gather/scatter prefetch message.
10. The one or more computer-readable storage mediums of claim 7, wherein the plurality of prefetch addresses includes noncontiguous addresses.
11. The one or more computer-readable storage mediums of claim 7, wherein the instructions further include instructions for: sending a notification to a thread in a core of the first GPU when a prefetch for the thread is complete.
12. The one or more computer-readable storage mediums of claim 7, wherein the workload is a multi-GPU workload, and wherein the prefetchers of the plurality of GPUs are to prefetch data from the memory to a cache of each respective GPU in execution of the multi-GPU workload.
13. A method comprising: processing a workload in a computing system including a plurality of processors, the plurality of processors including a host processor and a plurality of graphics processing units (GPUs), each of the plurality of GPUs including a prefetcher and one or more caches; prefetching data by a first graphics processing unit (GPU) of the plurality of GPUs from a memory of the computing system to a cache of the first GPU wherein prefetching data by the first GPU includes the prefetcher of the first GPU issuing a gather/scatter prefetch message including a plurality of prefetch addresses; and parsing the gather/scatter prefetch message and issuing a prefetch message for each of the plurality of prefetch addresses.
14. The method of claim 13, wherein the gather/scatter prefetch message includes an entry for each of the plurality of prefetch addresses, the entry to indicate a cache level for prefetching.
15. The method of claim 14, wherein the gather/scatter prefetch message includes a plurality of different cache levels within the gather/scatter prefetch message.
16. The method of claim 13, wherein the plurality of prefetch addresses includes noncontiguous addresses.
17. The method of claim 13, further comprising: sending a notification to a thread in a core of the first GPU when a prefetch for the thread is complete.
18. The method of claim 13, wherein the workload is a multi-GPU workload, and wherein the prefetchers of the plurality of GPUs are to prefetch data from the memory to a cache of each respective GPU in execution of the multi-GPU workload.
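The gather/scatter prefetch flow recited in the independent claims, together with the per-entry cache level and completion notification of the dependent claims, might be modeled roughly as follows. This is a software sketch under stated assumptions, not the claimed hardware: the message layout, the class names `PrefetchEntry`, `GatherScatterPrefetch`, and `PrefetchMessage`, the `run_prefetch` helper, and the callback-style notification are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class PrefetchEntry:
    address: int      # addresses in one message need not be contiguous
    cache_level: int  # assumed encoding: 1 = L1, 2 = L2, ...


@dataclass(frozen=True)
class GatherScatterPrefetch:
    thread_id: int                      # requesting thread (assumed field)
    entries: Tuple[PrefetchEntry, ...]  # one entry per prefetch address


@dataclass(frozen=True)
class PrefetchMessage:
    address: int
    cache_level: int


def parse_gather_scatter(msg: GatherScatterPrefetch) -> List[PrefetchMessage]:
    """Expand one gather/scatter message into one prefetch message per address."""
    return [PrefetchMessage(e.address, e.cache_level) for e in msg.entries]


def run_prefetch(
    msg: GatherScatterPrefetch,
    caches: Dict[int, Set[int]],
    notify: Optional[Callable[[int], None]] = None,
) -> List[PrefetchMessage]:
    """Issue each parsed prefetch, then notify the requesting thread."""
    issued = parse_gather_scatter(msg)
    for m in issued:
        # Place the address in the cache level named by its entry.
        caches.setdefault(m.cache_level, set()).add(m.address)
    if notify is not None:
        notify(msg.thread_id)  # signal completion to the requesting thread
    return issued


# Usage: three noncontiguous addresses targeting two different cache levels.
msg = GatherScatterPrefetch(
    thread_id=7,
    entries=(PrefetchEntry(0x1000, 1), PrefetchEntry(0x9F40, 2), PrefetchEntry(0x2000, 1)),
)
caches: Dict[int, Set[int]] = {}
completed: List[int] = []
issued = run_prefetch(msg, caches, completed.append)
```

Carrying the cache level in each entry lets a single message mix targets, which is what claims 3, 9, and 15 describe; a single message-wide level would be simpler but could not express that case.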
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Instruction prefetching · CPC title
using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12) · CPC title
Memory management · CPC title
General purpose rendering architectures · CPC title