Graphics processor data access and sharing

US12204487B2 · US · B2

Patent metadata
Publication number: US-12204487-B2
Application number: US-202418415052-A
Country: US
Kind code: B2
Filing date: Jan 17, 2024
Priority date: Mar 15, 2019
Publication date: Jan 21, 2025
Grant date: Jan 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
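
One way to picture the pipeline described in the abstract is the minimal Python sketch below. It rests on several assumptions not stated in the abstract: the "result" is taken to be a computed memory address, the "software assistance" is modeled as a hint carrying a stride and a prefetch degree, and the cache line is assumed to be 64 bytes. The names `SoftwareHint`, `PrefetchInfo`, `load_store_unit`, `prefetch_generator`, and `CACHE_LINE` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

CACHE_LINE = 64  # assumed cache-line size in bytes (hypothetical)


@dataclass
class SoftwareHint:
    """Hypothetical software assist: expected stride (bytes) and prefetch degree."""
    stride: int
    degree: int


@dataclass
class PrefetchInfo:
    """Pre-fetch information handed from the load-store unit to the generator."""
    base_address: int
    stride: int
    degree: int


def load_store_unit(result_address: int, hint: SoftwareHint) -> PrefetchInfo:
    """Combine the address produced by a circuit element with the software hint."""
    return PrefetchInfo(result_address, hint.stride, hint.degree)


def prefetch_generator(info: PrefetchInfo) -> list[int]:
    """Produce cache-line-aligned prefetch addresses, skipping lines already covered."""
    current_line = info.base_address // CACHE_LINE * CACHE_LINE
    addresses = []
    for i in range(1, info.degree + 1):
        line = (info.base_address + i * info.stride) // CACHE_LINE * CACHE_LINE
        if line != current_line and line not in addresses:
            addresses.append(line)
    return addresses


print(prefetch_generator(load_store_unit(0x1000, SoftwareHint(stride=256, degree=3))))
# [4352, 4608, 4864]
```

Aligning to cache lines and dropping duplicates keeps a small stride (for example 16 bytes) from issuing several prefetches for a line that is already being fetched.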

Claims

Full claim text (claims 1 to 18). Claims 1, 7, and 13 are the independent claims.

What is claimed is:

1. An apparatus comprising: a plurality of processors, including a plurality of graphics processing units (GPUs); and a shared memory accessible to each of the plurality of GPUs, wherein the shared memory includes at least a first memory associated with a first GPU of the plurality of GPUs and a second memory associated with a second GPU of the plurality of GPUs; wherein the plurality of processors are to: identify data usages for the plurality of GPUs, determine a preferred data access structure for access by the plurality of GPUs based at least in part on the identified data usages, and establish a masking structure to implement the preferred data access structure for access by the plurality of GPUs.

2. The apparatus of claim 1, wherein the first memory is a physically closest memory of the shared memory to the first GPU and the second memory is a physically closest memory of the shared memory to the second GPU.

3. The apparatus of claim 1, wherein, in response to requests for access to a first page in the shared memory from the first GPU and the second GPU, the masking structure directs that: the first GPU is to access a first copy of the first page from the first memory; and the second GPU is to access a second copy of the first page from the second memory.

4. The apparatus of claim 3, wherein the first copy and the second copy of the page are in a non-modified state.

5. The apparatus of claim 1, wherein the masking structure is provided at an instruction level for the apparatus.

6. The apparatus of claim 1, wherein the masking structure implements simultaneous exclusive ownership of one or more pages across the plurality of GPUs.

7. A method comprising: identifying data usages for a plurality of graphics processing units (GPUs) in a computing system, the computing system including a shared memory accessible to each of the plurality of GPUs, wherein the shared memory includes at least a first memory associated with a first GPU of the plurality of GPUs and a second memory associated with a second GPU of the plurality of GPUs; determining a preferred data access structure for access by the plurality of GPUs based at least in part on the identified data usages; and establishing a masking structure to implement the preferred data access structure for access by the plurality of GPUs.

8. The method of claim 7, wherein first memory is a physically closest memory of the shared memory to the first GPU and the second memory is a physically closest memory of the shared memory to the second GPU.

9. The method of claim 7, further comprising: receiving a first request from the first GPU for access to a first page in the shared memory; in response to the first request, directing the first GPU to access a first copy of the first page from the first memory; receiving a second request from the second GPU for access to the first page in the shared memory; and in response to the second request, directing the second GPU to access a second copy of the first page from the second memory.

10. The method of claim 9, wherein the first copy and the second copy of the first page are in a non-modified state.

11. The method of claim 7, wherein establishing the masking structure includes providing the masking structure at an instruction level for the computing system.

12. The method of claim 7, wherein establishing the masking structure implements simultaneous exclusive ownership of one or more pages across the plurality of GPUs.

13. One or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: identifying data usages for a plurality of graphics processing units (GPUs) in a computing system, the computing system including a shared memory accessible to each of the plurality of GPUs, wherein the shared memory includes at least a first memory associated with a first GPU of the plurality of GPUs and a second memory associated with a second GPU of the plurality of GPUs; determining a preferred data access structure for access by the plurality of GPUs based at least in part on the identified data usages; and establishing a masking structure to implement the preferred data access structure for access by the plurality of GPUs.

14. The one or more computer-readable storage mediums of claim 13, wherein first memory is a physically closest memory of the shared memory to the first GPU and the second memory is a physically closest memory of the shared memory to the second GPU.

15. The one or more computer-readable storage mediums of claim 13, further comprising instructions for: receiving a first request from the first GPU for access to a first page in the shared memory; in response to the first request, directing the first GPU to access a first copy of the first page from the first memory; receiving a second request from the second GPU for access to the first page in the shared memory; and in response to the second request, directing the second GPU to access a second copy of the first page from the second memory.

16. The one or more computer-readable storage mediums of claim 15, wherein the first copy and the second copy of the first page are in a non-modified state.

17. The one or more computer-readable storage mediums of claim 13, wherein establishing the masking structure includes providing the masking structure at an instruction level for the computing system.

18. The one or more computer-readable storage mediums of claim 13, wherein establishing the masking structure implements simultaneous exclusive ownership of one or more pages across the plurality of GPUs.
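
The three claimed steps (identify data usages, determine a preferred data access structure, establish a masking structure) can be illustrated with a small routing table. In this hypothetical Python sketch, pages no GPU writes are replicated so that each GPU reads the copy in its physically closest memory, as in claims 2 to 4. Written pages get a single home copy in the memory closest to the lowest-numbered GPU that touched them. That write policy, the dictionary-based mask, and the names `identify_usages`, `build_mask`, and `route` are assumptions for illustration only; the exclusive-ownership variant of claim 6 is not modeled.

```python
from collections import defaultdict


def identify_usages(accesses):
    """Summarize (gpu, page, is_write) samples into per-page usage.

    Returns a mapping page -> set of GPUs that touched it, and the set of
    pages written by any GPU.
    """
    gpus_by_page = defaultdict(set)
    written_pages = set()
    for gpu, page, is_write in accesses:
        gpus_by_page[page].add(gpu)
        if is_write:
            written_pages.add(page)
    return gpus_by_page, written_pages


def build_mask(gpus_by_page, written_pages, closest_memory):
    """Choose an access structure and encode it as a (page, gpu) -> memory mask.

    closest_memory maps each GPU to its physically closest memory.
    """
    mask = {}
    for page, gpus in gpus_by_page.items():
        if page in written_pages:
            # Assumed policy: a modified page keeps one shared home copy.
            home = closest_memory[min(gpus)]
            for gpu in gpus:
                mask[(page, gpu)] = home
        else:
            # Non-modified page: each GPU is directed to its own local copy.
            for gpu in gpus:
                mask[(page, gpu)] = closest_memory[gpu]
    return mask


def route(mask, page, gpu):
    """Direct a GPU's access request for a page to the memory chosen by the mask."""
    return mask[(page, gpu)]


# Usage: GPUs 0 and 1 both read page 7; GPU 0 writes page 9, GPU 1 reads it.
closest = {0: "mem0", 1: "mem1"}
accesses = [(0, 7, False), (1, 7, False), (0, 9, True), (1, 9, False)]
mask = build_mask(*identify_usages(accesses), closest)
print(route(mask, 7, 0), route(mask, 7, 1), route(mask, 9, 1))  # mem0 mem1 mem0
```

Separating usage identification from mask construction mirrors the claim structure: the policy in `build_mask` can be swapped without changing how requests are routed.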

Assignees

Intel Corp

Inventors

Classifications

  • Page size control · CPC title

  • Details relating to cache mapping · CPC title

  • Prefetching based on hints or prefetch instructions · CPC title

  • Prefetching based on access pattern detection, e.g. stride based prefetch · CPC title

  • Reconfiguration of cache memory · CPC title
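
Two of these classifications name well-known prefetching techniques. As a generic illustration of stride-based prefetching (a textbook technique, not a mechanism disclosed in this patent), a per-instruction stride detector can be sketched in Python as follows. The class name `StridePrefetcher`, the `threshold` and `degree` parameters, and keying the table by program counter are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_address: int
    stride: int = 0
    confidence: int = 0


class StridePrefetcher:
    """Minimal stride detector with one table entry per program counter (pc)."""

    def __init__(self, threshold: int = 2, degree: int = 2):
        self.table: dict[int, StrideEntry] = {}
        self.threshold = threshold  # confirmations needed before prefetching
        self.degree = degree        # how many strides ahead to prefetch

    def access(self, pc: int, address: int) -> list[int]:
        """Record a load and return the addresses to prefetch (possibly none)."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_address=address)
            return []
        stride = address - entry.last_address
        if stride == entry.stride and stride != 0:
            entry.confidence += 1          # same nonzero stride seen again
        else:
            entry.stride = stride          # pattern changed: retrain
            entry.confidence = 0
        entry.last_address = address
        if entry.confidence >= self.threshold:
            return [address + i * stride for i in range(1, self.degree + 1)]
        return []


p = StridePrefetcher(threshold=2, degree=2)
for addr in (100, 132, 164, 196, 228):
    print(addr, p.access(0x40, addr))
# 100 []
# 132 []
# 164 []
# 196 [228, 260]
# 228 [260, 292]
```

Prefetches begin only after the same stride has been confirmed `threshold` times in a row, which trades a few missed early prefetches for fewer useless ones when the access pattern is irregular.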

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12204487B2 cover?
Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch i…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/0862. Mapped technology areas include Physics.
When was this patent published?
Published Jan 21, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).