Data locality enhancement for graphics processing units

US12190118B2 · US · B2

Patent metadata
Publication number: US-12190118-B2
Application number: US-202318339454-A
Country: US
Kind code: B2
Filing date: Jun 22, 2023
Priority date: Nov 15, 2019
Publication date: Jan 7, 2025
Grant date: Jan 7, 2025

Abstract

Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource and move a data output from one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.
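The abstract's core idea (the producer's output is moved into the cache of the resource where its consumer runs) can be illustrated with a toy Python simulation. This is a minimal sketch under stated assumptions, not Intel's implementation or any GPU API. Each processing resource is modeled as an object with a private cache dictionary, and the tasks are assumed to be listed in an order that already respects their dependencies. The class `ProcessingResource`, the function `run_with_push` and all task names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingResource:
    """Hypothetical processing resource with a private cache (a dict here)."""
    rid: int
    cache: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def read(self, key, memory):
        # Serve from the local cache if a producer pushed the data here,
        # otherwise fall back to the shared memory.
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        return memory[key]


def run_with_push(tasks, deps, placement, resources, memory):
    """Run tasks in list order (assumed to already respect the dependencies).

    tasks:     list of (name, fn, input_keys)
    deps:      producer name -> list of consumer names
    placement: task name -> resource id
    """
    for name, fn, inputs in tasks:
        res = resources[placement[name]]
        out = fn(*(res.read(k, memory) for k in inputs))
        memory[name] = out  # output still lands in shared memory
        # Push the output into each consumer's resource cache before it runs.
        for consumer in deps.get(name, []):
            resources[placement[consumer]].cache[name] = out
    return memory


if __name__ == "__main__":
    resources = {0: ProcessingResource(0), 1: ProcessingResource(1)}
    tasks = [
        ("produce", lambda x: x * 2, ["x"]),        # producer on resource 0
        ("consume", lambda p: p + 1, ["produce"]),  # consumer on resource 1
    ]
    memory = run_with_push(tasks, {"produce": ["consume"]},
                           {"produce": 0, "consume": 1}, resources, {"x": 3})
    print(memory["consume"], resources[1].hits, resources[1].misses)  # prints: 7 1 0
```

Here the consumer on resource 1 reads `produce` from its own cache (one hit, no misses) instead of going back to shared memory.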

Claims

What is claimed is:

1. An apparatus comprising: processor circuitry coupled to a memory, the processor circuitry to: map one or more tasks to one or more processing resources; and forward one or more destination identifiers corresponding to the one or more tasks to the one or more processing resources, wherein the one or more tasks are represented in a task graph, and receive data dependencies associated with the one or more tasks including one or more producer tasks or one or more consumer tasks.

2. The apparatus of claim 1, wherein the one or more processing resources comprise one or more of a first processing resource or a second processing resource.

3. The apparatus of claim 1, wherein the one or more producer tasks execute on the first processing resource, and wherein the one or more consumer tasks execute on the second processing resource.

4. The apparatus of claim 3, wherein the processor circuitry is further to transport a data output from the one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource.

5. The apparatus of claim 3, wherein the processing circuitry is further to enqueue a kernel for execution by the one of the processing resources, wherein the cache memory comprises a L1 cache, and wherein the L1 cache is shared between the first and second processing resources.

6. The apparatus of claim 1, wherein the processor circuitry comprises graphics processor circuitry co-located with application processor circuitry on a semiconductor package.

7. A method comprising: mapping, by a processor of a computing device, one or more tasks to one or more processing resources; and forwarding one or more destination identifiers corresponding to the one or more tasks to the one or more processing resources, wherein the one or more tasks are represented in a task graph, and receiving data dependencies associated with the one or more tasks including one or more producer tasks or one or more consumer tasks.

8. The method of claim 7, wherein the one or more processing resources comprise one or more of a first processing resource or a second processing resource.

9. The method of claim 7, wherein the one or more producer tasks execute on the first processing resource, and wherein the one or more consumer tasks execute on the second processing resource.

10. The method of claim 9, further comprising transporting a data output from the one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource.

11. The method of claim 9, further comprising enqueuing a kernel for execution by one of the processing resources, wherein the cache memory comprises a L1 cache, and wherein the L1 cache is shared between the first and second processing resources.

12. The method of claim 7, wherein the processor is coupled to a memory, the processor comprises a graphics processor co-located with an application processor on a semiconductor package.

13. At least one computer-readable medium having stored thereon instructions which, when executed, cause a computing device to facilitate operations comprising: mapping one or more tasks to one or more processing resources; and forwarding one or more destination identifiers corresponding to the one or more tasks to the one or more processing resources, wherein the one or more tasks are represented in a task graph, and receiving data dependencies associated with the one or more tasks including one or more producer tasks or one or more consumer tasks.

14. The computer-readable medium of claim 13, wherein the one or more processing resources comprise one or more of a first processing resource or a second processing resource.

15. The computer-readable medium of claim 13, wherein the one or more producer tasks execute on the first processing resource, and wherein the one or more consumer tasks execute on the second processing resource.

16. The computer-readable medium of claim 15, wherein the operations further comprise transporting a data output from the one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource.

17. The computer-readable medium of claim 15, wherein the operations further comprise enqueuing a kernel for execution by one of the processing resources, wherein the cache memory comprises a L1 cache, and wherein the L1 cache is shared between the first and second processing resources.

18. The computer-readable medium of claim 13, wherein the computing device comprises one or more processors coupled to a memory, the one or more processors include one or more graphics processors co-located with one or more application processors on a semiconductor package.
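The independent claims (1, 7 and 13) recite mapping tasks represented in a task graph to processing resources, forwarding destination identifiers for those tasks, and receiving producer/consumer data dependencies. The claims do not say what a destination identifier contains. The Python sketch below assumes, purely for illustration, that it is the set of resource IDs hosting a task's consumers, and it uses a hypothetical round-robin mapping policy. The function name `map_and_forward` is invented; `graphlib.TopologicalSorter` is the Python 3.9+ standard-library class.

```python
from graphlib import TopologicalSorter


def map_and_forward(task_graph, num_resources):
    """task_graph maps each task to the set of tasks it depends on (its producers).

    Returns (placement, forwarded):
      placement[task] -> resource id the task is mapped to
      forwarded[rid]  -> list of (task, destination_ids) sent to resource rid,
                         where destination_ids are the resources of its consumers
                         (an illustrative assumption, not the patent's definition).
    """
    # Dependency order: every producer precedes its consumers.
    order = list(TopologicalSorter(task_graph).static_order())

    # Hypothetical mapping policy: round-robin over resources in dependency order.
    placement = {task: i % num_resources for i, task in enumerate(order)}

    # Invert the dependency edges to get producer -> consumers.
    consumers = {task: [] for task in order}
    for task, producers in task_graph.items():
        for producer in producers:
            consumers[producer].append(task)

    # Forward each task, with its destination identifiers, to its own resource.
    forwarded = {rid: [] for rid in range(num_resources)}
    for task in order:
        dest_ids = sorted({placement[c] for c in consumers[task]})
        forwarded[placement[task]].append((task, dest_ids))
    return placement, forwarded


if __name__ == "__main__":
    # A simple chain: A produces for B, B produces for C.
    placement, forwarded = map_and_forward({"B": {"A"}, "C": {"B"}}, num_resources=2)
    print(placement)  # prints: {'A': 0, 'B': 1, 'C': 0}
    print(forwarded)  # prints: {0: [('A', [1]), ('C', [])], 1: [('B', [0])]}
```

Round-robin is used only to keep the example short. A scheduler aiming at data locality would instead try to place producer/consumer pairs so they share a cache, in the spirit of claims 5, 11 and 17, which recite an L1 cache shared between the first and second processing resources.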

Assignees

Intel Corp

Classifications

  • Supervised learning · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Frequently asked questions

What does patent US12190118B2 cover?
An apparatus with a first and a second processing resource coupled to a shared memory, plus a processor that receives data dependencies between producer tasks running on the first resource and consumer tasks running on the second, and moves the producers' data output into a cache memory coupled to the second resource. The granted claims center on mapping tasks represented in a task graph to processing resources and forwarding destination identifiers for those tasks.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3891. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).