Accessing local memory of a GPU executing a first kernel when executing a second kernel of another GPU

US12406324B2 · US · B2

Patent metadata
  • Publication number: US-12406324-B2

  • Application number: US-202318504068-A

  • Country: US

  • Kind code: B2

  • Filing date: Nov 7, 2023

  • Priority date: Apr 28, 2020

  • Publication date: Sep 2, 2025

  • Grant date: Sep 2, 2025

Abstract

Official abstract text for this publication.

Methods for graphics processing are provided. One example method includes executing a plurality of kernels using a plurality of graphics processing units (GPUs), wherein responsibility for executing a corresponding kernel is divided into one or more portions each of which being assigned to a corresponding GPU. The method includes generating a plurality of dependency data at a first kernel as each of a first plurality of portions of the first kernel completes processing. The method includes checking dependency data from one or more portions of the first kernel prior to execution of a portion of a second kernel. The method includes delaying execution of the portion of the second kernel as long as the corresponding dependency data of the first kernel has not been met.
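
The dependency scheme in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patentee's implementation: Python threads stand in for GPUs, one `threading.Event` per portion stands in for the dependency data, and the names `run`, `first_kernel_portion`, and `second_kernel_portion` are hypothetical. The dependency pattern (portion i of the second kernel needs portions i and i+1 of the first) is also invented for the example.

```python
import threading

def run(num_portions=4):
    # Hypothetical dependency data: one flag per portion of the first kernel.
    done = [threading.Event() for _ in range(num_portions)]
    k1_out = [None] * num_portions   # results of the first kernel
    k2_out = [None] * num_portions   # results of the second kernel

    def first_kernel_portion(i):
        k1_out[i] = i * i            # stand-in for real kernel work
        done[i].set()                # publish the flag only after the result is stored

    def second_kernel_portion(i, deps):
        for d in deps:
            done[d].wait()           # delay until this dependency has been met
        k2_out[i] = sum(k1_out[d] for d in deps)

    # Assumed pattern: portion i of kernel 2 needs portions i and i+1 of kernel 1.
    consumers = [
        threading.Thread(target=second_kernel_portion,
                         args=(i, [i, (i + 1) % num_portions]))
        for i in range(num_portions)
    ]
    producers = [
        threading.Thread(target=first_kernel_portion, args=(i,))
        for i in range(num_portions)
    ]
    # Consumers start first, so each must wait on its own flags.
    for t in consumers + producers:
        t.start()
    for t in consumers + producers:
        t.join()
    return k2_out
```

Each second-kernel portion waits only on the flags it needs, so it can begin as soon as those portions finish rather than after the whole first kernel completes.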

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: executing, at a first graphics processing unit (GPU), a portion of a first kernel to completion to generate first kernel data; storing, by the first GPU, the first kernel data in local memory of the first GPU; generating, by the first GPU, first dependency data indicating that the first kernel data is stored in the local memory of the first GPU; storing, by the first GPU, the first dependency data at a first memory location of a dependency data store; checking, by a second GPU, status of the first dependency data at the first memory location of the dependency data store; when the status of the first dependency data indicates that the first kernel data is stored in the local memory of the first GPU, accessing, by the second GPU, the first kernel data from the local memory of the first GPU; and executing a portion of a second kernel using the first kernel data that has been accessed.

2. The method of claim 1, wherein the accessing the first kernel data from the local memory of the first GPU includes: reading the first kernel data from the local memory of the first GPU when performing the executing the portion of the second kernel.

3. The method of claim 1, wherein the accessing the first kernel data from the local memory of the first GPU includes: copying the first kernel data from the local memory of the first GPU to local memory of the second GPU; and reading, by the second GPU, the first kernel data from the local memory of the second GPU when performing the executing the portion of the second kernel.

4. The method of claim 3, comprising: using direct memory access (DMA) when performing the copying the first kernel data from the local memory of the first GPU to the local memory of the second GPU.

5. The method of claim 1, wherein the accessing the first kernel data from the local memory of the first GPU includes: reading at least a first portion of the first kernel data from the local memory of the first GPU; copying the first kernel data from the local memory of the first GPU to local memory of the second GPU; and executing the portion for the second kernel using the at least the first portion of the first kernel data.

6. The method of claim 5, further comprising: reading, by the second GPU, a remaining portion of the first kernel data from the local memory of the second GPU after the copying the first kernel data to the local memory of the second GPU has completed; and performing, by the second GPU, the executing the portion of the second kernel using the remaining portion of the first kernel data.

7. The method of claim 1, further comprising: dividing the first kernel into a plurality of portions during execution of an application, wherein the executing the portion of the second kernel begins before the plurality of portions of the first kernel has completed processing.

8. A computer system comprising: a processor; memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method, comprising: executing, at a first graphics processing unit (GPU), a portion of a first kernel to completion to generate first kernel data; storing, by the first GPU, the first kernel data in local memory of the first GPU; generating, by the first GPU, first dependency data indicating that the first kernel data is stored in the local memory of the first GPU; storing, by the first GPU, the first dependency data at a first memory location of a dependency data store; checking, by a second GPU, status of the first dependency data at the first memory location of the dependency data store; when the status of the first dependency data indicates that the first kernel data is stored in the local memory of the first GPU, accessing by the second GPU the first kernel data from the local memory of the first GPU; and executing a portion of a second kernel using the first kernel data that has been accessed.

9. The computer system of claim 8, wherein in the method the accessing the first kernel data from the local memory of the first GPU includes: reading the first kernel data from the local memory of the first GPU when performing the executing the portion of the second kernel.

10. The computer system of claim 8, wherein in the method the accessing the first kernel data from the local memory of the first GPU includes: copying the first kernel data from the local memory of the first GPU to local memory of the second GPU; and reading, by the second GPU, the first kernel data from the local memory of the second GPU when performing the executing the portion of the second kernel.

11. The computer system of claim 10, the method further comprising: using direct memory access (DMA) when performing the copying the first kernel data from the local memory of the first GPU to the local memory of the second GPU.

12. The computer system of claim 8, wherein in the method the accessing the first kernel data from the local memory of the first GPU includes: reading at least a first portion of the first kernel data from the local memory of the first GPU; copying the first kernel data from the local memory of the first GPU to local memory of the second GPU; and executing the portion for the second kernel using the at least the first portion of the first kernel data.

13. The computer system of claim 12, the method further comprising: reading, by the second GPU, a remaining portion of the first kernel data from the local memory of the second GPU after the copying the first kernel data to the local memory of the second GPU has completed; and performing, by the second GPU, the executing the portion of the second kernel using the remaining portion of the first kernel data.

14. The computer system of claim 8, the method further comprising: dividing the first kernel into a plurality of portions during execution of an application, wherein the executing the portion of the second kernel begins before the plurality of portions of the first kernel has completed processing.

15. A non-transitory computer-readable medium storing a computer program for execution by a processor to perform a method, the non-transitory computer-readable medium comprising: program instructions for executing, at a first graphics processing unit (GPU), a portion of a first kernel to completion to generate first kernel data; program instructions for storing, by the first GPU, the first kernel data in local memory of the first GPU; program instructions for generating, by the first GPU, first dependency data indicating that the first kernel data is stored in the local memory of the first GPU; program instructions for storing, by the first GPU, the first dependency data at a first memory location of a dependency data store; program instructions for checking, by a second GPU, status of the first dependency data at the first memory location of the dependency data store; program instructions for accessing, by the second GPU, the first kernel data from the local memory of the first GPU when the status of the first dependency data indicates that the first kernel data is stored in the local memory of the first GPU; and program instructions for executing a portion of a second kernel using the first kernel data that has been accessed.

16. The non-transitory computer-readable medium of claim 15, wherein the program instructions for accessing the first kernel data from the local memory of the first GPU includes: program instructions for reading the first kernel data from the local me
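
The flow of claim 1 and the three access variants in claims 2 through 6 can be sketched as follows. This is an illustrative model only, under loud assumptions: plain Python objects stand in for GPUs and their local memory, a dictionary stands in for the dependency data store, a list copy stands in for the DMA transfer of claim 4, and every name (`Gpu`, `produce`, `is_ready`, `consume_direct`, `consume_copy`, `consume_hybrid`) is hypothetical. In the hybrid variant the copy would overlap with execution on real hardware; here it runs sequentially.

```python
from dataclasses import dataclass, field

@dataclass
class Gpu:
    name: str
    local_memory: dict = field(default_factory=dict)

# Hypothetical dependency data store: memory location -> status flag.
dependency_store = {}

def produce(gpu, key, location):
    """First GPU: store kernel data locally, then publish dependency data."""
    gpu.local_memory[key] = [x * 2 for x in range(8)]   # stand-in kernel output
    dependency_store[location] = True                   # written after the data

def is_ready(location):
    """Second GPU: check the status at the given location."""
    return dependency_store.get(location, False)

def consume_direct(src, key):
    # Claim 2: read from the first GPU's local memory while executing.
    return sum(src.local_memory[key])

def consume_copy(src, dst, key):
    # Claims 3-4: copy to the second GPU's memory, then read the local copy.
    dst.local_memory[key] = list(src.local_memory[key])
    return sum(dst.local_memory[key])

def consume_hybrid(src, dst, key, split):
    # Claims 5-6: start on a first portion read remotely, copy everything,
    # then finish with the remaining portion read from the local copy.
    head = sum(src.local_memory[key][:split])
    dst.local_memory[key] = list(src.local_memory[key])
    tail = sum(dst.local_memory[key][split:])
    return head + tail
```

A caller would check `is_ready(location)` before invoking any of the three consume functions. All three return the same result; they differ only in where the second kernel's reads are served from.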

Assignees

Inventors

Classifications

  • General purpose rendering architectures · CPC title

  • Memory management · CPC title

  • G06F9/4881 (Primary)

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • considering the load · CPC title

  • G06T1/20 (Primary)

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12406324B2 cover?
Methods for graphics processing are provided. One example method includes executing a plurality of kernels using a plurality of graphics processing units (GPUs), wherein responsibility for executing a corresponding kernel is divided into one or more portions each of which being assigned to a corresponding GPU. The method includes generating a plurality of dependency data at a first kernel as ea…
Who is the assignee on this patent?
Sony Interactive Entertainment LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/4881. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 2, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).