Who is the assignee on this patent?

Sony Interactive Entertainment LLC

What technology area does this patent fall under?

Primary CPC classification G06F9/52. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 29, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for efficient multi-GPU execution of kernels by region based dependencies

US11288765B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11288765-B2
Application number	US-202016861049-A
Country	US
Kind code	B2
Filing date	Apr 28, 2020
Priority date	Apr 28, 2020
Publication date	Mar 29, 2022
Grant date	Mar 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods for graphics processing are provided. One example method includes executing a plurality of kernels using a plurality of graphics processing units (GPUs), wherein responsibility for executing a corresponding kernel is divided into one or more portions each of which being assigned to a corresponding GPU. The method includes generating a plurality of dependency data at a first kernel as each of a first plurality of portions of the first kernel completes processing. The method includes checking dependency data from one or more portions of the first kernel prior to execution of a portion of a second kernel. The method includes delaying execution of the portion of the second kernel as long as the corresponding dependency data of the first kernel has not been met.
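To make the abstract's scheduling rule concrete, here is a minimal Python sketch that simulates a timeline rather than driving real GPUs. The names `simulate`, `costs_a`, `costs_b`, and `deps`, the static round-robin assignment of portion i to GPU `i % num_gpus`, and the integer per-portion costs are hypothetical illustrations, not the patent's implementation. Kernel A's per-portion finish times play the role of dependency data. Each kernel-B portion waits only for the kernel-A portions it lists, not for the whole of kernel A.

```python
from typing import Callable, Dict, List, Tuple

# (kernel, portion index, gpu, start time, finish time)
Event = Tuple[str, int, int, int, int]


def simulate(costs_a: List[int], costs_b: List[int], num_gpus: int,
             deps: Callable[[int], List[int]]) -> List[Event]:
    """Simulate two dependent kernels split into portions across GPUs.

    Hypothetical model: portion i of either kernel is statically assigned to
    GPU i % num_gpus, and each GPU runs its kernel-A portions, then its
    kernel-B portions, in index order. Finishing an A portion records its
    finish time (the "dependency data"); a B portion is delayed until every
    A portion listed by deps(i) has finished.
    """
    gpu_free = [0] * num_gpus        # time at which each GPU becomes idle
    done_a: Dict[int, int] = {}      # dependency data: A portion -> finish time
    events: List[Event] = []

    for i, cost in enumerate(costs_a):
        g = i % num_gpus
        start = gpu_free[g]
        done_a[i] = gpu_free[g] = start + cost
        events.append(("A", i, g, start, start + cost))

    for i, cost in enumerate(costs_b):
        g = i % num_gpus
        ready = max((done_a[j] for j in deps(i)), default=0)
        start = max(gpu_free[g], ready)  # delay while dependency data is unmet
        gpu_free[g] = start + cost
        events.append(("B", i, g, start, start + cost))

    return events


if __name__ == "__main__":
    # Region-based dependency: B portion i reads only what A portion i wrote.
    ev = simulate([1, 5, 1, 5], [1, 1, 1, 1], num_gpus=2, deps=lambda i: [i])
    last_a = max(e[4] for e in ev if e[0] == "A")
    first_b = min(e[3] for e in ev if e[0] == "B")
    print(last_a, first_b)  # 10 2
```

With uneven kernel-A costs, kernel-B portion 0 starts at time 2 while the last kernel-A portion finishes at time 10. If `deps` instead listed every kernel-A portion (a full barrier), no kernel-B portion could start before time 10. This mirrors the claim language that execution of a portion of the second kernel may begin before all portions of the first kernel have finished.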

First claim

Opening claim text (preview).

What is claimed is:

1. A method for graphics processing, comprising: executing a plurality of kernels using a plurality of graphics processing units (GPUs), wherein responsibility for executing a corresponding kernel of the plurality of kernels is divided between one or more portions of the corresponding kernel each of which being assigned to a corresponding GPU of the plurality of GPUs; generating a plurality of dependency data at a first kernel as each of a first plurality of portions of the first kernel completes processing; checking first dependency data from one or more portions of the first kernel prior to execution of a portion of a second kernel; and delaying the execution of the portion of the second kernel as long as the first dependency data from the one or more portions of the first kernel has not been met, wherein the first dependency data from the one or more portions of the first kernel indicates whether the first kernel has finished executing the one or more portions of the first kernel, wherein the execution of the portion of the second kernel begins before the first plurality of portions of the first kernel has finished processing, wherein the one or more portions of the first kernel includes less portions than the first plurality of portions of the first kernel.

2. The method of claim 1, wherein dependency data generated by a portion of the first kernel indicates completion of one or more writes to one or more regions of a resource.

3. The method of claim 2, wherein a region corresponds to a subset of the resource, wherein the subset of the resource includes a tile of an image or a buffer range.

4. The method of claim 1, wherein the first dependency data from the one or more portions of the first kernel indicates completion of writing to a region of a resource.

5. The method of claim 4, wherein the first dependency data from the one or more portions of the first kernel is stored per portion, or wherein the first dependency data from the one or more portions of the first kernel is stored per region per portion.

6. The method of claim 1, wherein each portion of the first plurality of portions of the first kernel corresponds to index ranges of an index space defined by one or more dimensions, wherein index ranges of the each portion of the first plurality of portions of the first kernel may entirely span the index space or may span a subset of the index space in each of the one or more dimensions utilized by the first kernel.

7. The method of claim 6, wherein the first dependency data from the one or more portions of the first kernel is checked prior to the execution of the portion of the second kernel and is based on first index ranges for dimensions corresponding to the portion of the second kernel, the method including: checking second dependency data generated by a first portion of the first kernel defined by the first index ranges for the dimensions corresponding to the portion of the second kernel, or an offset thereof defining an offset index range, or checking third dependency data generated by multiple portions of the first kernel defined by second index ranges for dimensions that are, taken together, a superset of the first index ranges for the dimensions corresponding to the portion of the second kernel; or checking fourth dependency data generated by the one or more portions of the first kernel defined by third index ranges for dimensions derived from a function calculated using the first index ranges for the dimensions corresponding to the portion of the second kernel.

8. The method of claim 7, wherein if the offset index range, the superset of the first index ranges for the dimensions corresponding to the portion of the second kernel, or the third index ranges for the dimensions derived from the function calculated using the first index ranges for the dimensions corresponding to the portion of the second kernel is outside of the index space, then: the first dependency data that is checked prior to the execution of the portion of the second kernel is ignored, or the first dependency data that is checked prior to the execution of the portion of the second kernel is checked for a second portion of the first kernel corresponding to an index range that is clamped so that the second portion of the first kernel corresponding to the index range that is clamped is inside of the index space; or the first dependency data that is checked prior to the execution of the portion of the second kernel is checked for a third portion of the first kernel corresponding to an index range that is wrapped in the index space.

9. The method of claim 1, further comprising: executing a portion of the first kernel on a first GPU; and upon completion of execution of the portion of the first kernel by the first GPU, sending data generated by the portion of the first kernel to local memory of a second GPU.

10. The method of claim 1, further comprising: executing a portion of the first kernel on a first GPU; and prior to the execution of the portion of the second kernel by a second GPU, fetching into local memory of the second GPU data generated by the portion of the first kernel.

11. The method of claim 1, further comprising: fetching, via direct memory access (DMA), into local memory of a second GPU executing the portion of the second kernel, data generated by a portion of the first kernel executing on a first GPU and written to local memory of the first GPU.

12. The method of claim 11, further comprising: accessing, at the second GPU prior to the completion of the DMA, the data generated by the portion of the first kernel executing on the first GPU directly from the local memory of the first GPU by normal read operations; or accessing, at the second GPU after the completion of the DMA, the data generated by the portion of the first kernel executing on the first GPU from the local memory of the second GPU.

13. The method of claim 1, wherein the first dependency data from the one or more portions of the first kernel indicates completion of execution of a portion of the first kernel.

14. The method of claim 1, wherein responsibility for executing each portion of the first plurality of portions of the first kernel is assigned to one and only one GPU, wherein the first plurality of portions of the first kernel is statically assigned to the plurality of GPUs.

15. The method of claim 1, wherein responsibility for executing each portion of the first plurality of portions of the first kernel is assigned to one and only one GPU; and wherein the first plurality of portions of the first kernel is dynamically allocated to the plurality of GPUs as the first kernel is executed.

16. The method of claim 15, wherein allocation of the first plurality of portions of the first kernel to the plurality of GPUs references one or more predefined orders each of which is different for each GPU.

17. The method of claim 16, wherein a predefined order that is referenced is a space filling curve in dimensions of an index space of the first kernel.

18. The method of claim 15, further comprising: prefetching, based on a predefined order of the second kernel at a second GPU, into local memory of the second GPU data generated by the first kernel executing on a first GPU.

19. The method of claim 1, further comprising: wherein the plurality of GPUs share a common command buffer that may contain one or more kernel invocations, or one or more draw calls, or a combination of the one or more kernel in…
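Claims 7 and 8 describe checking dependency data for an offset index range and, when that range falls outside the index space, either ignoring the check, clamping the range into the index space, or wrapping it. The following is a minimal Python sketch of that lookup for a 2-D grid of image tiles, assuming each first-kernel portion covers exactly one tile. The function name `dependency_tiles`, the offset list, and the mode strings are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Index = Tuple[int, int]  # (x, y) tile coordinate in the 2-D index space


def dependency_tiles(tile: Index, offsets: List[Index], grid: Index,
                     mode: str = "ignore") -> List[Index]:
    """Return the first-kernel tiles whose dependency data a second-kernel
    portion at `tile` should check before it runs.

    Hypothetical model: each offset (dx, dy) defines an offset index range of
    one tile. Out-of-range tiles are skipped ("ignore"), clamped into the grid
    per dimension ("clamp"), or wrapped around the grid ("wrap").
    """
    w, h = grid
    out: List[Index] = []
    for dx, dy in offsets:
        x, y = tile[0] + dx, tile[1] + dy
        if not (0 <= x < w and 0 <= y < h):
            if mode == "ignore":
                continue  # dependency outside the index space is not checked
            elif mode == "clamp":
                x, y = min(max(x, 0), w - 1), min(max(y, 0), h - 1)
            elif mode == "wrap":
                x, y = x % w, y % h  # Python's % maps -1 to w - 1
            else:
                raise ValueError(f"unknown mode: {mode}")
        if (x, y) not in out:  # clamping can map several offsets to one tile
            out.append((x, y))
    return out


if __name__ == "__main__":
    # A 4-neighbour stencil (e.g. a blur) reading tiles written by kernel A.
    stencil = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    for m in ("ignore", "clamp", "wrap"):
        print(m, dependency_tiles((0, 0), stencil, (4, 4), m))
    # ignore [(0, 0), (1, 0), (0, 1)]
    # clamp [(0, 0), (1, 0), (0, 1)]
    # wrap [(0, 0), (3, 0), (1, 0), (0, 3), (0, 1)]
```

For a corner tile and a one-tile stencil, "ignore" and "clamp" happen to select the same tiles; they differ for larger offsets (for example, offset `(2, 0)` from tile `(3, 0)` in a 4×4 grid is dropped by "ignore", clamped to `(3, 0)`, or wrapped to `(1, 0)`). The sketch de-duplicates the result so each tile's dependency data is checked only once.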

Assignees

Sony Interactive Entertainment LLC

Inventors

Classifications

Code	CPC title	Note
G06F9/505	… considering the load	
G06F9/52	Program synchronisation; Mutual exclusion, e.g. by means of semaphores	Primary
G06F9/4881	Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues	Primary
G06T15/005	General purpose rendering architectures	
G06T1/60	Memory management	

Patent family

Related publications grouped by family.

Patent family ID: 76250424

Related patents

Local memory sharing between kernels

Instruction prefetch based on thread dispatch commands

Dependency handling for set-aside of compute control stream commands

Conditional shader for graphics

System And Method For Unified Application Programming Interface And Model

Gpu divergence barrier