US11677839B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11677839-B2 |
| Application number | US-202117351002-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 17, 2021 |
| Priority date | Jun 17, 2021 |
| Publication date | Jun 13, 2023 |
| Grant date | Jun 13, 2023 |
Official abstract text for this publication.
Apparatuses, systems, and techniques are directed to automatic coalescing of GPU-initiated network communications. In one method, a communication engine receives, from a shared memory application executing on a first graphics processing unit (GPU), a first communication request assigned to or having a second GPU as a destination to be processed. The communication engine determines that the first communication request satisfies a coalescing criterion and stores the first communication request in association with a group of requests that have a common property. The communication engine coalesces the group of requests into a coalesced request and transports the coalesced request to the second GPU over a network.
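The method summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Request` and `CoalescingEngine` names, the 64-byte size threshold used as the coalescing criterion, and the choice of (operation type, destination GPU) as the common property are all assumptions made for illustration. Coalescing is simplified to concatenating payloads.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical communication request issued by a GPU thread."""
    op: str          # operation type, e.g. "put"
    dest_gpu: int    # destination GPU id
    payload: bytes   # data to deliver


class CoalescingEngine:
    """Toy stand-in for the communication engine in the abstract."""

    def __init__(self, max_coalesce_bytes=64):
        # Assumed criterion: only small requests are worth coalescing.
        self.max_coalesce_bytes = max_coalesce_bytes
        # Maps a common property to the requests waiting in that group.
        self.groups = defaultdict(list)

    def satisfies_criterion(self, req):
        return len(req.payload) <= self.max_coalesce_bytes

    def submit(self, req):
        """Store the request with its group; return True if it was stored."""
        if not self.satisfies_criterion(req):
            return False  # caller must send it some other way
        key = (req.op, req.dest_gpu)  # assumed common property
        self.groups[key].append(req)
        return True

    def coalesce(self, key):
        """Merge one group into a single request and clear the group.

        Simplification: payloads are concatenated in arrival order; a real
        engine would also keep per-request addresses and lengths.
        """
        pending = self.groups.pop(key)
        merged = b"".join(r.payload for r in pending)
        return Request(op=key[0], dest_gpu=key[1], payload=merged)


engine = CoalescingEngine()
engine.submit(Request("put", 1, b"ab"))
engine.submit(Request("put", 1, b"cd"))
coalesced = engine.coalesce(("put", 1))
```

In this sketch the two small requests to GPU 1 leave the engine as one request, which is the point of coalescing: fewer, larger messages cross the network.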
Opening claim text (preview).
What is claimed is:

1. A method comprising: receiving, from a shared memory application executing on a first graphics processing unit (GPU), a first communication request having a second GPU as a destination; determining whether the first communication request satisfies a coalescing criterion for network transport over a network to the second GPU; transporting, in response to determining the first communication request does not satisfy the coalescing criterion, the first communication request to the second GPU over a peer-to-peer (P2P) connection between the first and second GPUs; in response to determining the first communication request satisfies the coalescing criterion, storing the first communication request in association with a group of requests that have a common property; determining that a timer associated with the group of requests expires or a size of the group satisfies a group size criterion; coalescing the group of requests into a coalesced request; and transporting the coalesced request to the second GPU over the network.

2. The method of claim 1, further comprising: receiving, from the shared memory application, a second communication request, wherein the first communication request originates from a first group of threads of the first GPU, and wherein the second communication request originates from a second group of threads of the first GPU; determining that the second communication request satisfies the coalescing criterion; and storing the second communication request in association with the group of requests that have the common property.

3. The method of claim 1, further comprising: receiving, from the shared memory application, a second communication request having a third GPU as a destination; determining that the second communication request does not satisfy the coalescing criterion, wherein the second communication request is transportable via a P2P connection with the third GPU; and transporting the second communication request to the third GPU over the P2P connection.

4. The method of claim 1, wherein determining that the first communication request satisfies the coalescing criterion comprises determining that the first communication request satisfies at least one of a request size criterion, a latency criterion, or a P2P connectivity criterion.

5. The method of claim 1, wherein the common property is at least one of a same operation type, a same network destination, a same GPU destination, or adjacent memory locations.

6. The method of claim 1, further comprising: receiving, from the shared memory application, a second communication request, wherein the second communication request originates from a group of threads of the first GPU and has a third GPU as a destination; performing group-level coalescing of the second communication request with other communication requests from the group of threads to obtain a group-level request; determining that the group-level request satisfies the coalescing criterion; storing the group-level request in association with a second group of requests that have a common property; determining that a second timer associated with the second group of requests expires or a size of the second group satisfies the group size criterion; coalescing the second group of requests into a second coalesced request; and transporting the second coalesced request to the third GPU.

7. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a software communication engine implemented using the first GPU.

8. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a software communication engine implemented using a first kernel in the first GPU, wherein the shared memory application is executed using a second kernel in the first GPU.

9. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a communication engine implemented as hardware logic using a hardware offload circuit coupled to the first GPU.

10. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a communication engine implemented as a software communication engine using a central processing unit (CPU) operatively coupled to the first GPU.

11. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a communication engine implemented using a software communication engine in a third GPU coupled to the first GPU.

12. A system comprising: a memory device; a central processing unit (CPU); and a first graphics processing unit (GPU) operatively coupled to the memory device and the CPU, the first GPU to execute a communication engine, wherein the communication engine is to: receive, from a shared memory application, a first communication request having a second GPU as a destination; determine whether the first communication request satisfies a coalescing criterion for network transport over a network to the second GPU; transport, in response to determining the first communication request does not satisfy the coalescing criterion, the first communication request to the second GPU over a peer-to-peer (P2P) connection between the first and second GPUs; in response to determining the first communication request satisfies the coalescing criterion, store the first communication request in association with a group of requests that have a common property; determine that a timer associated with the group of requests expires or a size of the group satisfies a group size criterion; coalesce the group of requests into a coalesced request; and transport the coalesced request to the second GPU over the network.

13. The system of claim 12, wherein the communication engine is further to: receive, from the shared memory application, a second communication request, wherein the first communication request originates from a first group of threads of the first GPU, and wherein the second communication request originates from a second group of threads of the first GPU; determine that the second communication request satisfies the coalescing criterion; and store the second communication request in association with the group of requests that have the common property.

14. The system of claim 12, wherein the communication engine is further to: receive, from the shared memory application, a second communication request having a third GPU as a
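The control flow recited in claim 1 (send over P2P when the coalescing criterion is not met, otherwise buffer until a timer expires or a group size threshold is reached) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `Group`, `should_flush`, and `handle` names are hypothetical, the coalescing criterion is reduced to "no P2P connection to the destination", and the 1 ms timeout and four-request threshold are arbitrary values.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical pending group of requests sharing a common property."""
    opened_at: float                      # time the group's timer started
    requests: list = field(default_factory=list)


def should_flush(group, now, timeout_s=0.001, max_requests=4):
    """True when the group's timer has expired or its size limit is met."""
    timer_expired = (now - group.opened_at) >= timeout_s
    size_reached = len(group.requests) >= max_requests
    return timer_expired or size_reached


def handle(request, dest_has_p2p, group, now):
    """Route one request and return (path, requests sent on that path).

    Assumed criterion: a request is coalesced only when the destination
    is not reachable over P2P. The timer is checked on arrival here; a
    real engine would also fire it asynchronously.
    """
    if dest_has_p2p:
        return ("p2p", [request])         # criterion not met: send directly
    group.requests.append(request)
    if should_flush(group, now):
        batch, group.requests = group.requests, []
        group.opened_at = now             # restart the timer for the next batch
        return ("network", batch)         # one coalesced network transfer
    return ("pending", [])


g = Group(opened_at=0.0)
direct = handle("a", True, g, 0.0)        # P2P path, not buffered
waiting = handle("b", False, g, 0.0001)   # buffered, timer still running
flushed = handle("c", False, g, 0.002)    # timer expired: "b" and "c" go out
```

Passing the current time in as an argument, rather than reading a clock inside `handle`, keeps the sketch deterministic and easy to test.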
Peer-to-peer [P2P] networks · CPC title
Setup of application sessions (admission control or resource allocation in data switching networks H04L47/70) · CPC title
specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks · CPC title
Grouping or aggregating service requests, e.g. for unified processing · CPC title