Automatic coalescing of GPU-initiated network communication

US11677839B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11677839-B2
Application number: US-202117351002-A
Country: US
Kind code: B2
Filing date: Jun 17, 2021
Priority date: Jun 17, 2021
Publication date: Jun 13, 2023
Grant date: Jun 13, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatuses, systems, and techniques are directed to automatic coalescing of GPU-initiated network communications. In one method, a communication engine receives, from a shared memory application executing on a first graphics processing unit (GPU), a first communication request assigned to or having a second GPU as a destination to be processed. The communication engine determines that the first communication request satisfies a coalescing criterion and stores the first communication request in association with a group of requests that have a common property. The communication engine coalesces the group of requests into a coalesced request and transports the coalesced request to the second GPU over a network.
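The coalescing step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Request` fields, the choice of grouping key (same destination GPU and same operation type), and the rule of merging only exactly adjacent byte ranges are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical GPU-initiated request; field names are illustrative."""
    dest_gpu: int   # destination GPU id
    op: str         # operation type, e.g. "put"
    offset: int     # start offset in the destination's memory
    data: bytes     # payload


def coalesce(requests):
    """Group requests by (dest_gpu, op), then merge adjacent byte ranges."""
    groups = defaultdict(list)
    for req in requests:
        # Assumed common property: same destination and same operation type.
        groups[(req.dest_gpu, req.op)].append(req)

    coalesced = []
    for (dest, op), members in groups.items():
        members.sort(key=lambda r: r.offset)
        current = members[0]
        for nxt in members[1:]:
            if current.offset + len(current.data) == nxt.offset:
                # Adjacent memory locations: fold into one larger request.
                current = Request(dest, op, current.offset, current.data + nxt.data)
            else:
                coalesced.append(current)
                current = nxt
        coalesced.append(current)
    return coalesced
```

In this sketch, two small writes to neighbouring offsets on the same destination become a single larger request, which is the effect the abstract attributes to coalescing before network transport.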

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, from a shared memory application executing on a first graphics processing unit (GPU), a first communication request having a second GPU as a destination; determining whether the first communication request satisfies a coalescing criterion for network transport over a network to the second GPU; transporting, in response to determining the first communication request does not satisfy the coalescing criterion, the first communication request to the second GPU over a peer-to-peer (P2P) connection between the first and second GPUs; in response to determining the first communication request satisfies the coalescing criterion, storing the first communication request in association with a group of requests that have a common property; determining that a timer associated with the group of requests expires or a size of the group satisfies a group size criterion; coalescing the group of requests into a coalesced request; and transporting the coalesced request to the second GPU over the network.

2. The method of claim 1, further comprising: receiving, from the shared memory application, a second communication request, wherein the first communication request originates from a first group of threads of the first GPU, and wherein the second communication request originates from a second group of threads of the first GPU; determining that the second communication request satisfies the coalescing criterion; and storing the second communication request in association with the group of requests that have the common property.

3. The method of claim 1, further comprising: receiving, from the shared memory application, a second communication request having a third GPU as a destination; determining that the second communication request does not satisfy the coalescing criterion, wherein the second communication request is transportable via a P2P connection with the third GPU; and transporting the second communication request to the third GPU over the P2P connection.

4. The method of claim 1, wherein determining that the first communication request satisfies the coalescing criterion comprises determining that the first communication request satisfies at least one of a request size criterion, a latency criterion, or a P2P connectivity criterion.

5. The method of claim 1, wherein the common property is at least one of a same operation type, a same network destination, a same GPU destination, or adjacent memory locations.

6. The method of claim 1, further comprising: receiving, from the shared memory application, a second communication request, wherein the second communication request originates from a group of threads of the first GPU and has a third GPU as a destination; performing group-level coalescing of the second communication request with other communication requests from the group of threads to obtain a group-level request; determining that the group-level request satisfies the coalescing criterion; storing the group-level request in association with a second group of requests that have a common property; determining that a second timer associated with the second group of requests expires or a size of the second group satisfies the group size criterion; coalescing the second group of requests into a second coalesced request; and transporting the second coalesced request to the third GPU.

7. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a software communication engine implemented using the first GPU.

8. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a software communication engine implemented using a first kernel in the first GPU, wherein the shared memory application is executed using a second kernel in the first GPU.

9. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a communication engine implemented as hardware logic using a hardware offload circuit coupled to the first GPU.

10. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a communication engine implemented as a software communication engine using a central processing unit (CPU) operatively coupled to the first GPU.

11. The method of claim 1, wherein at least one of the receiving, the determining the first communication request satisfied a coalescing criterion, the storing, the determining a timer associated with the group of requests expires or a size of the group satisfies a group size criterion, the coalescing, or the transporting is executed by a communication engine implemented using a software communication engine in a third GPU coupled to the first GPU.

12. A system comprising: a memory device; a central processing unit (CPU); and a first graphics processing unit (GPU) operatively coupled to the memory device and the CPU, the first GPU to execute a communication engine, wherein the communication engine is to: receive, from a shared memory application, a first communication request having a second GPU as a destination; determine whether the first communication request satisfies a coalescing criterion for network transport over a network to the second GPU; transport, in response to determining the first communication request does not satisfy the coalescing criterion, the first communication request to the second GPU over a peer-to-peer (P2P) connection between the first and second GPUs; in response to determining the first communication request satisfies the coalescing criterion, store the first communication request in association with a group of requests that have a common property; determine that a timer associated with the group of requests expires or a size of the group satisfies a group size criterion; coalesce the group of requests into a coalesced request; and transport the coalesced request to the second GPU over the network.

13. The system of claim 12, wherein the communication engine is further to: receive, from the shared memory application, a second communication request, wherein the first communication request originates from a first group of threads of the first GPU, and wherein the second communication request originates from a second group of threads of the first GPU; determine that the second communication request satisfies the coalescing criterion; and store the second communication request in association with the group of requests that have the common property.

14. The system of claim 12, wherein the communication engine is further to: receive, from the shared memory application, a second communication request having a third GPU as a
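The control flow recited in claim 1 (criterion check, P2P fallback, buffering, and a flush triggered by a timer or a group size limit) can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the claimed implementation: the class and parameter names are hypothetical, the coalescing criterion is reduced to a single P2P connectivity check, the common property is "same destination GPU", and time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Pending requests sharing a common property (here: same destination GPU)."""
    deadline: float                      # time at which the group's timer expires
    payloads: list = field(default_factory=list)


class CoalescingEngine:
    """Hypothetical engine; names and thresholds are illustrative only."""

    def __init__(self, p2p_peers, max_group_size=4, timeout=0.001):
        self.p2p_peers = set(p2p_peers)  # GPUs reachable over a P2P connection
        self.max_group_size = max_group_size
        self.timeout = timeout
        self.groups = {}                 # dest_gpu -> Group
        self.sent = []                   # (transport, dest_gpu, payloads) log

    def submit(self, dest_gpu, payload, now):
        # Assumed coalescing criterion: destination is NOT reachable via P2P.
        if dest_gpu in self.p2p_peers:
            # Criterion not satisfied: send directly over the P2P connection.
            self.sent.append(("p2p", dest_gpu, [payload]))
            return
        group = self.groups.get(dest_gpu)
        if group is None:
            # First request for this destination starts the group's timer.
            group = self.groups[dest_gpu] = Group(deadline=now + self.timeout)
        group.payloads.append(payload)
        if len(group.payloads) >= self.max_group_size:
            self._flush(dest_gpu)        # group size criterion met

    def poll(self, now):
        """Flush every group whose timer has expired."""
        expired = [d for d, g in self.groups.items() if now >= g.deadline]
        for dest_gpu in expired:
            self._flush(dest_gpu)

    def _flush(self, dest_gpu):
        # Coalesce the group into one request and send it over the network.
        group = self.groups.pop(dest_gpu)
        self.sent.append(("network", dest_gpu, group.payloads))
```

A caller would invoke `submit` for each request and `poll` periodically. The timer bounds how long a small request can wait in a group, while the size limit bounds how large a coalesced request can grow.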

Assignees

Nvidia Corp

Inventors

Classifications

  • H04L67/104 · Primary

    Peer-to-peer [P2P] networks · CPC title

  • H04L67/141 · Primary

    Setup of application sessions (admission control or resource allocation in data switching networks H04L47/70) · CPC title

  • specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks · CPC title

  • H04L67/566 · Primary

    Grouping or aggregating service requests, e.g. for unified processing · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11677839B2 cover?
Apparatuses, systems, and techniques are directed to automatic coalescing of GPU-initiated network communications. In one method, a communication engine receives, from a shared memory application executing on a first graphics processing unit (GPU), a first communication request assigned to or having a second GPU as a destination to be processed. The communication engine determines that the firs…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification H04L67/104. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Jun 13, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).