Local multicast in single-host multi-GPU machine for distributed deep learning systems

US10614356B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10614356-B2
Application number  US-201715495550-A
Country             US
Kind code           B2
Filing date         Apr 24, 2017
Priority date       Apr 24, 2017
Publication date    Apr 7, 2020
Grant date          Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A network interface controller of a machine receives a packet including at least one model parameter of a neural network model from a server. The packet includes a virtual address associated with the network interface controller, and the machine further includes a plurality of graphics processing units coupled to the network interface controller by a bus. The network interface controller translates the virtual address to a memory address associated with each of the plurality of graphics processing units. The network interface controller broadcasts the at least one model parameter to the memory address associated with each of the plurality of graphics processing units.
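
The translate-then-broadcast step described in the abstract can be illustrated with a minimal software sketch. This is not the patented implementation: the `Gpu` and `Nic` classes, the `register` and `receive` methods, and all address values are hypothetical names chosen for illustration, and plain Python dictionaries stand in for GPU memory and the NIC's mapping table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical stand-in for one GPU; a dict models its device memory."""
    name: str
    memory: dict[int, list[float]] = field(default_factory=dict)


@dataclass
class Nic:
    """Toy NIC holding a table: virtual address -> [(gpu, memory address)]."""
    table: dict[int, list[tuple[Gpu, int]]] = field(default_factory=dict)

    def register(self, virtual_addr: int, gpu: Gpu, mem_addr: int) -> None:
        # Associate one GPU memory address with the shared virtual address.
        self.table.setdefault(virtual_addr, []).append((gpu, mem_addr))

    def receive(self, virtual_addr: int, params: list[float]) -> int:
        # Translate the packet's virtual address to every registered
        # GPU memory address, then copy the parameters to each of them.
        targets = self.table.get(virtual_addr, [])
        for gpu, mem_addr in targets:
            gpu.memory[mem_addr] = list(params)
        return len(targets)  # number of GPUs that received the parameters


# Four GPUs register under one virtual address (all values are made up).
nic = Nic()
gpus = [Gpu(f"gpu{i}") for i in range(4)]
for i, gpu in enumerate(gpus):
    nic.register(0xA000, gpu, 0x1000 + 0x100 * i)

# The server sends a single packet; the NIC fans it out locally.
delivered = nic.receive(0xA000, [0.1, -0.2, 0.3])
```

In this sketch the server addresses one packet to the virtual address, and the fan-out to the four GPUs happens inside the machine, which is the traffic pattern the abstract describes.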

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
    configuring a plurality of graphics processing units of a machine to operate as a deep learning neural network, the plurality of graphics processing units coupled to a network interface controller of the machine;
    configuring a graphics processing unit in the plurality of graphics processing units to compute a set of parameters of a neural network model wherein the set of parameters cause the neural network model to be trained for a given set of input training data; and
    configuring each graphics processing unit in the plurality of graphics processing units to transmit to a neural network parameter server (server) a corresponding set of parameters of the neural network model, but instead of each graphics processing unit receiving an updated set of parameters of the neural network model individually from the server, receiving the updated set of parameters of the neural network model as a local broadcast within the machine from the network interface controller of the machine, wherein receiving the updated set of parameters locally within the machine comprises:
    receiving, by the network interface controller of the machine, a packet including at least one model parameter of the neural network model, from a server, the packet including a virtual address associated with the network interface controller;
    translating, by the network interface controller, the virtual address to a plurality of memory addresses, each memory address associated with a corresponding one of the plurality of graphics processing units; and
    broadcasting, using a local multicast by the network interface controller, the at least one model parameter to the plurality of memory addresses associated with the plurality of graphics processing units.

2. The method of claim 1, further comprising storing a mapping of the virtual address to each of the plurality of memory addresses associated within a table.

3. The method of claim 2, wherein the table is stored within the network interface controller.

4. The method of claim 1, further comprising registering each of the plurality of graphics processing units with the virtual address.

5. The method of claim 1, wherein the at least one model parameter includes weights of the neural network model.

6. The method of claim 5, wherein each of the plurality of graphics processing units is configured to compute a gradient based upon the weights.

7. The method of claim 6, wherein each of the plurality of graphics processing units is configured to send the computed gradients to the server.

8. The method of claim 1, wherein the network interface controller is a remote direct memory access enabled network interface controller.

9. A computer usable program product comprising one or more computer-readable storage mediums, and program instructions stored on at least one of the one or more storage devices, the stored program instructions comprising:
    program instructions to configure a plurality of graphics processing units of a machine to operate as a deep learning neural network, the plurality of graphics processing units coupled to a network interface controller of the machine;
    program instructions to configure a graphics processing unit in the plurality of graphics processing units to compute a set of parameters of a neural network model wherein the set of parameters cause the neural network model to be trained for a given set of input training data; and
    program instructions to configure each graphics processing unit in the plurality of graphics processing units to transmit to a neural network parameter server (server) a corresponding set of parameters of the neural network model, but instead of each graphics processing unit receiving an updated set of parameters of the neural network model individually from the server, receiving the updated set of parameters of the neural network model as a local broadcast within the machine from the network interface controller of the machine, wherein receiving the updated set of parameters locally within the machine comprises:
    program instructions to receive, by the network interface controller of the machine, a packet including at least one model parameter of the neural network model, from a server, the packet including a virtual address associated with the network interface controller;
    program instructions to translate, by the network interface controller, the virtual address to a plurality of memory addresses, each memory address associated with a corresponding one of the plurality of graphics processing units; and
    program instructions to broadcast, using a local multicast by the network interface controller, the at least one model parameter to the plurality of memory addresses associated with the plurality of graphics processing units.

10. The computer usable program product of claim 9, further comprising: program instructions to store a mapping of the virtual address to each of the plurality of memory addresses associated within a table.

11. The computer usable program product of claim 10, wherein the table is stored within the network interface controller.

12. The computer usable program product of claim 9, further comprising: program instructions to register each of the plurality of graphics processing units with the virtual address.

13. The computer usable program product of claim 9, wherein the at least one model parameter includes weights of the neural network model.

14. The computer usable program product of claim 13, wherein each of the plurality of graphics processing units is configured to compute a gradient based upon the weights.

15. The computer usable program product of claim 14, wherein each of the plurality of graphics processing units is configured to send the computed gradients to the server.

16. The computer usable program product of claim 9, wherein the network interface controller is a remote direct memory access enabled network interface controller.

17. The computer usable program product of claim 9, wherein the computer usable code is stored in a computer readable storage device in a data processing system, and wherein the computer usable code is transferred over a network from a remote data processing system.

18. The computer usable program product of claim 9, wherein the computer usable code is stored in a computer readable storage device in a server data processing system, and wherein the computer usable code is downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system.

19. A computer system comprising one or more processors, one or more computer-readable memories, and one or more computer-readable storage mediums, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, the stored program instructions comprising:
    program instructions to configure a plurality of graphics processing units of a machine to operate as a deep learning neural network, the plurality of graphics processing units coupled to a network interface controller of the machine;
    program instructions to configure a graphics processing unit in the plurality of graphics processing units to compute a set of parameters of a neural network model wherein the set of parameters cause the neural network model to be trained for a given set of input training data; and
    program instructions to configure each graphics processing unit in t

Assignees

IBM

Inventors

Classifications

  • Learning methods

  • using page tables, e.g. page table structures

  • G06N3/063 (Primary): using electronic means

  • Details of virtual memory and virtual address translation

  • Processor architectures; Processor configuration, e.g. pipelining

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10614356B2 cover?
A network interface controller of a machine receives a packet including at least one model parameter of a neural network model from a server. The packet includes a virtual address associated with the network interface controller, and the machine further includes a plurality of graphics processing units coupled to the network interface controller by a bus. The network interface controller transl…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 7, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).