What technology area does this patent fall under?

Primary CPC classification: G06N3/08 (Learning methods). Mapped technology areas include Physics (CPC section G).

When was this patent published?

Publication date: Apr 26, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Implementing parameter server in networking infrastructure for high-performance computing

US11315013B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11315013-B2
Application number	US-201815960472-A
Country	US
Kind code	B2
Filing date	Apr 23, 2018
Priority date	Apr 23, 2018
Publication date	Apr 26, 2022
Grant date	Apr 26, 2022
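
The page shows the publication number in two spellings, compact (US11315013B2) and hyphenated (US-11315013-B2). A minimal sketch of how the two forms could be normalized into country, number, and kind code follows. The `PublicationNumber` type, the `parse_publication_number` helper, and the regular expression are illustrative assumptions, not part of patentsdb or any patent-office API, and the pattern only covers simple identifiers like the one shown here.

```python
import re
from typing import NamedTuple


class PublicationNumber(NamedTuple):
    """Hypothetical container for the three parts of a publication number."""
    country: str  # two-letter country code, e.g. "US"
    number: str   # serial digits, e.g. "11315013"
    kind: str     # kind code, e.g. "B2"

    def compact(self) -> str:
        return f"{self.country}{self.number}{self.kind}"

    def hyphenated(self) -> str:
        return f"{self.country}-{self.number}-{self.kind}"


# Assumed pattern: country code, digits, then a letter with an optional digit,
# with optional hyphens between the parts.
_PATTERN = re.compile(r"^([A-Z]{2})-?(\d+)-?([A-Z]\d?)$")


def parse_publication_number(text: str) -> PublicationNumber:
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized publication number: {text!r}")
    return PublicationNumber(*match.groups())


pub = parse_publication_number("US-11315013-B2")
print(pub.compact())     # US11315013B2
print(pub.hyphenated())  # US-11315013-B2
```

Both spellings parse to the same value, so either can be used as a lookup key.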

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided for implementing a parameter server within a networking infrastructure of a computing system to reduce the communication bandwidth and latency for performing communication synchronization operations of the parameter server. For example, a method includes executing a distributed deep learning (DL) model training process to train model parameters of a DL model using a plurality of worker nodes executing on one or more server nodes of a computing system, and executing a parameter server within a networking infrastructure of the computing system to aggregate local model parameters computed by the plurality of worker nodes and to distribute aggregated model parameters to the plurality of worker nodes using the networking infrastructure of the computing system.
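
To make the aggregate-and-distribute role of a parameter server concrete, here is a minimal sketch of one synchronization step. It assumes that "aggregate" means an element-wise mean of the workers' local parameter vectors; the abstract does not state the aggregation rule. The function names and sample values are hypothetical, and nothing here models the networking infrastructure itself.

```python
from typing import List

Params = List[float]


def aggregate(local_params: List[Params]) -> Params:
    """Element-wise mean of the workers' local parameters (assumed rule)."""
    if not local_params:
        raise ValueError("need at least one worker")
    n = len(local_params)
    return [sum(column) / n for column in zip(*local_params)]


def distribute(aggregated: Params, num_workers: int) -> List[Params]:
    """Give every worker its own copy of the aggregated parameters."""
    return [list(aggregated) for _ in range(num_workers)]


# Three hypothetical workers, each holding a two-element local parameter vector.
worker_params = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
aggregated = aggregate(worker_params)
worker_params = distribute(aggregated, len(worker_params))
print(aggregated)  # [2.0, 5.0]
```

After the step, every worker holds the same aggregated vector and can start the next training iteration from it.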

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising:
executing a distributed deep learning (DL) model training process to train a DL model using a plurality of server nodes comprising at least a first server node and a second server node, wherein the first server node comprises a first processor, a first set of accelerator devices, and a first network interface component, wherein the second server node comprises a second processor, a second set of accelerator devices, and a second network interface component, wherein executing the DL model training process comprises performing an iterative process, wherein at least one iteration of the DL model training process comprises:
distributing, by the first and second processors, a batch of training data to the respective first and second set of accelerator devices, wherein the accelerator devices of the first and second set of accelerator devices each receive a respective portion of the batch of training data;
executing a first set of worker processes on the first set of accelerator devices, and a second set of worker processes on the second set of accelerator devices, wherein the worker processes of the first and second set of worker processes compute respective local parameters using the respective portions of the batch of training data;
performing, by the worker processes of the first set of worker processes, respective direct memory copy operations to copy the respective local parameters to a first memory associated with the first network interface component;
performing, by the worker processes of the second set of worker processes, respective direct memory copy operations to copy the respective local parameters to a second memory associated with the second network interface component;
aggregating, by a first parameter server process executing on the first network interface component, the local parameters provided by the first set of worker processes to thereby generate a first set of local aggregated parameters, wherein the first parameter server process comprises a master parameter server process;
aggregating, by a second parameter server process executing on the second network interface component, the local parameters provided by the second set of worker processes to thereby generate a second set of local aggregated parameters;
performing, by the second parameter server process, a direct memory copy operation to copy the second set of local aggregated parameters to the first memory associated with the first network interface component;
aggregating, by the first parameter server process, at least the first and second set of local aggregated parameters to thereby generate a global set of parameters; and
performing, by the first parameter server process, a direct memory copy operation to copy the global set of parameters to the first memory associated with the first network interface component.

2. The method of claim 1, wherein the first and second set of worker processes are managed by respective virtual worker nodes.

3. The method of claim 1, wherein the first and second set of accelerator devices comprise graphics processing unit devices.

4. The method of claim 1, wherein the first and second network interface components comprise respective first and second network interface cards of the respective first and second server nodes.

5. The method of claim 4, wherein the first and second network interface cards comprise virtual network interface cards.

6. The method of claim 4, wherein the first and second network interface cards comprise respective first and second physical network interface cards.

7. The method of claim 1, wherein the direct memory copy operations, which are performed by the worker processes of the first and second set of worker processes to copy the respective local parameters to the respective first and second memories associated with the respective first and second network interface components, are implemented using a direct memory access (DMA) protocol.

8. The method of claim 1, wherein the direct memory copy operations, which are performed by the first and second parameter server processes, are implemented using a remote direct memory access (RDMA) protocol.

9. An article of manufacture comprising a processor-readable storage medium having stored program code of one or more software programs, wherein the program code is executable by one or more processors to implement method steps comprising:
executing a distributed deep learning (DL) model training process to train a DL model using a plurality of server nodes comprising at least a first server node and a second server node, wherein the first server node comprises a first processor, a first set of accelerator devices, and a first network interface component, wherein the second server node comprises a second processor, a second set of accelerator devices, and a second network interface component, wherein executing the DL model training process comprises performing an iterative process, wherein at least one iteration of the DL model training process comprises:
distributing, by the first and second processors, a batch of training data to the respective first and second set of accelerator devices, wherein the accelerator devices of the first and second set of accelerator devices each receive a respective portion of the batch of training data;
executing a first set of worker processes on the first set of accelerator devices, and a second set of worker processes on the second set of accelerator devices, wherein the worker processes of the first and second set of worker processes compute respective local parameters using the respective portions of the batch of training data;
performing, by the worker processes of the first set of worker processes, respective direct memory copy operations to copy the respective local parameters to a first memory associated with the first network interface component;
performing, by the worker processes of the second set of worker processes, respective direct memory copy operations to copy the respective local parameters to a second memory associated with the second network interface component;
aggregating, by a first parameter server process executing on the first network interface component, the local parameters provided by the first set of worker processes to thereby generate a first set of local aggregated parameters, wherein the first parameter server process comprises a master parameter server process;
aggregating, by a second parameter server process executing on the second network interface component, the local parameters provided by the second set of worker processes to thereby generate a second set of local aggregated parameters;
performing, by the second parameter server process, a direct memory copy operation to copy the second set of local aggregated parameters to the first memory associated with the first network interface component;
aggregating, by the first parameter server process, at least the first and second set of local aggregated parameters to thereby generate a global set of parameters; and
performing, by the first parameter server process, a direct memory copy operation to copy the global set of parameters to the first memory associated with the first network interface component.

10. The article of manufacture of claim 9, wherein the first and second set of worker processes are managed by respective virtual worker nodes.

11. The article of manufacture of claim 9, wherein the first and second set of accelerator devices comprise graphics processing unit devices.

12. The article of manufacture of claim 9, wherein the first and second network interface components comprise respective first and second ne
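
The two-level aggregation recited in claim 1 can be traced with a small simulation. This is a sketch under stated assumptions, not the patented implementation: the `NicParameterServer` class and `run_iteration` function are hypothetical names, the direct memory copy operations (DMA and RDMA in claims 7 and 8) are modeled as plain in-process list copies, and aggregation is assumed to be an element-wise sum because the claim does not fix the rule.

```python
from dataclasses import dataclass, field
from typing import List

Params = List[float]


@dataclass
class NicParameterServer:
    """Hypothetical stand-in for a parameter server process on a network interface component."""
    name: str
    memory: List[Params] = field(default_factory=list)  # models the NIC-associated memory

    def copy_in(self, params: Params) -> None:
        # Stands in for a direct memory copy into this NIC's memory.
        self.memory.append(list(params))

    def aggregate(self) -> Params:
        # Assumed rule: element-wise sum of everything currently in memory.
        if not self.memory:
            raise ValueError(f"{self.name}: nothing to aggregate")
        total = [sum(column) for column in zip(*self.memory)]
        self.memory.clear()
        return total


def run_iteration(first_node_params: List[Params],
                  second_node_params: List[Params]) -> NicParameterServer:
    first_ps = NicParameterServer("first (master)")
    second_ps = NicParameterServer("second")

    # Workers copy their local parameters to their own node's NIC memory.
    for params in first_node_params:
        first_ps.copy_in(params)
    for params in second_node_params:
        second_ps.copy_in(params)

    # Each parameter server produces its local aggregated parameters.
    first_local = first_ps.aggregate()
    second_local = second_ps.aggregate()

    # The second server copies its local aggregate to the first NIC's memory,
    # where the master combines it with its own local aggregate.
    first_ps.copy_in(first_local)
    first_ps.copy_in(second_local)
    global_params = first_ps.aggregate()

    # The master places the global parameters in the first NIC's memory.
    first_ps.copy_in(global_params)
    return first_ps


master = run_iteration(
    first_node_params=[[1.0, 2.0], [3.0, 4.0]],
    second_node_params=[[5.0, 6.0], [7.0, 8.0]],
)
print(master.memory)  # [[16.0, 20.0]]
```

Using sums rather than means at both levels keeps the global result independent of how many workers run on each node; dividing by the total worker count afterwards would give a mean, if that is the desired rule.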

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/08 (Primary)
Learning methods · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/09
Supervised learning · CPC title
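
CPC symbols such as those listed above are hierarchical: a section letter, a two-digit class, a subclass letter, then a main group and subgroup separated by a slash. The sketch below splits a symbol into those levels and looks up the section title, which is why G06N3/08 maps to Physics. The `parse_cpc` helper, the `CpcSymbol` type, and the regular expression are illustrative assumptions; the section titles are abbreviated.

```python
import re
from typing import NamedTuple

# CPC section titles (abbreviated; section Y's full title is longer).
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new technological developments",
}


class CpcSymbol(NamedTuple):
    section: str     # e.g. "G"
    cpc_class: str   # e.g. "G06"
    subclass: str    # e.g. "G06N"
    main_group: str  # e.g. "3"
    subgroup: str    # e.g. "08"


# Assumed pattern for symbols written without spaces, as on this page.
_CPC = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    match = _CPC.match(symbol.replace(" ", "").upper())
    if match is None:
        raise ValueError(f"unrecognized CPC symbol: {symbol!r}")
    section, cls, sub, main, subgroup = match.groups()
    return CpcSymbol(section, section + cls, section + cls + sub, main, subgroup)


for code in ["G06N3/044", "G06N3/045", "G06N3/08", "G06N3/098", "G06N3/09"]:
    parsed = parse_cpc(code)
    print(code, parsed.subclass, CPC_SECTIONS[parsed.section])
```

All five classifications on this page share subclass G06N, so they resolve to the same section.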

Patent family

Related publications grouped by family.

View patent family 68238025


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11315013B2 cover? Techniques are provided for implementing a parameter server within a networking infrastructure of a computing system to reduce the communication bandwidth and latency for performing communication synchronization operations of the parameter server. For example, a method includes executing a distributed deep learning (DL) model training process to train model parameters of a DL model using a plur…
Who is the assignee on this patent? EMC IP Holding Co LLC

Related patents

Technologies for managing disaggregated resources in a data center

Communication link testing

System and method for analytics-driven SLA management and insight generation in clouds

Network interface device and method

System decoder for training accelerators

Lightweight transport protocol

Resource management for peripheral component interconnect-express domains
