Hardware implemented point to point communication primitives for machine learning
US-2023177328-A1 · Jun 8, 2023 · US
US12008468B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12008468-B2 |
| Application number | US-201916967702-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 6, 2019 |
| Priority date | Feb 16, 2018 |
| Publication date | Jun 11, 2024 |
| Grant date | Jun 11, 2024 |
Official abstract text for this publication.
Each learning node calculates gradients of a loss function from an output result obtained by inputting learning data to a learning target neural network, converts the calculation result into a packet, and transmits the packet to a computing interconnect device. The computing interconnect device receives the packet transmitted from each of the learning nodes, acquires the gradient values stored in the packets, calculates the sum of the gradients, converts the calculation result into a packet, and transmits the packet to each of the learning nodes. Each learning node receives the packet transmitted from the computing interconnect device and updates a constituent parameter of its neural network based on the value stored in the packet.
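The round trip described in the abstract can be sketched in a few lines of Python. This is a minimal software illustration, not the patented hardware: the packet layout (a 4-byte count followed by 64-bit floats), the one-weight linear model, the plain gradient-descent update rule, and all names (`encode_packet`, `decode_packet`, `interconnect_sum`, `LearningNode`) are assumptions made for the example and do not come from the patent.

```python
import struct


def encode_packet(values):
    """Pack floats as: 4-byte count, then that many float64 values.
    Hypothetical layout; the patent does not specify a packet format here."""
    return struct.pack(f"!I{len(values)}d", len(values), *values)


def decode_packet(packet):
    """Inverse of encode_packet: read the count, then the float64 payload."""
    (count,) = struct.unpack_from("!I", packet, 0)
    return list(struct.unpack_from(f"!{count}d", packet, 4))


def interconnect_sum(packets):
    """Stand-in for the computing interconnect device: take the gradient
    values out of each packet, sum them element-wise, re-packetize the sum."""
    gradient_lists = [decode_packet(p) for p in packets]
    summed = [sum(vals) for vals in zip(*gradient_lists)]
    return encode_packet(summed)


class LearningNode:
    """Toy learning node with a one-weight model y = w * x and a
    squared-error loss (both assumed for illustration)."""

    def __init__(self, x, target, weight=0.0, lr=0.01):
        self.x, self.target, self.weight, self.lr = x, target, weight, lr

    def gradient_packet(self):
        # d/dw (w*x - t)^2 = 2 * (w*x - t) * x
        grad = 2.0 * (self.weight * self.x - self.target) * self.x
        return encode_packet([grad])

    def apply_packet(self, packet):
        # Assumed update rule: plain gradient descent on the summed gradient.
        (grad_sum,) = decode_packet(packet)
        self.weight -= self.lr * grad_sum


nodes = [LearningNode(1.0, 2.0), LearningNode(2.0, 4.0)]
reply = interconnect_sum([n.gradient_packet() for n in nodes])
for n in nodes:
    n.apply_packet(reply)
print([n.weight for n in nodes])
```

Because every node applies the same summed gradient, the replicas of the model stay identical after each step, which is the point of the sum-and-redistribute pattern.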
Opening claim text (preview).
The invention claimed is:

1. A distributed deep learning system comprising: a plurality of learning nodes; and a plurality of computing interconnect device connected to the plurality of learning nodes via a communication network; wherein each learning node of the plurality of learning nodes comprises: one or more first processors; and a first non-transitory computer-readable storage medium storing a first program to be executed by the one or more first processors, the first program including instructions to: calculate a gradient of a loss function from an output result obtained by inputting learning data to a learning target neural network corresponding to the learning node; convert the gradient of the loss function into a first packet; transmit the first packet to a computing interconnect device of the plurality of computing interconnect devices; acquire a value stored in a second packet received from the computing interconnect device; and update a constituent parameter of the learning target neural network based on the value stored in the second packet; and wherein a first computing interconnect device of the plurality of computing interconnect devices that is positioned at highest order among the plurality of computing interconnect devices comprises: one or more second processors; and a second non-transitory computer-readable storage medium storing a second program to be executed by the one or more second processors, the second program including instructions to: receive a third packet from a second computing interconnect device of the plurality of computing interconnect devices, the second computing interconnect device is at an immediately lower order than the first computing interconnect device; receive a fourth packet transmitted from a first learning node of the plurality of learning nodes that is connected to the first interconnect computing device; acquire a value of a gradient stored in the third packet and a value of a gradient stored in the fourth packet; perform calculation processing on the value of the gradient in the third packet and the value of the gradient in the fourth packet; convert a calculation result of the calculation processing into a fifth packet; and transmit the fifth packet to the second computing interconnect device at the immediately lower order than the first computing interconnect device and to the first learning node connected to the first computing interconnect device.

2. The distributed deep learning system of claim 1, wherein the first computing interconnect device further comprises a constituent parameter memory that stores a respective constituent parameter of a respective learning target neural network of each of the plurality of learning nodes.

3. The distributed deep learning system of claim 2, wherein the second program comprises further instructions to: calculate, based on the calculation result of the calculation processing, an updated value of a first constituent parameter of a first learning target neural network stored in the constituent parameter memory, wherein the first learning target neural network corresponds to one of the plurality of learning nodes; and update the first constituent parameter stored in the constituent parameter memory with the updated value.

4. A distributed deep learning system comprising: a plurality of learning nodes; and a plurality of computing interconnect devices connected to the plurality of learning nodes or other devices via a communication network; wherein each learning node of the plurality of learning nodes comprises: a gradient calculator that calculates a gradient of a loss function from a respective output result obtained by inputting learning data to a learning target neural network corresponding to the learning node; a first transmitter that converts a calculation result of the gradient calculator into a first packet and transmits the first packet to a computing interconnect device of the plurality of computing interconnect devices connected to the learning node; a first receiver that receives a second packet transmitted from the computing interconnect device connected to the learning node and acquires a value stored in the second packet; and a constituent-parameter updater that updates a constituent parameter of the learning target neural network based on the value stored in the second packet; wherein a first computing interconnect device of the plurality of computing interconnect devices that is positioned at highest order among the plurality of computing interconnect devices comprises: a second receiver that: receives a third packet transmitted from a second computing interconnect device of the plurality of computing interconnect devices, the second computing interconnect device is at an immediately lower order than the first computing interconnect device; and receives a fourth packet transmitted from a first learning node of the plurality of learning nodes that is connected to the first computing interconnect device and acquires a value of a gradient stored in the third packet and a value of a gradient stored in the fourth packet; a first calculator that receives and performs calculation processing on the value of the gradient stored in the third packet and the value of the gradient stored in the fourth packet; and a second transmitter that converts a calculation result of the first calculator into a fifth packet and transmits the fifth packet to the second computing interconnect device at the immediately lower order than the first computing interconnect device and to the first learning node connected to the first computing interconnect device, and wherein a another computing interconnect device of the plurality of computing interconnect devices at a lower order than the first computing interconnect device comprises: a third receiver that: receives a sixth packet transmitted from a third computing interconnect device at an immediately lower order than the another computing interconnect device or transmitted from a second learning node of the plurality of learning nodes connected to the another computing interconnect device; and acquires a value of a gradient stored in the sixth packet; a second calculator that receives and performs calculation processing on the value of the gradient in the sixth packet and performs calculation processing; and a third transmitter that converts a calculation result of the second calculator into a seventh packet and transmits the seventh packet to a fourth computing interconnect device at an immediately higher order than the another computing interconnect device.

5. The distributed deep learning system of claim 4, wherein the third transmitter further forwards an eighth packet transmitted from the fourth computing interconnect device to the third computing interconnect device or the second learning node.

6. The distributed deep learning system of claim 4, wherein: the third receiver receives an eighth packet transmitted from the fourth computing interconnect device and acquires a value stored in the eighth packet; and the third transmitter converts the value stored in the eight packet into a ninth packet and transmits the ninth packet to the fourth computing interconnect device or the second learning node.

7. The distributed deep learning system of claim 4, wherein the first computing interconnect device further comprises a constituent parameter memory that stores a respective constituent parameter of a respective learning target neural network of each of the plurality of learning nodes.

8. The distributed deep learning system of claim 7, wherein the first computing interconnect device further comprises: a constituent-parameter-updater that calculates, based on th
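The ordered arrangement of interconnect devices in claims 4 and 5 can be pictured as a chain: each lower-order device combines the gradients it receives and passes the result to the device one order above, and the highest-order device produces the final result and sends it back down. The sketch below is one possible reading, under stated assumptions: the calculation processing is taken to be an element-wise sum (as in the abstract), packets are replaced by plain Python lists, and the names `Node`, `Interconnect`, `reduce_up`, and `distribute_down` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Learning node holding a locally computed gradient (values assumed)."""
    name: str
    gradient: List[float]
    received: Optional[List[float]] = None


@dataclass
class Interconnect:
    """Computing interconnect device with attached learning nodes and,
    optionally, one device at the immediately lower order."""
    name: str
    nodes: List[Node] = field(default_factory=list)
    lower: Optional["Interconnect"] = None

    def reduce_up(self) -> List[float]:
        # Collect gradients from attached nodes and from the lower-order
        # device, then sum element-wise (assumed calculation processing).
        contributions = [n.gradient for n in self.nodes]
        if self.lower is not None:
            contributions.append(self.lower.reduce_up())
        return [sum(vals) for vals in zip(*contributions)]

    def distribute_down(self, total: List[float]) -> None:
        # Deliver the result to attached nodes and forward it to the
        # lower-order device, as the forwarding in claim 5 suggests.
        for n in self.nodes:
            n.received = list(total)
        if self.lower is not None:
            self.lower.distribute_down(total)


bottom = Interconnect("cid-2", nodes=[Node("n2", [1.0, 2.0]), Node("n3", [3.0, 4.0])])
middle = Interconnect("cid-1", nodes=[Node("n1", [5.0, 6.0])], lower=bottom)
top = Interconnect("cid-0", nodes=[Node("n0", [7.0, 8.0])], lower=middle)

total = top.reduce_up()      # highest-order device holds the full sum
top.distribute_down(total)   # result travels back down to every node
```

In this reading, each device only ever talks to its immediate neighbours and its own nodes, so no single link has to carry the gradients of all learning nodes at once.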
Backpropagation, e.g. using gradient descent · CPC title
Distributed learning, e.g. federated learning · CPC title
Feedforward networks · CPC title
Supervised learning · CPC title
using electronic means · CPC title