Triggered operations to improve allreduce overlap

US11645534B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11645534-B2
Application number	US-201816127416-A
Country	US
Kind code	B2
Filing date	Sep 11, 2018
Priority date	Sep 11, 2018
Publication date	May 9, 2023
Grant date	May 9, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment of a semiconductor package apparatus may include technology to embed one or more trigger operations in one or more messages related to collective operations for a neural network, and issue the one or more messages related to the collective operations to a hardware-based message scheduler in a desired order of execution. Other embodiments are disclosed and claimed.
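The abstract's idea of embedding trigger operations in messages and issuing them to a scheduler in the desired order can be illustrated with a small software model. This is only a sketch under stated assumptions: the `TriggeredMessage` and `MessageScheduler` names, the single shared counter, and the threshold rule are hypothetical stand-ins for illustration and are not taken from the patent text or from any real hardware interface.

```python
from dataclasses import dataclass, field


@dataclass
class TriggeredMessage:
    name: str       # label for the message, e.g. one layer's gradient exchange
    threshold: int  # counter value at which the message may be sent (assumed rule)


@dataclass
class MessageScheduler:
    """Toy software stand-in for a hardware message scheduler (hypothetical)."""
    counter: int = 0
    pending: list = field(default_factory=list)
    sent: list = field(default_factory=list)

    def issue(self, msg):
        # The host queues messages up front, in the order it wants them executed.
        self.pending.append(msg)
        self._fire()

    def increment(self, amount=1):
        # Called when an event completes, e.g. a layer's gradients become ready.
        self.counter += amount
        self._fire()

    def _fire(self):
        # Send queued messages strictly in issue order, stopping at the first
        # message whose trigger threshold has not been reached yet.
        while self.pending and self.pending[0].threshold <= self.counter:
            self.sent.append(self.pending.pop(0).name)


scheduler = MessageScheduler()
# Backward propagation visits the last layer first, so its message is issued first.
for i, layer in enumerate(["layer3", "layer2", "layer1"], start=1):
    scheduler.issue(TriggeredMessage(f"allreduce({layer})", threshold=i))

scheduler.increment()          # layer3 gradients ready
first_wave = list(scheduler.sent)
scheduler.increment(2)         # layer2 and layer1 gradients ready
```

In this model the host does no work at send time: once the messages are queued, progress is driven entirely by counter updates, which is the property that lets communication proceed while the host keeps computing.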

First claim

Opening claim text (preview).

We claim:

1. A machine learning system, comprising: memory; and logic communicatively coupled to the memory and a neural network to: during a backward propagation phase associated with an iterative process to train the neural network, overlap execution of a first layer of the neural network with transmission of a first message, wherein the first message is associated with a first collective operation associated with a second layer of the neural network; and during a forward propagation phase associated with the neural network, determine that a second message is to be transmitted based on a previous identification from a previous iteration of the iterative process of a number of messages that are transmittable by a third layer of the neural network during computation of a fourth layer of the neural network, wherein the second message is related to a second collective operation for the neural network, wherein the second collective operation is an operation of the third layer of the neural network that was not completed during the backward propagation phase.

2. The system of claim 1, wherein the logic is further to: construct a directed acyclic graph corresponding to collective operations for the neural network including the first and second collective operations; and offload execution of the directed acyclic graph to a hardware-based message scheduler.

3. The system of claim 1, wherein the logic is further to: organize a set of collective operations for gradient exchange based on all layers of the neural network.

4. The system of claim 3, wherein the logic is further to: overlap messages for a current layer of the neural network with messages of one or more prior layers of the neural network in the backward propagation phase, wherein the first collective operation is an Allreduce operation, and wherein the second collective operation is an Allreduce operation.

5. The system of claim 1, wherein the neural network comprises a deep learning neural network.

6. A semiconductor package apparatus, comprising: one or more substrates; and logic coupled to the one or more substrates, wherein the logic is at least partly implemented in one or more of configurable logic or fixed-functionality logic hardware, the logic coupled to the one or more substrates to: during a backward propagation phase associated with an iterative process to train a neural network, overlap execution of a first layer of the neural network with transmission of a first message, wherein the first message is associated with a first collective operation associated with a second layer of the neural network; and during a forward propagation phase associated with the neural network, determine that a second message is to be transmitted based on a previous identification from a previous iteration of the iterative process of a number of messages that are transmittable by a third layer of the neural network during computation of a fourth layer of the neural network, wherein the second message is related to a second collective operation for the neural network, wherein the second collective operation is an operation of the third layer of the neural network that was not completed during the backward propagation phase.

7. The apparatus of claim 6, wherein the logic is further to: construct a directed acyclic graph corresponding to collective operations for the neural network including the first and second collective operations; and offload execution of the directed acyclic graph to a hardware-based message scheduler.

8. The apparatus of claim 6, wherein the logic is further to: organize a set of collective operations for gradient exchange based on all layers of the neural network.

9. The apparatus of claim 8, wherein the logic is further to: overlap messages for a current layer of the neural network with messages of one or more prior layers of the neural network in the backward propagation phase, wherein the first collective operation is an Allreduce operation, and wherein the second collective operation is an Allreduce operation.

10. The apparatus of claim 6, wherein the neural network comprises a deep learning neural network.

11. The apparatus of claim 6, wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.

12. A method of machine learning, comprising: during a backward propagation phase associated with a training process for training a neural network, overlapping execution of a first layer of the neural network with transmission of a first message, wherein the first message is associated with a first collective operation associated with a second layer of the neural network; and during a forward propagation phase associated with the neural network, determine that a second message will be transmitted based on a previous identification from a previous iteration of the iterative process of a number of messages that are transmittable by a third layer of the neural network during computation of a fourth layer of the neural network, wherein the second message is related to a second collective operation for the neural network, wherein the second collective operation is an operation of the third layer of the neural network that was not completed during the backward propagation phase.

13. The method of claim 12, further comprising: constructing a directed acyclic graph corresponding to collective operations for the neural network including the first and second collective operations; and offloading execution of the directed acyclic graph to a hardware-based message scheduler.

14. The method of claim 12, further comprising: organizing a set of collective operations for gradient exchange based on all layers of the neural network.

15. The method of claim 14, further comprising: overlapping messages for a current layer of the neural network with messages of one or more prior layers of the neural network in the backward propagation phase, wherein the first collective operation is an Allreduce operation, and wherein the second collective operation is an Allreduce operation.

16. The method of claim 12, wherein the neural network comprises a deep learning neural network.

17. At least one non-transitory computer readable storage medium, comprising a set of instructions, which when executed by a computing device, cause the computing device to: during a backward propagation phase associated with an iterative process to train a neural network, overlap execution of a first layer of the neural network with transmission of a first message, wherein the first message is associated with a first collective operation associated with a second layer of the neural network; and during a forward propagation phase associated with the neural network, determine that a second message is to be transmitted based on a previous identification from a previous iteration of the iterative process of a number of messages that are transmittable by a third layer of the neural network during computation of a fourth layer of the neural network, wherein the second message is related to a second collective operation for the neural network, wherein the second collective operation is an operation of the third layer of the neural network that was not completed during the backward propagation phase.

18. The at least one non-transitory computer readable storage medium of claim 17, comprising a further set of instructions, which when executed by the computing device, cause the computing device to: construct a directed
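Claims 2, 7, and 13 recite constructing a directed acyclic graph of collective operations before offloading it to a message scheduler. The following is a minimal sketch of what such a graph might look like for the backward phase, assuming one Allreduce per layer that depends only on that layer's backward computation. The node names and the `build_collective_dag` helper are hypothetical, and the standard-library `graphlib` sorter stands in for the scheduler purely to show one valid execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_collective_dag(num_layers):
    """Map each node to the set of nodes it must wait for (hypothetical naming)."""
    deps = {}
    for layer in range(num_layers, 0, -1):
        backward = f"backward(L{layer})"
        # Backward propagation runs from the last layer toward the first.
        deps[backward] = {f"backward(L{layer + 1})"} if layer < num_layers else set()
        # A layer's Allreduce needs only that layer's gradients, so it is free
        # to overlap with the backward computation of earlier layers.
        deps[f"allreduce(L{layer})"] = {backward}
    return deps


dag = build_collective_dag(3)
# One dependency-respecting order; a real scheduler could pick any valid one.
order = list(TopologicalSorter(dag).static_order())
```

Because `allreduce(L3)` and `backward(L2)` share no dependency on each other in this graph, nothing forces one to wait for the other, which is where the overlap described in the claims would come from.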

Assignees

Intel Corp

Inventors

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title
G06N3/063
using electronic means · CPC title
G06N3/04
Architecture, e.g. interconnection topology · CPC title

Patent family

Related publications grouped by family.

View patent family 65230212

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11645534B2 cover? An embodiment of a semiconductor package apparatus may include technology to embed one or more trigger operations in one or more messages related to collective operations for a neural network, and issue the one or more messages related to the collective operations to a hardware-based message scheduler in a desired order of execution. Other embodiments are disclosed and claimed.
Who is the assignee on this patent? Intel Corp
What technology area does this patent fall under? The primary CPC classification is G06N3/084 (Backpropagation, e.g. using gradient descent). Mapped technology areas include Physics.
When was this patent published? The publication date is May 9, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? This page lists 4 related publications (citations in our corpus or others sharing the same primary CPC).

Related patents

Distributed Deep Learning System and Data Transfer Method

Method and system for opportunistic load balancing in neural networks using metadata

Topology-aware provisioning of hardware accelerator resources in a distributed environment

Parallel information processing apparatus, information processing method and non-transitory recording medium
