What technology area does this patent fall under?

Primary CPC classification G06F15/17318. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 28, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Networked computer

US11614946B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11614946-B2
Application number	US-202016831564-A
Country	US
Kind code	B2
Filing date	Mar 26, 2020
Priority date	Mar 27, 2019
Publication date	Mar 28, 2023
Grant date	Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in rings such that each processing node is a member of a single clique and a single ring. The processing nodes of all cliques are configured to exchange in each exchange step of a machine learning collective via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node.
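The clique-and-ring arrangement described in the abstract can be pictured with a small sketch. The Python below is illustrative only and is not taken from the patent: the (clique, position) node labels, the function names, and the choice to build ring p from position p of every clique are all assumptions made for the example.

```python
from itertools import combinations


def build_topology(num_cliques, clique_size):
    """Return (cliques, rings, clique_links) for a clique-and-ring layout.

    Node labels (c, p) mean "position p of clique c"; this labelling is a
    hypothetical convention, not something the patent text specifies.
    """
    # Each clique is a fully connected group of clique_size nodes.
    cliques = [[(c, p) for p in range(clique_size)] for c in range(num_cliques)]
    # Ring p joins the node at position p of every clique, in clique order,
    # so each node belongs to exactly one clique and exactly one ring.
    rings = [[(c, p) for c in range(num_cliques)] for p in range(clique_size)]
    # Every pair of nodes inside a clique is joined by two links,
    # labelled here "first" and "second".
    clique_links = [
        (a, b, link)
        for clique in cliques
        for a, b in combinations(clique, 2)
        for link in ("first", "second")
    ]
    return cliques, rings, clique_links


def ring_forward_neighbour(node, num_cliques):
    """Return the next node in the forwards direction around the node's ring."""
    c, p = node
    return ((c + 1) % num_cliques, p)
```

With four cliques of three nodes each, build_topology(4, 3) gives twelve nodes, three rings of four nodes, and 24 clique links (three node pairs per clique, two links per pair).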

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer comprising a plurality of processing nodes, each processing nodes having at least one processor configured to process input data to generate output data in the form of an array of data items; the plurality of processing nodes arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links, the cliques being inter-connected in rings such that each processing node is a member of a single clique and a single ring, the processing nodes being configured to exchange data items in respective exchange steps of a machine learning collective, wherein the processing nodes of all cliques are configured to exchange in each exchange step via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node, wherein the machine learning collective is an Allreduce collective and each processing node is configured to exchange data items in exchange steps of an Allgather phase, following a reduce-scatter phase of the Allreduce collective, wherein in each step of the Allgather phase reduced data items are exchanged between processing nodes in a clique, and wherein the processing nodes are each configured to transmit data items in a forwards direction to its adjacent processing node in the ring in at least some of the exchange steps in the reduce-scatter phase.

2. The computer according to claim 1, wherein each processing node comprises memory configured to store an array of data items ready to be exchanged in the reduce-scatter phase, wherein each data item is respectively positioned in the array with corresponding data items being respectively positioned at corresponding locations in the arrays of other processing nodes.

3. The computer according to claim 1, wherein the processing nodes are configured to transmit data items to their forwards adjacent processing node in the ring for all exchange steps of the reduce-scatter phase apart from a first step, in which no data items are transmitted between processing nodes connected in a ring.

4. The computer according to claim 1, wherein the array at each processing node comprises two sub arrays and processing nodes are inter-connected by bi-directional links, wherein in each exchange step of the reduce-scatter phase, all processing nodes are configured to exchange with the other processing node(s) of their clique, two data items from one sub array and two further data items from the other sub array wherein the two data items and the further two data items are exchanged over the same bi-directional link in opposite directions.

5. The computer according to claim 4, wherein the processing nodes are each configured to transmit data items in a forwards direction to its adjacent processing node in the ring in at least some of the exchange steps in the reduce-scatter phase and wherein in at least some exchange steps of the reduce-scatter phase each processing node is configured to transmit data items to its adjacent backwards processing node in the ring, wherein the transmission in each of the forwards and backwards direction from each processing node is carried out on the same bi-directional link.

6. The computer according to claim 2, wherein each array represents at least part of a vector of partial deltas, each partial delta representing an adjustment to a value stored at each processing node.

7. The computer according to claim 6, wherein each processing node is configured to generate the vector of partial deltas in a compute step.

8. The computer according to claim 7, wherein each processing node is configured to divide the vector into two sub arrays for separate exchange and reduction in the reduce-scatter phase.

9. The computer according to claim 7, wherein each processing node is configured to generate the vector of partial deltas by carrying out a compute function on a set of values and a batch of incoming deltas, the partial deltas being the output of the compute function.

10. The computer according to claim 9, which is configured to implement a machine learning model wherein the incoming batch data is training data, and the values are weights of the machine learning model.

11. A method of operating a computer comprising a plurality of processing nodes, each processing node having at least one processor configured to process input data to generate output data in the form of an array of data items, the plurality of processing nodes arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links, the cliques being interconnected in rings such that each processing node is a member of a single clique and a single ring, the method comprising exchanging data item in respect of exchange steps of a first phase of a machine learning collective, wherein in each exchange step the processing nodes of all cliques exchange via the respective first and second clique links at least two data items with the other processing nodes in its clique, and all processing nodes reduce each received data item with the data item in the corresponding position in the array on that processing node, wherein the machine learning collective is an Allreduce collective and each processing node exchanges data items in exchange steps of an Allgather phase, following a reduce-scatter phase of the Allreduce collective, wherein in each step of the Allgather phase reduced data items are exchanged between processing nodes in a clique, and wherein each processing node transmits data items in a forwards direction to its adjacent processing node in the ring in at least some of the exchange steps in the reduce scatter phase.

12. The method according to claim 11, wherein each processing node comprises memory configured to store an array of data items ready to be exchanged in the reduce-scatter phase, wherein each data item is respectively positioned in the array with corresponding data items being respectively positioned at corresponding locations in the arrays of other processing nodes.

13. The method according to claim 11, wherein each processing node transmits data items to their forwards adjacent processing node in the ring for all exchange steps of the reduce-scatter phase apart from a first step, in which no data items are transmitted between processing nodes connected in a ring.

14. The method according to claim 11, wherein the array at each processing node comprises two sub arrays and processing nodes are inter-connected by bi-directional links, wherein in each exchange step of the reduce-scatter phase, all processing nodes exchange with the other processing node(s) of their clique, two data items from one sub array and two further data items from the other sub array wherein the two data items and the further two data items are exchanged over the same bi-directional link in opposite directions.

15. The method according to claim 14, wherein each processing node transmits data items in a forwards direction to its adjacent processing node in the ring in at least some of the exchange steps in the reduce-scatter phase and wherein in at least some exchange steps of the reduce-scatter phase each processing node transmits data items to its adjacent backwards processing node in the ring, wherein the transmission in each of the forwards and backwards direction from each processing node is carried out on the same bi-directional
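The claims describe an Allreduce collective as a reduce-scatter phase followed by an Allgather phase. As a rough illustration of that two-phase structure, the sketch below simulates the textbook single-ring Allreduce in Python. It is not the clique-based exchange schedule claimed in the patent: it uses one ring, one link per neighbour, addition as the reduction, and one data item per node, all of which are simplifying assumptions, and the function name is hypothetical.

```python
def ring_allreduce(arrays):
    """Simulate a plain ring Allreduce (sum) over n nodes.

    arrays[i] is node i's array and must hold n data items, one per node.
    This is the generic ring algorithm, used only to show the
    reduce-scatter / Allgather split; it is not the patented schedule.
    """
    n = len(arrays)
    data = [list(a) for a in arrays]  # work on copies of each node's array

    # Reduce-scatter: in each step every node sends one item forwards and
    # the receiver reduces it with the item at the same array position.
    for step in range(n - 1):
        # Snapshot all outgoing messages first so sends are simultaneous.
        sends = [((i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for i, (pos, value) in enumerate(sends):
            data[(i + 1) % n][pos] += value

    # Node i now holds the fully reduced item at position (i + 1) % n.
    # Allgather: reduced items are passed forwards and overwrite stale ones.
    for step in range(n - 1):
        sends = [((i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for i, (pos, value) in enumerate(sends):
            data[(i + 1) % n][pos] = value

    return data
```

For three nodes holding [1, 2, 3], [10, 20, 30] and [100, 200, 300], every node ends with [111, 222, 333], the element-wise sum of all three arrays.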

Assignees

Graphcore Ltd

Inventors

Knowles Simon

Classifications

CPC code	CPC title
G06N3/098	Distributed learning, e.g. federated learning
G06N3/09	Supervised learning
G06N3/0464	Convolutional networks [CNN, ConvNet]
H04L12/5601	Transfer mode dependent, e.g. ATM
G06N3/08	Learning methods

Patent family

Related publications grouped by family.

View patent family 66381517

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11614946B2 cover?

A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in …

Who is the assignee on this patent?

Graphcore Ltd

Related patents

Collective communication operation

Hardware implemented point to point communication primitives for machine learning

Parallel processing of reduction and broadcast operations on large datasets of non-scalar data

Regional big data in process control systems

Collective engine method and apparatus

Position discovery by detecting irregularities in a network topology
