Networked computer

US11614946B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11614946-B2
Application number: US-202016831564-A
Country: US
Kind code: B2
Filing date: Mar 26, 2020
Priority date: Mar 27, 2019
Publication date: Mar 28, 2023
Grant date: Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in rings such that each processing node is a member of a single clique and a single ring. The processing nodes of all cliques are configured to exchange in each exchange step of a machine learning collective via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node.
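The arrangement described in the abstract can be pictured with a small sketch. This is an illustration only, not the patent's implementation: the function name `build_topology`, the `(clique_index, ring_index)` node labels, and the choice of 4 cliques of 3 nodes are all assumptions made here. It builds two links between every pair of nodes in a clique and joins the cliques into rings so that each node sits in exactly one clique and one ring.

```python
from itertools import combinations


def build_topology(num_cliques, clique_size):
    """Build a hypothetical clique-and-ring layout.

    A node is labelled (clique_index, ring_index). These labels are an
    assumption for illustration; the patent does not prescribe them.
    """
    nodes = [(c, r) for c in range(num_cliques) for r in range(clique_size)]

    # Inside each clique, every pair of nodes is joined by two links,
    # labelled here "first" and "second".
    clique_links = []
    for c in range(num_cliques):
        for a, b in combinations(range(clique_size), 2):
            clique_links.append(((c, a), (c, b), "first"))
            clique_links.append(((c, a), (c, b), "second"))

    # Nodes with the same ring_index in consecutive cliques form one ring,
    # so there are clique_size rings, each of length num_cliques.
    ring_links = []
    for r in range(clique_size):
        for c in range(num_cliques):
            ring_links.append(((c, r), ((c + 1) % num_cliques, r)))

    return nodes, clique_links, ring_links


nodes, clique_links, ring_links = build_topology(num_cliques=4, clique_size=3)
print(len(nodes), len(clique_links), len(ring_links))
```

With these assumed sizes, each of the 12 nodes belongs to one clique (shared first index) and one ring (shared second index).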

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer comprising a plurality of processing nodes, each processing nodes having at least one processor configured to process input data to generate output data in the form of an array of data items; the plurality of processing nodes arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links, the cliques being inter-connected in rings such that each processing node is a member of a single clique and a single ring, the processing nodes being configured to exchange data items in respective exchange steps of a machine learning collective, wherein the processing nodes of all cliques are configured to exchange in each exchange step via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node, wherein the machine learning collective is an Allreduce collective and each processing node is configured to exchange data items in exchange steps of an Allgather phase, following a reduce-scatter phase of the Allreduce collective, wherein in each step of the Allgather phase reduced data items are exchanged between processing nodes in a clique, and wherein the processing nodes are each configured to transmit data items in a forwards direction to its adjacent processing node in the ring in at least some of the exchange steps in the reduce-scatter phase.

2. The computer according to claim 1, wherein each processing node comprises memory configured to store an array of data items ready to be exchanged in the reduce-scatter phase, wherein each data item is respectively positioned in the array with corresponding data items being respectively positioned at corresponding locations in the arrays of other processing nodes.

3. The computer according to claim 1, wherein the processing nodes are configured to transmit data items to their forwards adjacent processing node in the ring for all exchange steps of the reduce-scatter phase apart from a first step, in which no data items are transmitted between processing nodes connected in a ring.

4. The computer according to claim 1, wherein the array at each processing node comprises two sub arrays and processing nodes are inter-connected by bi-directional links, wherein in each exchange step of the reduce-scatter phase, all processing nodes are configured to exchange with the other processing node(s) of their clique, two data items from one sub array and two further data items from the other sub array wherein the two data items and the further two data items are exchanged over the same bi-directional link in opposite directions.

5. The computer according to claim 4, wherein the processing nodes are each configured to transmit data items in a forwards direction to its adjacent processing node in the ring in at least some of the exchange steps in the reduce-scatter phase and wherein in at least some exchange steps of the reduce-scatter phase each processing node is configured to transmit data items to its adjacent backwards processing node in the ring, wherein the transmission in each of the forwards and backwards direction from each processing node is carried out on the same bi-directional link.

6. The computer according to claim 2, wherein each array represents at least part of a vector of partial deltas, each partial delta representing an adjustment to a value stored at each processing node.

7. The computer according to claim 6, wherein each processing node is configured to generate the vector of partial deltas in a compute step.

8. The computer according to claim 7, wherein each processing node is configured to divide the vector into two sub arrays for separate exchange and reduction in the reduce-scatter phase.

9. The computer according to claim 7, wherein each processing node is configured to generate the vector of partial deltas by carrying out a compute function on a set of values and a batch of incoming deltas, the partial deltas being the output of the compute function.

10. The computer according to claim 9, which is configured to implement a machine learning model wherein the incoming batch data is training data, and the values are weights of the machine learning model.

11. A method of operating a computer comprising a plurality of processing nodes, each processing node having at least one processor configured to process input data to generate output data in the form of an array of data items, the plurality of processing nodes arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links, the cliques being interconnected in rings such that each processing node is a member of a single clique and a single ring, the method comprising exchanging data item in respect of exchange steps of a first phase of a machine learning collective, wherein in each exchange step the processing nodes of all cliques exchange via the respective first and second clique links at least two data items with the other processing nodes in its clique, and all processing nodes reduce each received data item with the data item in the corresponding position in the array on that processing node, wherein the machine learning collective is an Allreduce collective and each processing node exchanges data items in exchange steps of an Allgather phase, following a reduce-scatter phase of the Allreduce collective, wherein in each step of the Allgather phase reduced data items are exchanged between processing nodes in a clique, and wherein each processing node transmits data items in a forwards direction to its adjacent processing node in the ring in at least some of the exchange steps in the reduce scatter phase.

12. The method according to claim 11, wherein each processing node comprises memory configured to store an array of data items ready to be exchanged in the reduce-scatter phase, wherein each data item is respectively positioned in the array with corresponding data items being respectively positioned at corresponding locations in the arrays of other processing nodes.

13. The method according to claim 11, wherein each processing node transmits data items to their forwards adjacent processing node in the ring for all exchange steps of the reduce-scatter phase apart from a first step, in which no data items are transmitted between processing nodes connected in a ring.

14. The method according to claim 11, wherein the array at each processing node comprises two sub arrays and processing nodes are inter-connected by bi-directional links, wherein in each exchange step of the reduce-scatter phase, all processing nodes exchange with the other processing node(s) of their clique, two data items from one sub array and two further data items from the other sub array wherein the two data items and the further two data items are exchanged over the same bi-directional link in opposite directions.

15. The method according to claim 14, wherein each processing node transmits data items in a forwards direction to its adjacent processing node in the ring in at least some of the exchange steps in the reduce-scatter phase and wherein in at least some exchange steps of the reduce-scatter phase each processing node transmits data items to its adjacent backwards processing node in the ring, wherein the transmission in each of the forwards and backwards direction from each processing node is carried out on the same bi-directional
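The claims describe an Allreduce carried out as a reduce-scatter phase followed by an Allgather phase, with each received item reduced against the item at the same position in the local array. The sketch below shows only that two-phase semantics inside a single fully connected clique. It is not the claimed schedule: it ignores the rings, the paired clique links, and the sub-array split. The function name `allreduce_in_clique` and the use of addition as the reduce operation are assumptions made for illustration.

```python
def allreduce_in_clique(arrays):
    """Simulate Allreduce over one fully connected group of nodes.

    arrays[i] is the array held by node i; it must have one item per node.
    Addition is assumed as the reduce operation (an illustrative choice).
    """
    n = len(arrays)
    if any(len(a) != n for a in arrays):
        raise ValueError("each node needs exactly one item per node")

    work = [list(a) for a in arrays]

    # Reduce-scatter: every sender passes the item at position `receiver`
    # to that receiver, which reduces it with its own item at the same
    # position. Afterwards node i holds the fully reduced item i.
    for receiver in range(n):
        for sender in range(n):
            if sender != receiver:
                work[receiver][receiver] += arrays[sender][receiver]

    # Allgather: every node shares its reduced item with all other nodes,
    # so each node ends up with the complete reduced array.
    for receiver in range(n):
        for sender in range(n):
            if sender != receiver:
                work[receiver][sender] = work[sender][sender]

    return work


partials = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
result = allreduce_in_clique(partials)
print(result)
```

In a training setting such as the one in claims 6 to 10, the input arrays would play the role of per-node partial deltas, and every node would finish with the same reduced vector.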

Assignees

Graphcore Ltd

Inventors

Classifications

  • Distributed learning, e.g. federated learning · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Transfer mode dependent, e.g. ATM · CPC title

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11614946B2 cover?
A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in …
Who is the assignee on this patent?
Graphcore Ltd
What technology area does this patent fall under?
Primary CPC classification G06F15/17318. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 28, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).