What technology area does this patent fall under?

Primary CPC classification G06F15/17318. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 28, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Embedding rings on a toroid computer network

US11372791B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11372791-B2
Application number	US-202016831630-A
Country	US
Kind code	B2
Filing date	Mar 26, 2020
Priority date	Mar 27, 2019
Publication date	Jun 28, 2022
Grant date	Jun 28, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple layers, arranged along an axis, comprising first and second endmost layers and at least one intermediate layer between the first and second endmost layers is provided. Each layer comprises a plurality of processing nodes connected in a ring by an intralayer respective set of links between each pair of neighbouring processing nodes, the links adapted to operate simultaneously. Nodes in each layer are connected to respective corresponding nodes in each adjacent layer by an interlayer link. Each processing node in the first endmost layer is connected to a corresponding node in the second endmost layer. Data is transmitted around a plurality of embedded one-dimensional logical rings with an asymmetric bandwidth utilisation, each logical ring using all processing nodes of the computer in such a manner that the plurality of embedded one-dimensional logical rings operate simultaneously.
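The layout described in the abstract can be sketched as a small graph model. The following is a minimal illustration, not the patent's implementation: the function names, the `(layer, position)` node labels, and the rule that a node's total bandwidth B is split equally across its physical links are all assumptions made for this example. Under those assumptions, the per-neighbour shares come out unequal between the intralayer and axial directions, which is one way to read "asymmetric bandwidth utilisation".

```python
from collections import Counter
from fractions import Fraction


def build_toroid(num_layers, nodes_per_layer, intralayer_links):
    """Return {node: Counter({neighbour: parallel link count})}.

    Nodes are (layer, position) pairs. All names here are illustrative.
    """
    if num_layers < 3 or nodes_per_layer < 3:
        # At least one intermediate layer, and distinct ring neighbours.
        raise ValueError("need at least 3 layers and 3 nodes per layer")
    links = {}
    for layer in range(num_layers):
        for pos in range(nodes_per_layer):
            neighbours = Counter()
            # Ring within the layer: a set of parallel links to each neighbour.
            neighbours[(layer, (pos + 1) % nodes_per_layer)] += intralayer_links
            neighbours[(layer, (pos - 1) % nodes_per_layer)] += intralayer_links
            # One interlayer link to the corresponding node in each adjacent
            # layer; the modulo also joins the two endmost layers.
            neighbours[((layer + 1) % num_layers, pos)] += 1
            neighbours[((layer - 1) % num_layers, pos)] += 1
            links[(layer, pos)] = neighbours
    return links


def bandwidth_shares(links, node):
    """Split a node's bandwidth B equally over its links (an assumption).

    Returns (share per intralayer neighbour, share per axial neighbour),
    both as fractions of B.
    """
    per_link = Fraction(1, sum(links[node].values()))
    layer, _ = node
    intra = next(n for n in links[node] if n[0] == layer)
    axial = next(n for n in links[node] if n[0] != layer)
    return links[node][intra] * per_link, links[node][axial] * per_link
```

With two intralayer links per neighbour pair, `bandwidth_shares(build_toroid(4, 3, 2), (0, 0))` returns `(Fraction(1, 3), Fraction(1, 6))`, which agrees with the B/3 and B/6 figures in claim 4. With three links the result is `(Fraction(3, 8), Fraction(1, 8))`, which agrees with the figures in claim 6.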

First claim

Opening claim text (preview).

The invention claimed is:
1. A computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple layers, arranged along an axis, comprising first and second endmost layers and at least one intermediate layer between the first and second endmost layers; each layer comprising a plurality of processing nodes connected in a ring by an intralayer respective set of links between each pair of neighbouring processing nodes, the links in each set adapted to operate simultaneously; wherein processing nodes in each layer are connected to respective corresponding nodes in each adjacent layer by an interlayer link, wherein each processing node in the first endmost layer is connected to a corresponding node in the second endmost layer; and the computer being programmed to transmit data around a plurality of embedded one-dimensional logical rings, each logical ring using all processing nodes of the computer in such a manner that the plurality of embedded one-dimensional logical rings operate simultaneously; wherein the computer is programmed to transmit the data with an asymmetric bandwidth utilisation.
2. The computer according to claim 1, wherein the utilisation of intralayer link bandwidth is greater than the utilisation of bandwidth along the axis.
3. The computer according to claim 1, wherein the embedded rings are isomorphic.
4. The computer according to claim 1, wherein the set of intralayer links comprises two links, and the bandwidth utilisation is B/6 along the axis, and B/3 within each layer, where B is the total bandwidth of each processing node.
5. The computer according to claim 4, wherein three rings are embedded.
6. The computer according to claim 4, wherein the set of intralayer links comprises three links, and the bandwidth utilisation is 3B/8 within each layer and B/8 along the axis.
7. The computer according to claim 6, wherein four rings are embedded.
8. The computer according to claim 1, which is configured such that data passes along each ring through the processing nodes in each layer in one of an anticlockwise and clockwise direction.
9. The computer according to claim 8, which is configured such that the data passes through successive layers in the same direction.
10. The computer according to claim 8, which is configured such that the data passes through successive layers in opposite directions.
11. The computer according to claim 1, wherein each processing node comprises memory configured to store an array of data items ready to be exchanged in a reduce scatter phase, wherein each data item is respectively positioned in the array with corresponding data items being respectively positioned at corresponding locations in the arrays of other processing nodes.
12. The computer according to claim 1, wherein each processing node is programmed to transmit data items in a forwards direction to its adjacent processing node in each ring in a reduce-scatter phase.
13. The computer according to claim 1, wherein each processing node is programmed to generate a vector of partial deltas in a compute step and to divide its vector into sub arrays for respective utilisation of the embedded rings.
14. The computer according to claim 1, wherein each processing node is programmed to divide a respective partial vector of that node into fragments and to transmit the data in the form of successive fragments around each one-dimensional path.
15. The computer according to claim 14, which is configured to operate such that the successive fragments are transmitted around each logical ring in simultaneous transmission steps.
16. The computer according to claim 1, wherein each processing node is configured to reduce incoming fragments with respective corresponding locally stored fragments.
17. A method of generating a set of programs to be executed in parallel on a computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple layers, arranged along an axis, comprising first and second endmost layers and at least one intermediate layer between the first and second endmost layers; each layer comprising a plurality of processing nodes connected in a ring by an intralayer respective set of links between each pair of neighbouring processing nodes, the links in each set adapted to operate simultaneously; and wherein processing nodes in each layer are connected to respective corresponding nodes in each adjacent layer by an interlayer link, wherein each processing node in the first endmost layer is connected to a corresponding node in the second endmost layer; the method comprising: generating at least one data transmission instruction for each program to define a data transmission stage in which data is transmitted from the processing node executing that program, wherein the data transmission instruction comprises a link identifier which defines an outgoing link on which data is to be transmitted in that data transmission stage; and determining the link identifiers in order to transmit data around a plurality of embedded one-dimensional logical rings, each logical ring using all processing nodes of the computer in such a manner that a plurality of embedded one-dimensional logical rings operate simultaneously; wherein the programs are generated to transmit the data with an asymmetric bandwidth utilisation.
18. The method according to claim 17, wherein the utilisation of intralayer link bandwidth is greater than the utilisation of bandwidth along the axis.
19. A method of executing a set of programs in parallel on a computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple layers, arranged along an axis, comprising first and second endmost layers and at least one intermediate layer between the first and second endmost layers; each layer comprising a plurality of processing nodes connected in a ring by an intralayer respective set of links between each pair of neighbouring processing nodes, the links in each set adapted to operate simultaneously; wherein processing nodes in each layer are connected to respective corresponding nodes in each adjacent layer by an interlayer link, wherein each processing node in the first endmost layer is connected to a corresponding node in the second endmost layer; the method comprising: executing at least one data transmission instruction in each program to define a data transmission stage in which data is transmitted from the processing node executing that program, wherein the data transmission instruction comprises a link identifier which defines an outgoing link on which data is to be transmitted in that data transmission stage; and the link identifiers having been determined in order to transmit data around each of a plurality of embedded one dimensional logical rings formed by respective sets of processing nodes and links in such a manner that a plurality of embedded one-dimensional logical rings operate simultaneously, each logical ring using all processing nodes of the computer; wherein the data is transmitted with an asymmetric bandwidth utilisation.
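Claims 11 to 16 refer to a reduce-scatter phase in which each node forwards fragments to its adjacent node in a ring and reduces incoming fragments with its locally stored ones. The sketch below simulates the textbook ring reduce-scatter on a single logical ring to show that data flow. It is an illustration only: the function name, the use of plain numbers as fragments, and addition as the reduction are assumptions, and it does not model how the patent embeds several rings on the toroid or splits a vector across them.

```python
def ring_reduce_scatter(vectors):
    """Simulate reduce-scatter on one logical ring of n nodes.

    vectors[i] is node i's partial vector, already divided into n fragments
    (plain numbers here for brevity). Returns each node's state after n - 1
    steps; node i then holds the fully reduced fragment (i + 1) % n.
    """
    n = len(vectors)
    if any(len(v) != n for v in vectors):
        raise ValueError("each node needs one fragment per node in the ring")
    state = [list(v) for v in vectors]
    for step in range(n - 1):
        # Every node transmits in the same step, so snapshot the outgoing
        # fragments before any receiver updates its local copy.
        outgoing = [(i - step) % n for i in range(n)]
        sent = [state[i][outgoing[i]] for i in range(n)]
        for i in range(n):
            receiver = (i + 1) % n  # forwards direction around the ring
            # Reduce the incoming fragment with the locally stored fragment
            # at the corresponding array position.
            state[receiver][outgoing[i]] += sent[i]
    return state
```

For three nodes holding `[1, 2, 3]`, `[10, 20, 30]` and `[100, 200, 300]`, the fragment sums are 111, 222 and 333. After two steps, node 0 holds 222 at index 1, node 1 holds 333 at index 2, and node 2 holds 111 at index 0. In a multi-ring arrangement, each embedded ring would run a pass like this on its own sub-array at the same time.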

Assignees

Graphcore Ltd

Inventors

Knowles Simon

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06F15/17318 · Primary
Parallel communications techniques, e.g. gather, scatter, reduce, broadcast, multicast, all to all · CPC title
G06F15/8015 · Primary
One dimensional arrays, e.g. rings, linear arrays, buses · CPC title

Patent family

Related publications grouped by family.

View patent family 66381537

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11372791B2 cover?: A computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple layers, arranged along an axis, comprising first and second endmost layers and at least one intermediate layer between the first and second endmost layers is provided. Each layer comprises a plurality of processing nodes connected in a ring by an intralayer respective set of links betw…
Who is the assignee on this patent?: Graphcore Ltd

Related patents

Collective communication operation

Parallel processing of reduction and broadcast operations on large datasets of non-scalar data

A rack assembly structure