What technology area does this patent fall under?

Primary CPC classification G06F15/17318. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 8, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Networked computer with multiple embedded rings

US11720510B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11720510-B2
Application number	US-202016831580-A
Country	US
Kind code	B2
Filing date	Mar 26, 2020
Priority date	Mar 27, 2019
Publication date	Aug 8, 2023
Grant date	Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer comprising a plurality of interconnected processing nodes arranged in multiple stacked layers forming a multi-face prism is provided. Each face of the prism comprises multiple stacked pairs of nodes. Said nodes are connected by at least two intralayer links. Each node is connected to a corresponding node in an adjacent pair by an interlayer link. The corresponding nodes are connected by respective interlayer links to form respective edges. Each pair forms part of one of the layers, each layer comprising multiple nodes, each node connected to their neighbouring nodes in the layer by at least one of the intralayer links to form a ring. Data is transmitted around paths formed by respective sets of nodes and links, each path having a first portion between a first and second endmost layers, and a second portion provided between the second and first endmost layers and comprising one of the edges.
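The topology in the abstract can be illustrated with a small sketch. The Python below is an illustrative reading only, not the patent's implementation: it assumes a triangular prism (three nodes per layer, as in claim 2), models each connection as a single undirected link (the parallel links between neighbours are not modelled), and uses hypothetical names `build_prism` and `embedded_ring`. It builds one closed path per face: the first portion zigzags through both nodes of the face in every layer, and the second portion returns along the remaining edge.

```python
from itertools import product

NODES_PER_LAYER = 3  # assumption: triangular prism, as in claim 2


def build_prism(num_layers):
    """Return the undirected links of the prism; a node is (layer, position)."""
    links = set()
    for layer, pos in product(range(num_layers), range(NODES_PER_LAYER)):
        # Intralayer link to the neighbouring node; together these form a ring.
        neighbour = (layer, (pos + 1) % NODES_PER_LAYER)
        links.add(frozenset({(layer, pos), neighbour}))
        # Interlayer link to the corresponding node in the next layer (an edge).
        if layer + 1 < num_layers:
            links.add(frozenset({(layer, pos), (layer + 1, pos)}))
    return links


def embedded_ring(num_layers, face):
    """Closed path for one face: zigzag up the face, return down the third edge."""
    a = face
    b = (face + 1) % NODES_PER_LAYER
    ret = (face + 2) % NODES_PER_LAYER
    path = []
    # First portion: visit both nodes of the face in every layer exactly once,
    # alternating direction so each layer change uses an interlayer link.
    for layer in range(num_layers):
        pair = [(layer, a), (layer, b)]
        path.extend(pair if layer % 2 == 0 else reversed(pair))
    # Second portion: come back from the last layer to the first along one edge.
    path.extend((layer, ret) for layer in reversed(range(num_layers)))
    return path


if __name__ == "__main__":
    for face in range(NODES_PER_LAYER):
        print(face, embedded_ring(4, face))
```

With three faces, the same construction yields three such paths, one per face, each visiting every node once before closing back on its starting node.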

First claim

Opening claim text (preview).

The invention claimed is:
1. A computer comprising a plurality of interconnected processing nodes, wherein the interconnected processing nodes are arranged in a configuration of multiple stacked layers which comprise a first endmost layer and a second endmost layer and at least one intermediate layer forming a multi-face prism; wherein each face of the multi-face prism comprises multiple stacked pairs of processing nodes, wherein the processing nodes of each pair are connected by at least two intralayer links, and the processing node of each pair is connected to a corresponding processing node in an adjacent pair by at least one interlayer link wherein the corresponding processing nodes of the first endmost layer, the at least one intermediate layer and the second endmost layer are connected by respective interlayer links to form respective edges of the configuration; and wherein each pair of processing nodes forms part of one of the layers of the configuration, each layer comprising multiple processing nodes, each processing node connected to their neighbouring processing nodes in the layer by at least one of the intralayer links to form a ring; wherein the processing nodes are programmed to operate the configuration to transmit data around each of a plurality of one dimensional paths formed by respective sets of processing nodes and links, each one dimensional path having a first portion between the first endmost layer and the second endmost layer via the at least one intermediate layer using all processing nodes in one of the faces of the configuration only once, and a second portion provided between the second endmost layer and the first endmost layer and comprising one of the edges of the configuration.
2. The computer of claim 1 wherein the multi-face prism has three processing nodes in each layer, providing three respective faces for the first portion of respective one-dimensional paths.
3. The computer of claim 1 wherein in the at least one intermediate layer each processing node is connected to its neighbouring processing node by two intralayer links.
4. The computer of claim 1 wherein in the first and second endmost layers each processing node is connected to its neighbouring processing node by three intralayer links to enable the simultaneous transmission of data on three one dimensional paths in the configuration.
5. The computer of claim 1 which has been configured from a multi-face prism comprising a set of stacked layers, the processing nodes of each stacked layer having an interlayer link to a corresponding processing node in an adjacent stacked layer and an intralayer link between neighbouring processing nodes in the layer, by disconnecting each interlayer link in a designated stacked layer and connecting it to a neighbouring processing node in the designated stacked layer to provide an intralayer link whereby the designated stacked layer forms one of the first and second endmost layers.
6. The computer of claim 1 wherein each of the processing nodes is programmed to identify one of their interlayer and intralayer links to transmit data in order to determine the one-dimensional path for that data.
7. The computer of claim 1 wherein each of the processing nodes is programmed to deactivate any of its interlayer and intralayer links which are unused in a data transmission step.
8. The computer according to claim 1 wherein each processing node is programmed to divide a respective partial vector of that node into fragments and to transmit the data in the form of successive fragments around each one-dimensional path.
9. The computer according to claim 8 which is programmed to operate each path as a set of logical rings, wherein the successive fragments are transmitted around each logical ring in simultaneous transmission steps.
10. The computer according to claim 8 wherein each processing node is configured to output a respective fragment on each of two links simultaneously.
11. The computer according to claim 8 wherein each processing node is configured to reduce two incoming fragments with two respective corresponding locally stored fragments.
12. The computer according to claim 11 wherein each processing node is configured to transmit fully reduced fragments on each of two links simultaneously in an Allgather phase of an Allreduce collective.
13. The computer according to claim 1 where each link is bi-directional.
14. A method of generating a set of programs to be executed in parallel on a computer comprising a plurality of processing nodes, wherein the plurality of processing nodes are arranged in a configuration of multiple stacked layers which comprise a first endmost layer and a second endmost layer and at least one intermediate layer forming a multi-face prism; wherein each face of the multi-face prism comprises multiple stacked pairs of processing nodes, wherein the processing nodes of each pair are connected to each other by at least two interlayer links, and the processing node of each pair is connected to a corresponding processing node in an adjacent pair by at least one interlayer link wherein the corresponding processing nodes of the first endmost layer, the at least one intermediate layer and the second endmost layer are connected by respective interlayer links to form respective edges of the configuration; and wherein each pair of processing nodes forms part of one of the layers of the configuration, each layer comprising multiple processing nodes, each processing node connected to their neighbouring processing nodes in the layer by at least one of the intralayer links to form a ring; wherein method comprises: generating at least one data transmission instruction for each program to define a data transmission stage in which data is transmitted from the processing node executing that program, wherein the data transmission instruction comprises a link identifier which defines an outgoing link on which data is to be transmitted in that data transmission stage; and determining the link identifiers in order to transmit data around each of a plurality of one dimensional paths formed by respective sets of processing nodes and links, each one dimensional path having a first portion between the first endmost layer and the second endmost layer via the at least one intermediate layer using all processing nodes in one of the faces of the configuration only once, and a second portion provided between the second endmost layer and the first endmost layer and comprising one of the edges of the configuration.
15. The method according to claim 14 wherein each program comprises one or more instruction to deactivate any of its interlayer and intralayer links which are unused in a data transmission step.
16. The method according to claim 14 wherein each program comprises one or more instruction to divide a respective partial vector of the processing node on which that program is executed into fragments and to transmit the data in the form of successive fragments over the respectively defined link.
17. The method according to claim 16 wherein each program comprises one or more instruction to output a respective fragment on each of two links simultaneously.
18. The method according to claim 16 wherein each program comprises one or more instruction to reduce two incoming fragments with two respective corresponding locally stored fragments.
19. The method according to claim 18 wherein each program comprises one or more instruction to transmit fully reduced fragments on each of two links simultaneously in an Allgather phase of
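Claims 8 to 12 describe splitting each node's partial vector into fragments, reducing fragments as they travel around a ring, and then circulating the fully reduced fragments in an Allgather phase. The Python below is a minimal single-process simulation of a generic ring Allreduce under simplifying assumptions that are not taken from the patent: one logical ring, one direction only (the claims describe sending on two links simultaneously), summation as the reduction, and a vector length divisible by the number of nodes. The function name `ring_allreduce` is hypothetical.

```python
def ring_allreduce(partials):
    """Simulate a sum-Allreduce over a logical ring of len(partials) nodes.

    partials: one equal-length list of numbers per node; the length is
    assumed to be divisible by the number of nodes.
    """
    n = len(partials)
    size = len(partials[0]) // n
    # Each node divides its partial vector into n fragments.
    frags = [[list(p[i * size:(i + 1) * size]) for i in range(n)]
             for p in partials]

    # Reduce-scatter phase: in step s, node r sends fragment (r - s) mod n
    # to node r + 1, which reduces it with its locally stored fragment.
    for s in range(n - 1):
        sent = [frags[r][(r - s) % n] for r in range(n)]  # simultaneous sends
        for r in range(n):
            dst, idx = (r + 1) % n, (r - s) % n
            frags[dst][idx] = [x + y for x, y in zip(frags[dst][idx], sent[r])]

    # Node r now holds the fully reduced fragment (r + 1) mod n.
    # Allgather phase: fully reduced fragments circulate around the ring.
    for s in range(n - 1):
        sent = [frags[r][(r + 1 - s) % n] for r in range(n)]
        for r in range(n):
            dst, idx = (r + 1) % n, (r + 1 - s) % n
            frags[dst][idx] = list(sent[r])

    # Reassemble each node's fragments into a full vector.
    return [[x for frag in node for x in frag] for node in frags]


if __name__ == "__main__":
    vectors = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    for result in ring_allreduce(vectors):
        print(result)
```

Each phase takes n - 1 steps for n nodes, and in every step each node sends one fragment, which is why every link of the ring stays busy throughout.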

Assignees

Graphcore Ltd

Inventors

Knowles Simon

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06F15/17318 · Primary
Parallel communications techniques, e.g. gather, scatter, reduce, broadcast, multicast, all to all · CPC title
G06F15/8015 · Primary
One dimensional arrays, e.g. rings, linear arrays, buses · CPC title

Patent family

Related publications grouped by family.

View patent family 66381537

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720510B2 cover?: A computer comprising a plurality of interconnected processing nodes arranged in multiple stacked layers forming a multi-face prism is provided. Each face of the prism comprises multiple stacked pairs of nodes. Said nodes are connected by at least two intralayer links. Each node is connected to a corresponding node in an adjacent pair by an interlayer link. The corresponding nodes are connected…
Who is the assignee on this patent?: Graphcore Ltd

Related patents

Synchronization in a multi-tile processing array

Instruction set

Compiler method

Collective communication operation

Parallel processing of reduction and broadcast operations on large datasets of non-scalar data