What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Published May 10, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Processor element redundancy for accelerated deep learning

US11328208B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11328208-B2
Application number	US-201917272141-A
Country	US
Kind code	B2
Filing date	Aug 27, 2019
Priority date	Aug 29, 2018
Publication date	May 10, 2022
Grant date	May 10, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of cost, accuracy, performance, and energy efficiency. The deep learning accelerator is implemented at least in part via wafer-scale integration. The wafer comprises a plurality of processor elements, each augmented with redundancy-enabling couplings. The redundancy-enabling couplings enable using redundant ones of the processor elements to replace defective ones of the processor elements. Defect information gathered at wafer test and/or in-situ, such as in a datacenter, is used to determine configuration information for the redundancy-enabling couplings.
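As a rough illustration of the replacement idea in the abstract, the sketch below maps logical processor elements onto a single physical row that contains spares, hopping over defective elements. The function name `map_row`, the boolean defect list, and the `max_skip` limit (modeling a coupling that can bypass only a bounded number of neighbors) are assumptions made for this example and are not taken from the patent text.

```python
from typing import List, Optional

def map_row(defective: List[bool], logical_width: int,
            max_skip: int = 1) -> Optional[List[int]]:
    """Pick physical indices for logical elements 0..logical_width-1.

    defective[i] is True when physical element i failed test.
    max_skip is the (assumed) longest run of defective elements that a
    redundancy coupling can hop over. Returns None if no mapping exists.
    """
    if logical_width <= 0:
        return []
    mapping: List[int] = []
    run = 0  # consecutive defective elements since the last good one
    for idx, bad in enumerate(defective):
        if bad:
            run += 1
            if run > max_skip:
                # Gap is too wide to bypass: elements chosen so far cannot
                # reach anything past it, so start over beyond the gap.
                mapping = []
            continue
        run = 0
        mapping.append(idx)
        if len(mapping) == logical_width:
            return mapping
    return None

# One defect in a row of four: three logical elements still fit.
print(map_row([False, True, False, False], 3))
```

In this toy model the returned list plays the role of the configuration information: it records which physical element stands in for each logical position once the defect map is known.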

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
in a regular grid of physical processing elements arranged in at least two dimensions, first communicating in a first of the at least two dimensions between physically adjacent physical processing elements of the regular grid;
in the regular grid, second communicating in a second of the at least two dimensions between physically adjacent physical processing elements of the regular grid;
in the regular grid, third communicating in the first dimension between physically proximate elements of the regular grid, wherein the physically proximate elements are at least one physical processing element distant from each other with respect to the regular grid;
in the regular grid, fourth communicating in at least the second dimension between physical processing elements of the regular grid;
in the regular grid, fifth communicating in at least the second dimension between physical processing elements of the regular grid;
operating the regular grid as a logical array of functional processing elements in view of defective ones of the physical processing elements by controlling selected aspects of the third communicating to provide communication in the first dimension instead of selected aspects of the first communicating, and further by selectively controlling aspects of the fourth communicating and aspects of the fifth communicating to provide communication in at least the second dimension instead of selected aspects of the second communicating;
wherein the regular grid, the first through the fifth acts of communicating, and the operating are compatible with wafer-scale integration;
wherein the operating is based in part on configuration information determined at least in part by testing a wafer comprising the physical processing elements to determine which of the physical processing elements are the defective physical processing elements;
further comprising fabricating the wafer, in accordance with results of the testing, determining a topology of a usable logical array of processing elements realizable from the fabricated wafer, in accordance with the topology, configuring a system comprising the fabricated wafer, booting the system in accordance with the configuring, and executing one or more applications on the system;
wherein the system is a deep learning accelerator and the one or more applications are one or more deep learning applications;
wherein the usable logical array is a uniform logical array of M rows by N columns; and
wherein the determining the topology comprises treating at least some non-defective ones of the physical processing elements in at least one of a same one of the rows and a same one of the columns as one of the defective physical processing elements as if the at least some of the non-defective physical processing elements were defective.

2. The method of claim 1, wherein the physically proximate elements are separated by one physical processing element from each other with respect to the regular grid.

3. The method of claim 1, wherein the first communicating and the third communicating have a same latency.

4. The method of claim 1, wherein the second communicating, the fourth communicating, and the fifth communicating have a same latency.

5. The method of claim 1, wherein the operating seeks, with respect to the logical array of functional processing elements, to replace the defective physical processing elements with respective ones of the physical processing elements that are not defective.

6. The method of claim 1, wherein the physical processing elements are arranged in a rectangle circumscribed within a wafer comprising the physical processing elements.

7. The method of claim 1, wherein each column is characterized by a same number of defective physical processing elements plus non-defective physical processing elements treated as defective.

8. The method of claim 1, wherein with respect to a particular one of the rows, consecutive ones of the columns have one of the non-defective physical processing elements not treated as defective in either the particular row or one of the rows contiguous with the particular row.

9. The method of claim 1, wherein no two adjacent ones of the rows have one of the defective physical processing elements or one of the physical processing elements treated as defective in a same one of the columns.

10. The method of claim 1, wherein the determining the topology comprises partitioning a reticle of the fabricated wafer into sub-sections for independent analysis.

11. A system comprising:
first means for communicating in a first of at least two dimensions between physically adjacent physical processing elements of a regular grid of physical processing elements arranged in the at least two dimensions;
second means for communicating in a second of the at least two dimensions between physically adjacent physical processing elements of the regular grid;
third means for communicating in the first dimension between physically proximate elements of the regular grid, wherein the physically proximate elements are at least one physical processing element distant from each other with respect to the regular grid;
fourth means for communicating in at least the second dimension between physical processing elements of the regular grid;
fifth means for communicating in at least the second dimension between physical processing elements of the regular grid;
means for operating the regular grid as a logical array of functional processing elements in view of defective ones of the physical processing elements by controlling selected aspects of the third means for communicating to provide communication in the first dimension instead of selected aspects of the first means for communicating, and further by selectively controlling aspects of the fourth means for communicating and aspects of the fifth means for communicating to provide communication in at least the second dimension instead of selected aspects of the second means for communicating;
wherein the regular grid, the first through the fifth means for communicating, and the means for operating are compatible with wafer-scale integration;
wherein the means for operating is operable based in part on configuration information determined at least in part by testing a wafer comprising the physical processing elements to determine which of the physical processing elements are the defective physical processing elements;
further comprising operable in accordance with results of the testing, means for determining a topology of a usable logical array of processing elements realizable from the tested wafer, operable in accordance with the topology, means for configuring an accelerator comprising the tested wafer, means for booting the accelerator in accordance with the means for configuring, and means for executing one or more applications on the accelerator;
wherein the accelerator is a deep learning accelerator and the one or more applications are one or more deep learning applications;
wherein the usable logical array is a uniform logical array of M rows by N columns; and
wherein the means for determining the topology comprises means for treating at least some non-defective ones of the physical processing elements in at least one of a same one of the rows and a same one of the columns as one of the defective physical processing elements as if the at least some of the non-defective physical processing elements were defective.

12. The system of claim 11, wherein each column is characterized by a same number of defective physical processing elements plus non-defective physical processing elements treated as defective.
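Claims 7 and 12 describe every column ending up with the same count of defective elements plus good elements treated as defective, which is what makes the logical array a uniform M rows by N columns. A minimal sketch of one way to reach that state is shown below. The function `equalize_columns`, the `defects[row][col]` layout, and the policy of retiring the surplus good elements at the bottom of each column are assumptions for illustration. The sketch does not model the reach limits of the couplings (claims 2, 8, and 9) or the same-row/same-column condition in claim 1, so it should not be read as the patented procedure.

```python
from typing import List

def equalize_columns(defects: List[List[bool]]) -> List[List[int]]:
    """Return mapping[col] = physical rows serving logical rows 0..M-1.

    defects[r][c] is True when the physical element at row r, column c
    is defective. M is set by the worst column; in better columns the
    surplus good elements are left unused (treated as defective).
    """
    n_rows = len(defects)
    n_cols = len(defects[0]) if n_rows else 0
    # Good physical rows in each column, in top-to-bottom order.
    good = [[r for r in range(n_rows) if not defects[r][c]]
            for c in range(n_cols)]
    # The usable logical height is limited by the column with most defects.
    m = min((len(g) for g in good), default=0)
    return [g[:m] for g in good]

# 4 physical rows x 3 columns; columns 0 and 2 each have one defect.
defect_map = [
    [False, False, False],
    [True,  False, False],
    [False, False, True],
    [False, False, False],
]
mapping = equalize_columns(defect_map)
print(mapping)  # column 1 leaves its good element in row 3 unused
```

Here M is 3 and N is 3, and each column has exactly one physical element outside the logical array, whether that element is defective or merely retired.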

Assignees

Cerebras Systems Inc

Inventors

Classifications

G06N3/047: Probabilistic or stochastic networks
G06N3/044: Recurrent networks, e.g. Hopfield networks
G06N3/045: Combinations of networks
G06N3/0464: Convolutional networks [CNN, ConvNet]
G06N3/09: Supervised learning

Patent family

Related publications grouped by family.

Patent family ID: 69645080

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11328208B2 cover?: Techniques in advanced deep learning provide improvements in one or more of cost, accuracy, performance, and energy efficiency. The deep learning accelerator is implemented at least in part via wafer-scale integration. The wafer comprises a plurality of processor elements, each augmented with redundancy-enabling couplings. The redundancy-enabling couplings enable using redundant ones of the pro…
Who is the assignee on this patent?: Cerebras Systems Inc