Processor element redundancy for accelerated deep learning

US11328208B2 · US · B2

Patent metadata
Publication number: US-11328208-B2
Application number: US-201917272141-A
Country: US
Kind code: B2
Filing date: Aug 27, 2019
Priority date: Aug 29, 2018
Publication date: May 10, 2022
Grant date: May 10, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of cost, accuracy, performance, and energy efficiency. The deep learning accelerator is implemented at least in part via wafer-scale integration. The wafer comprises a plurality of processor elements, each augmented with redundancy-enabling couplings. The redundancy-enabling couplings enable using redundant ones of the processor elements to replace defective ones of the processor elements. Defect information gathered at wafer test and/or in-situ, such as in a datacenter, is used to determine configuration information for the redundancy-enabling couplings.
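The replacement idea in the abstract can be illustrated in one dimension. The sketch below is not the patented method: it assumes a single row of processor elements, assumes a skip coupling can bypass exactly one defective neighbor, and uses hypothetical names (`map_row`, `link_config`) that do not appear in the patent. It turns a defect map into a logical-to-physical assignment plus a per-link setting, which is one plausible form the "configuration information" could take.

```python
from typing import List, Optional


def map_row(defective: List[bool], logical_width: int) -> Optional[List[int]]:
    """Pick physical indices for logical elements 0..logical_width-1.

    Assumption (not from the patent text): a skip coupling bridges at most
    one defective element, so chosen neighbors may be at most 2 apart.
    Returns None when no such run of good elements exists.
    """
    good = [i for i, bad in enumerate(defective) if not bad]
    for start in range(len(good) - logical_width + 1):
        window = good[start:start + logical_width]
        # Every pair of logical neighbors must be adjacent or one element apart.
        if all(b - a <= 2 for a, b in zip(window, window[1:])):
            return window
    return None


def link_config(chosen: List[int]) -> List[str]:
    """Per-link setting: use the adjacent coupling or the skip coupling."""
    return ["adjacent" if b - a == 1 else "skip"
            for a, b in zip(chosen, chosen[1:])]


if __name__ == "__main__":
    row = [False, True, False, False, False]  # physical element 1 is defective
    mapping = map_row(row, 4)
    print(mapping, link_config(mapping))
```

In this toy model, a row with one defect still yields a full-width logical row as long as a spare element exists, while two adjacent defects cannot be bridged and the function reports failure.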

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: in a regular grid of physical processing elements arranged in at least two dimensions, first communicating in a first of the at least two dimensions between physically adjacent physical processing elements of the regular grid; in the regular grid, second communicating in a second of the at least two dimensions between physically adjacent physical processing elements of the regular grid; in the regular grid, third communicating in the first dimension between physically proximate elements of the regular grid, wherein the physically proximate elements are at least one physical processing element distant from each other with respect to the regular grid; in the regular grid, fourth communicating in at least the second dimension between physical processing elements of the regular grid; in the regular grid, fifth communicating in at least the second dimension between physical processing elements of the regular grid; operating the regular grid as a logical array of functional processing elements in view of defective ones of the physical processing elements by controlling selected aspects of the third communicating to provide communication in the first dimension instead of selected aspects of the first communicating, and further by selectively controlling aspects of the fourth communicating and aspects of the fifth communicating to provide communication in at least the second dimension instead of selected aspects of the second communicating; wherein the regular grid, the first through the fifth acts of communicating, and the operating are compatible with wafer-scale integration; wherein the operating is based in part on configuration information determined at least in part by testing a wafer comprising the physical processing elements to determine which of the physical processing elements are the defective physical processing elements; further comprising fabricating the wafer, in accordance with results of the testing, determining a topology of a usable logical array of processing elements realizable from the fabricated wafer, in accordance with the topology, configuring a system comprising the fabricated wafer, booting the system in accordance with the configuring, and executing one or more applications on the system; wherein the system is a deep learning accelerator and the one or more applications are one or more deep learning applications; wherein the usable logical array is a uniform logical array of M rows by N columns; and wherein the determining the topology comprises treating at least some non-defective ones of the physical processing elements in at least one of a same one of the rows and a same one of the columns as one of the defective physical processing elements as if the at least some of the non-defective physical processing elements were defective.

2. The method of claim 1, wherein the physically proximate elements are separated by one physical processing element from each other with respect to the regular grid.

3. The method of claim 1, wherein the first communicating and the third communicating have a same latency.

4. The method of claim 1, wherein the second communicating, the fourth communicating, and the fifth communicating have a same latency.

5. The method of claim 1, wherein the operating seeks, with respect to the logical array of functional processing elements, to replace the defective physical processing elements with respective ones of the physical processing elements that are not defective.

6. The method of claim 1, wherein the physical processing elements are arranged in a rectangle circumscribed within a wafer comprising the physical processing elements.

7. The method of claim 1, wherein each column is characterized by a same number of defective physical processing elements plus non-defective physical processing elements treated as defective.

8. The method of claim 1, wherein with respect to a particular one of the rows, consecutive ones of the columns have one of the non-defective physical processing elements not treated as defective in either the particular row or one of the rows contiguous with the particular row.

9. The method of claim 1, wherein no two adjacent ones of the rows have one of the defective physical processing elements or one of the physical processing elements treated as defective in a same one of the columns.

10. The method of claim 1, wherein the determining the topology comprises partitioning a reticle of the fabricated wafer into sub-sections for independent analysis.

11. A system comprising: first means for communicating in a first of at least two dimensions between physically adjacent physical processing elements of a regular grid of physical processing elements arranged in the at least two dimensions; second means for communicating in a second of the at least two dimensions between physically adjacent physical processing elements of the regular grid; third means for communicating in the first dimension between physically proximate elements of the regular grid, wherein the physically proximate elements are at least one physical processing element distant from each other with respect to the regular grid; fourth means for communicating in at least the second dimension between physical processing elements of the regular grid; fifth means for communicating in at least the second dimension between physical processing elements of the regular grid; means for operating the regular grid as a logical array of functional processing elements in view of defective ones of the physical processing elements by controlling selected aspects of the third means for communicating to provide communication in the first dimension instead of selected aspects of the first means for communicating, and further by selectively controlling aspects of the fourth means for communicating and aspects of the fifth means for communicating to provide communication in at least the second dimension instead of selected aspects of the second means for communicating; wherein the regular grid, the first through the fifth means for communicating, and the means for operating are compatible with wafer-scale integration; wherein the means for operating is operable based in part on configuration information determined at least in part by testing a wafer comprising the physical processing elements to determine which of the physical processing elements are the defective physical processing elements; further comprising operable in accordance with results of the testing, means for determining a topology of a usable logical array of processing elements realizable from the tested wafer, operable in accordance with the topology, means for configuring an accelerator comprising the tested wafer, means for booting the accelerator in accordance with the means for configuring, and means for executing one or more applications on the accelerator; wherein the accelerator is a deep learning accelerator and the one or more applications are one or more deep learning applications; wherein the usable logical array is a uniform logical array of M rows by N columns; and wherein the means for determining the topology comprises means for treating at least some non-defective ones of the physical processing elements in at least one of a same one of the rows and a same one of the columns as one of the defective physical processing elements as if the at least some of the non-defective physical processing elements were defective.

12. The system of claim 11, wherein each column is characterized by a same number of defective physical processing elements plus non-defective physical processing elements treated as defective.
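Claims 1 and 7 describe arriving at a uniform M-by-N logical array by treating some good elements as if they were defective, so that every column loses the same number of elements. The following is a minimal sketch of that balancing step only, under stated assumptions: the defect map is a plain boolean grid, the function name `uniform_topology` is hypothetical, and the routing limits implied by the skip couplings (for example the adjacency conditions in claims 8 and 9) are deliberately ignored. It should not be read as the procedure the patent actually uses.

```python
from typing import List, Tuple

Grid = List[List[bool]]  # True marks an unusable physical processing element


def uniform_topology(defective: Grid) -> Tuple[int, int, Grid]:
    """Pad every column to the same count of unusable elements.

    Returns (M, N, unusable), where M x N is the usable logical array and
    `unusable` marks real defects plus good elements treated as defective.
    Illustrative assumption: padding prefers rows that already contain a
    defect, so each padded element shares a row or column with a real defect.
    """
    rows, cols = len(defective), len(defective[0])
    per_col = [sum(defective[r][c] for r in range(rows)) for c in range(cols)]
    k = max(per_col)  # worst column decides how many elements each column loses

    # Stable sort: rows containing at least one defect come first.
    row_order = sorted(range(rows), key=lambda r: not any(defective[r]))

    unusable = [row[:] for row in defective]
    for c in range(cols):
        need = k - per_col[c]
        for r in row_order:
            if need == 0:
                break
            if not unusable[r][c]:
                unusable[r][c] = True  # good element treated as defective
                need -= 1
    return rows - k, cols, unusable


if __name__ == "__main__":
    F, T = False, True
    grid = [[F, T, F],
            [F, F, F],
            [T, F, F],
            [F, T, F]]
    m, n, marked = uniform_topology(grid)
    print(m, n)
    for row in marked:
        print("".join("x" if cell else "." for cell in row))
```

The cost of uniformity is visible in the example: one column with two defects forces every other column to give up two elements as well, which is why the logical array is smaller than the count of good elements alone would suggest.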

Assignees

Cerebras Systems Inc

Inventors

Classifications

  • Probabilistic or stochastic networks

  • Recurrent networks, e.g. Hopfield networks

  • Combinations of networks

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11328208B2 cover?
Techniques in advanced deep learning provide improvements in one or more of cost, accuracy, performance, and energy efficiency. The deep learning accelerator is implemented at least in part via wafer-scale integration. The wafer comprises a plurality of processor elements, each augmented with redundancy-enabling couplings. The redundancy-enabling couplings enable using redundant ones of the pro…
Who is the assignee on this patent?
Cerebras Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date May 10, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).