What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Published Jul 28, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Data structure descriptors for deep learning acceleration

US10726329B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10726329-B2
Application number	US-201816089261-A
Country	US
Kind code	B2
Filing date	Apr 17, 2018
Priority date	Apr 17, 2017
Publication date	Jul 28, 2020
Grant date	Jul 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
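The three memory-vector forms named in the abstract can be illustrated by the address sequence each one implies. The following is a minimal sketch, not the patented encoding: the class names, field names, and the `addresses` helper are hypothetical, and the stride and dimension parameters that the abstract places in an extended descriptor are folded directly into each class for brevity.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Vec1D:
    base: int        # address of the first element (hypothetical field)
    length: int      # number of elements
    stride: int = 1  # address step between elements


@dataclass(frozen=True)
class Vec4D:
    base: int
    dims: Tuple[int, int, int, int]     # extent of each dimension
    strides: Tuple[int, int, int, int]  # address step per dimension


@dataclass(frozen=True)
class CircularBuffer:
    start: int   # first address of the buffer region
    size: int    # capacity of the region in elements
    head: int    # offset of the first element to read
    length: int  # number of elements to read


def addresses(desc) -> Iterator[int]:
    """Yield, in order, the memory addresses a descriptor walks."""
    if isinstance(desc, Vec1D):
        for i in range(desc.length):
            yield desc.base + i * desc.stride
    elif isinstance(desc, Vec4D):
        d0, d1, d2, d3 = desc.dims
        s0, s1, s2, s3 = desc.strides
        for i in range(d0):
            for j in range(d1):
                for k in range(d2):
                    for m in range(d3):
                        yield desc.base + i * s0 + j * s1 + k * s2 + m * s3
    elif isinstance(desc, CircularBuffer):
        for i in range(desc.length):
            # Wrap around to the start of the region when the end is reached.
            yield desc.start + (desc.head + i) % desc.size
    else:
        raise TypeError(f"unsupported descriptor: {desc!r}")
```

For instance, `list(addresses(CircularBuffer(start=100, size=4, head=3, length=3)))` walks address 103 and then wraps to 100 and 101, which is the behavior that distinguishes a circular buffer from a plain one-dimensional vector.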

First claim

Opening claim text (preview).

What is claimed is:

1. A compute element comprising: a memory; means for decoding an instruction, the instruction comprising an operand field; means for accessing an operand descriptor based at least in part on the operand field; means for decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; means for accessing an operand in accordance with the operand descriptor and the particular type; means for performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via the memory; and wherein execution of the instruction implements at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.

2. A method comprising: in a compute element, decoding an instruction, the instruction comprising an operand field; in the compute element, accessing an operand descriptor based at least in part on the operand field; in the compute element, decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; in the compute element, accessing an operand in accordance with the operand descriptor and the particular type; performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via a memory of the compute element; and wherein execution of the instruction implements at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.

3. A method comprising: in a compute element, decoding an instruction, the instruction comprising an operand field; in the compute element, accessing an operand descriptor based at least in part on the operand field; in the compute element, decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; in the compute element, accessing an operand in accordance with the operand descriptor and the particular type; performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via a memory of the compute element; and wherein the operand comprises at least a portion of one or more of: a weight of a neural network, an activation of a neural network, a partial sum of activations of a neural network, an error of a neural network, a gradient estimate of a neural network, and a weight update of a neural network.

4. A system comprising: a fabric of processing elements, each processing element comprising a fabric router coupled to a compute element, the fabric of processing elements enabled to perform dataflow-based processing and instruction-based processing, the fabric of processing elements implemented via wafer-scale integration; wherein each processing element is enabled to selectively communicate fabric packets with others of the processing elements at least in part via the fabric router of the respective processing element; wherein each compute element comprises a memory and is enabled to decode an instruction, the instruction comprising an operand field, access an operand descriptor based at least in part on the operand field, decode the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to, the plurality of types comprising a fabric type and a memory type, access an operand in accordance with the operand descriptor and the particular type, wherein the access of the operand is via the respective fabric router coupled to the compute element when the particular type is the fabric type, and wherein the access of the operand is via the memory when the particular type is the memory type; wherein the operand descriptor identifies an access pattern as one of a one-dimensional memory vector access pattern, a four-dimensional memory vector access pattern, and a circular memory buffer access pattern; wherein the operand descriptor is enabled to specify one of a plurality of extended operand descriptors; and wherein the extended operand descriptors are enabled to specify one or more of stride information and dimension information of a four-dimensional memory vector.

5. A system comprising: a fabric of processing elements, each processing element comprising a fabric router coupled to a compute element, the fabric of processing elements enabled to perform dataflow-based processing and instruction-based processing, the fabric of processing elements implemented via wafer-scale integration; wherein each processing element is enabled to selectively communicate fabric packets with others of the processing elements at least in part via the fabric router of the respective processing element; wherein each compute element comprises a memory and is enabled to decode an instruction, the instruction comprising an operand field, access an operand descriptor based at least in…
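The operand-access sequence recited in the claims (read the operand field, fetch the descriptor it selects, decode the descriptor's type, then read the operand from the fabric or from local memory) can be sketched as below. This is an illustrative model under stated assumptions, not the claimed hardware: `ComputeElement`, `Descriptor`, `read_operand`, and `multiply_accumulate` are hypothetical names, a `deque` stands in for data arriving through the fabric router, and the memory access pattern is reduced to a contiguous slice.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Deque, Dict, List


class OperandType(Enum):
    FABRIC = "fabric"
    MEMORY = "memory"


@dataclass
class Descriptor:
    kind: OperandType
    base: int = 0    # memory type only: address of the first element
    length: int = 0  # number of elements in the vector


@dataclass
class ComputeElement:
    memory: List[float]
    fabric_in: Deque[float]             # stand-in for data arriving via the router
    descriptors: Dict[int, Descriptor]  # descriptor registers, keyed by operand field

    def read_operand(self, operand_field: int, count: int) -> List[float]:
        """Fetch `count` elements for one iteration of an instruction."""
        desc = self.descriptors[operand_field]  # access the descriptor
        if desc.kind is OperandType.FABRIC:     # decode its type
            return [self.fabric_in.popleft() for _ in range(count)]
        if desc.kind is OperandType.MEMORY:
            return self.memory[desc.base:desc.base + count]
        raise ValueError(f"unknown operand type: {desc.kind}")

    def multiply_accumulate(self, src_a: int, src_b: int) -> float:
        """Hypothetical instruction: dot product of two operands (a partial sum)."""
        n = self.descriptors[src_a].length
        a = self.read_operand(src_a, n)
        b = self.read_operand(src_b, n)
        return sum(x * y for x, y in zip(a, b))


ce = ComputeElement(
    memory=[0.5, 2.0, -1.0, 4.0],
    fabric_in=deque([1.0, 2.0, 3.0]),
    descriptors={
        0: Descriptor(OperandType.FABRIC, length=3),
        1: Descriptor(OperandType.MEMORY, base=1, length=3),
    },
)
partial_sum = ce.multiply_accumulate(0, 1)  # 1*2 + 2*(-1) + 3*4 = 12.0
```

The point of the indirection is that the instruction itself names only descriptor registers; whether an operand streams in from neighboring processing elements or is read from local memory is decided by the descriptor, so the same instruction can serve both cases.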

Assignees

Cerebras Systems Inc

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks
G06N3/047
Probabilistic or stochastic networks
G06N3/048
Activation functions
G06N3/045
Combinations of networks
G06N3/08
Learning methods

Patent family

Related publications grouped by family.

Patent family ID: 63855635

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10726329B2 cover?: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data st…
Who is the assignee on this patent?: Cerebras Systems Inc