Computational array microprocessor system with variable latency memory access
US-2019026237-A1 · Jan 24, 2019 · US
US11720523B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11720523-B2 |
| Application number | US-201916653578-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 15, 2019 |
| Priority date | Jan 31, 2018 |
| Publication date | Aug 8, 2023 |
| Grant date | Aug 8, 2023 |
Official abstract text for this publication.
A processing element (PE) of a systolic array can perform neural network computations on two or more data elements of an input data set using the same weight. Thus, two or more output data elements corresponding to an output data set may be generated. Based on the size of the input data set and an input data type, the systolic array can process a single data element or multiple data elements in parallel.
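The weight sharing described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `ProcessingElement`, the method `step`, and the choice of multiplication followed by addition as the two operations are assumptions made for illustration. The source only says the PE performs computations on two or more data elements using the same weight.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ProcessingElement:
    """Hypothetical model of one PE holding a single weight value."""

    weight: float

    def step(
        self, x_in: Tuple[float, float], y_in: Tuple[float, float]
    ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
        # The two X-in elements are forwarded unchanged as X-out elements.
        x_out = (x_in[0], x_in[1])
        # Assumption: multiply each X-in element by the shared weight,
        # then add the matching Y-in element (a multiply-accumulate).
        y_out = (
            x_in[0] * self.weight + y_in[0],
            x_in[1] * self.weight + y_in[1],
        )
        return x_out, y_out


pe = ProcessingElement(weight=3)
x_out, y_out = pe.step(x_in=(2, 5), y_in=(10, 20))
print(x_out, y_out)  # (2, 5) (16, 35)
```

Both output elements are produced in one call with one stored weight, which is the point of the arrangement: the weight is loaded once and reused across the concurrent inputs.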
Opening claim text (preview).
What is claimed is:
1. An integrated circuit device comprising: a state buffer configured to provide row data and a weight value; a processing element (PE) including: a row input interface configured to concurrently receive a first X-in element and a second X-in element of a row input based on the row data, wherein the row input interface includes a first row input port for receiving the first X-in element and a second row input port for receiving the second X-in element; a column input interface configured to receive a column input including a first Y-in element and a second Y-in element; a row output interface including a first row output port for outputting the first X-in element as a first X-out element and a second row output port for outputting the second X-in element as a second X-out element; and a column output interface configured to provide a column output including a first Y-out element and a second Y-out element, the column output computed from the row input, the column input, and the weight value; an output buffer configured to store a computational result derived from the column output; and an activation engine configured to apply a function to the computational result and store an output of the function in the state buffer.
2. The integrated circuit device of claim 1, wherein the column input is provided from another PE.
3. The integrated circuit device of claim 1, further comprising another PE configured to concurrently receive the first X-out element and the second X-out element of the row output from the row output interface.
4. The integrated circuit device of claim 1, further comprising another PE configured to receive the first Y-out element and the second Y-out element of the column output from the column output interface.
5. The integrated circuit device of claim 1, wherein the function that the activation engine is configured to apply to the computational result is one of a bypass function or a ReLU function.
6. A processing element (PE) comprising: a first interface configured to concurrently receive a first X-in element and a second X-in element, wherein the first interface includes a first row input port for receiving the first X-in element and a second row input port for receiving the second X-in element; a second interface configured to receive a first Y-in element and a second Y-in element; a third interface configured to output the first X-in element as a first X-out element and the second X-in element as a second X-out element; a fourth interface configured to output a first Y-out element and a second Y-out element; wherein the PE is configured to: perform a first computational operation on the first X-in element and a weight value to generate a first intermediate result, and on the second X-in element and the weight value to generate a second intermediate result; and perform a second computational operation on the first intermediate result and the first Y-in element to generate the first Y-out element, and on the second intermediate result and the second Y-in element to generate the second Y-out element.
7. The processing element of claim 6, wherein the first Y-out element and the second Y-out element are provided as Y-in elements to another PE.
8. The processing element of claim 6, wherein the first interface is configured to receive the first X-in element and the second X-in element from another PE.
9. The processing element of claim 6, wherein the second interface is configured to receive the first Y-in element and the second Y-in element from another PE.
10. The processing element of claim 6, wherein the first interface is coupled to a data path that receives the first Y-out element and the second Y-out element.
11. A method comprising: concurrently receiving, by a processing element (PE), a first X-in element and a second X-in element along a row datapath, wherein the first X-in element is received at a first row input port of the PE and the second X-in element is received at a second row input port of the PE; receiving, by the PE, a first Y-in element and a second Y-in element along a column datapath; outputting, by the PE, the first X-in element as a first X-out element and the second X-in element as a second X-out element, wherein the first X-out element is outputted at a first row output port of the PE and the second X-out element is outputted at a second row output port of the PE; performing, by the PE, a first computational operation on the first X-in element with a weight value to generate a first result, and on the second X-in element with the weight value to generate a second result; performing, by the PE, a second computational operation on the first result with the first Y-in element to generate a first Y-out element, and on the second result with the second Y-in element to generate a second Y-out element; and outputting the first Y-out element and the second Y-out element along the column datapath.
12. The method of claim 11, wherein the first Y-out element and the second Y-out element are outputted as Y-in elements to another PE.
13. The method of claim 11, wherein the first X-in element and the second X-in element are received from another PE.
14. The method of claim 11, wherein the first Y-in element and the second Y-in element are received from another PE.
15. The method of claim 11, wherein the first X-in element and the second X-in element are outputted to another PE.
16. The method of claim 11, further comprising: applying a function to the first Y-out element and the second Y-out element; and providing a result of the function to the row datapath.
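To make the column datapath of claims 11, 12, and 16 concrete, the following Python sketch chains several PEs so that each PE's Y-out pair becomes the next PE's Y-in pair, then applies a function to the final pair. Several things here are assumptions rather than claim language: the helper names `pe_step` and `run_column`, the reading of the first and second computational operations as multiply and add, the zero starting value for the column, and the omission of cycle-level timing. The ReLU and bypass options follow claim 5.

```python
from typing import List, Tuple

Pair = Tuple[float, float]


def pe_step(weight: float, x_in: Pair, y_in: Pair) -> Tuple[Pair, Pair]:
    """One PE step (hypothetical helper): forward X, accumulate into Y."""
    x_out = x_in  # X-in elements leave unchanged as X-out elements
    # Assumed operations: multiply by the shared weight, then add Y-in.
    y_out = (x_in[0] * weight + y_in[0], x_in[1] * weight + y_in[1])
    return x_out, y_out


def run_column(
    weights: List[float], row_inputs: List[Pair], activation: str = "relu"
) -> Pair:
    """Pass a Y pair down a column of PEs, one PE per row."""
    y: Pair = (0, 0)  # assumed initial column input
    for weight, x_in in zip(weights, row_inputs):
        _, y = pe_step(weight, x_in, y)  # Y-out feeds the next PE's Y-in
    if activation == "relu":
        return (max(0, y[0]), max(0, y[1]))
    if activation == "bypass":
        return y
    raise ValueError(f"unknown activation: {activation}")


weights = [1, -2, 3]
row_inputs = [(1, 4), (2, 1), (3, -3)]
print(run_column(weights, row_inputs, "bypass"))  # (6, -7)
print(run_column(weights, row_inputs, "relu"))    # (6, 0)
```

Each element of the result is a dot product of the column's weights with one of the two input streams, so the column produces two output elements for a single pass over the weights.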
Convolutional networks [CNN, ConvNet] · CPC title
Systolic arrays · CPC title
using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake · CPC title
Correlation function computation {including computation of convolution operations (arithmetic circuits for sum of products per se, e.g. multiply-accumulators G06F7/5443; digital filters, e.g. FIR, IIR, adaptive filters H03H17/00)} · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title