Processing apparatus and processing method
US-2020050918-A1 · Feb 13, 2020 · US
US11275713B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11275713-B2 |
| Application number | US-201816004358-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 9, 2018 |
| Priority date | Jun 9, 2018 |
| Publication date | Mar 15, 2022 |
| Grant date | Mar 15, 2022 |
Official abstract text for this publication.
The invention is notably directed to a computing system configured to perform linear algebraic operations. The computing system comprises a co-processing module comprising a co-processing unit. The co-processing unit comprises a parallel array of bit-serial processing units. The bit-serial processing units are adapted to perform the linear algebraic operations with variable precision. The invention further concerns a related computer implemented method and a related computer program product.
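As a rough illustration of what "bit-serial" and "variable precision" mean for a linear algebraic operation, the sketch below computes a dot product one bit-plane at a time. It is a minimal software model under stated assumptions, not the patented hardware: the function names `bit_planes` and `bit_serial_dot` are hypothetical, operands are assumed to be unsigned integers, and `precision` is assumed to be the number of bits per operand.

```python
from typing import List


def bit_planes(values: List[int], precision: int) -> List[List[int]]:
    """Split unsigned integers into `precision` bit-planes, LSB plane first."""
    return [[(v >> i) & 1 for v in values] for i in range(precision)]


def bit_serial_dot(x: List[int], y: List[int], precision: int) -> int:
    """Dot product of two unsigned vectors, processed one bit-plane pair at a time.

    Hypothetical helper for illustration only; assumes every element
    fits in `precision` bits.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 or v >> precision for v in x + y):
        raise ValueError("operands must be unsigned and fit in `precision` bits")

    x_planes = bit_planes(x, precision)
    y_planes = bit_planes(y, precision)

    total = 0
    for i in range(precision):
        for j in range(precision):
            # One-bit products are plain ANDs; their sum is a population count.
            ones = sum(a & b for a, b in zip(x_planes[i], y_planes[j]))
            # Weight the partial sum by the significance of the two bit positions.
            total += ones << (i + j)
    return total
```

In this model the amount of work grows with `precision` squared, so lower-precision operands need fewer bit-plane passes, which is the general motivation for variable-precision bit-serial arithmetic.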
Opening claim text (preview).
What is claimed is:

1. A computing system configured to perform linear algebraic operations, the computing system comprising:

a co-processing module comprising a co-processing unit, the co-processing unit comprising a parallel array of bit-serial processing units, the bit-serial processing units being adapted to perform the linear algebraic operations with variable precision, wherein the co-processing module comprises a local memory, the local memory comprising a bit-level memory layout; and

a host unit comprising a main memory; a central processing unit; and an offload engine adapted to configure the co-processing module for a subsequent data transfer between the main memory and the local memory;

wherein: each of the bit-serial processing units comprises: a bit-serial multiplier; and a bit-serial adder; and

each of the bit-serial multipliers comprises:

first and second input lines, a control line, and a carry over line;

a plurality of stages, each of said stages in turn comprising: an AND gate having first and second inputs coupled to the first and second input lines and an output; a full adder including a first input coupled to said output of said AND gate, a second input, and an output; and a flip-flop having an input coupled to said output of said full adder and having a stage output port;

wherein, for a first one of said plurality of stages, said second input of said full adder is coupled to said carry over line and for each of said plurality of stages other than said first one of said plurality of stages, said second input is coupled to said stage output port for a previous one of said stages; and

a bypass logic configured to support input data of variable precision, said bypass logic comprising: a first multiplexer configured to read out multiplication results of one-bit precision from said stage output port of said first one of said plurality of stages, results of maximum precision from said stage output port of a last one of said plurality of stages, and results of corresponding intermediate precision from one or more of said stage output ports of one or more intermediate ones of said plurality of stages; and a second multiplexer configured to read out intermediate results of a control bit from said control line.

2. The computing system according to claim 1, wherein the bit-serial multipliers are configurable by software to perform the linear algebraic operations in variable precisions from 1-bit to k-bit, wherein k is the maximum precision of the bit-serial multiplier, the software comprising an application program carrying out linear algebraic computations.

3. The computing system according to claim 1, wherein the bypass logic is configured to use power gating or clock gating to deactivate unused stages of the bit-serial multiplier.

4. The computing system according to claim 1, wherein the parallel array of bit-serial processing units is a 2-dimensional systolic array having a plurality of rows and columns and configured to receive staggered bits of rows of a first matrix and staggered bits of columns of a second matrix.

5. The computing system according to claim 1, wherein the linear algebraic operations are those of a deep neural network application.
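The variable-precision behaviour recited for the bit-serial multiplier (claims 1 to 3) can be sketched behaviourally as a shift-and-add loop in which the selected precision decides how many stages take part. This is a simplified software model, not a gate-level rendering of the claimed AND gate, full adder, and flip-flop stages or of the bypass multiplexers. The function name `bit_serial_multiply`, the default `max_precision` of 8, and the cycle counter are assumptions introduced for illustration.

```python
from typing import Tuple


def bit_serial_multiply(a: int, b: int, precision: int,
                        max_precision: int = 8) -> Tuple[int, int]:
    """Multiply unsigned `a` (fed serially, LSB first) by unsigned `b`.

    Returns (product, cycles). `max_precision` plays the role of k, the
    widest precision the modelled multiplier supports (assumed value).
    """
    if not 1 <= precision <= max_precision:
        raise ValueError("precision must be between 1 and max_precision")
    if a < 0 or b < 0 or a >> precision or b >> precision:
        raise ValueError("operands must be unsigned and fit in `precision` bits")

    acc = 0
    cycles = 0
    for i in range(precision):        # only `precision` stages are exercised
        serial_bit = (a >> i) & 1     # next bit of the serial operand
        partial = b if serial_bit else 0  # AND of the serial bit with each bit of b
        acc += partial << i           # accumulate the weighted partial product
        cycles += 1
    # Stages precision..max_precision-1 are never used here; in hardware they
    # could be bypassed or gated off, which this model does not simulate.
    return acc, cycles
```

A 3-bit multiplication in this model finishes in 3 cycles while an 8-bit one takes 8, which mirrors the idea of reading the result out of an earlier stage when the input precision is lower.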
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
in bit-serial fashion, i.e. having a single digit-handling circuit treating all denominations after each other · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
in serial-parallel fashion, i.e. one operand being entered serially and the other in parallel (G06F7/533 takes precedence) · CPC title
Systolic arrays · CPC title