High-speed multi-block-row layered decoder for low density parity check (LDPC) codes
US-2015301887-A1 · Oct 22, 2015 · US
US9886377B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9886377-B2 |
| Application number | US-201514874784-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 5, 2015 |
| Priority date | Oct 5, 2015 |
| Publication date | Feb 6, 2018 |
| Grant date | Feb 6, 2018 |
Official abstract text for this publication.
Described herein are one or more integrated circuits (ICs) comprising controller circuitry to receive a command to execute an operation for data inputs stored in an external memory or a local memory, and convert the operation into a set of matrix operations to operate on sub-portions of the data inputs. The IC(s) further comprise at least one processing circuitry to execute the set of matrix operations, the processing circuitry to include ALUs, a local memory external to the ALUs and accessible by the ALUs, and processing control circuitry to create at least one matrix operand in the local memory (from the data inputs of the operation) comprising at least one of a scalar, a vector, or a 2D matrix, and provide memory handles corresponding to each of the matrix operands to one of the ALUs to access the respective matrix operands when executing a matrix operation.
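The flow in the abstract (a controller splits one operation into matrix operations on sub-portions of the inputs, operands are created in a local memory, and ALUs reach them through memory handles) can be illustrated with a minimal software sketch. Everything here is an assumption made for illustration: the names `LocalMemory`, `alu_matmul`, and `tiled_matmul`, the use of integer handles, and the choice of matrix multiplication as the operation are hypothetical and are not taken from the patent, which describes hardware circuitry rather than software.

```python
# Hypothetical sketch only: class and function names are illustrative,
# not part of the patent disclosure.

class LocalMemory:
    """Toy local memory: stores matrix operands and hands back integer handles."""

    def __init__(self):
        self._slots = {}
        self._next_handle = 0

    def create_operand(self, rows):
        handle = self._next_handle
        self._next_handle += 1
        self._slots[handle] = [list(r) for r in rows]  # copy into "local memory"
        return handle

    def read(self, handle):
        return self._slots[handle]

    def free(self, handle):
        # Release the slot once the matrix operation that used it is done.
        del self._slots[handle]


def alu_matmul(mem, handle_a, handle_b):
    """One 'ALU' multiplies two operands that it reaches only through handles."""
    a, b = mem.read(handle_a), mem.read(handle_b)
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def sub_block(m, r0, r1, c0, c1):
    """Copy the sub-portion m[r0:r1, c0:c1] of a list-of-lists matrix."""
    return [row[c0:c1] for row in m[r0:r1]]


def tiled_matmul(a, b, tile):
    """'Controller' role: convert A @ B into tile-sized matrix operations."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    mem = LocalMemory()
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                ha = mem.create_operand(sub_block(a, i, i + tile, p, p + tile))
                hb = mem.create_operand(sub_block(b, p, p + tile, j, j + tile))
                partial = alu_matmul(mem, ha, hb)
                # Accumulate the partial product into the output tile.
                for di, row in enumerate(partial):
                    for dj, value in enumerate(row):
                        out[i + di][j + dj] += value
                mem.free(ha)
                mem.free(hb)
    return out


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b, tile=2))
```

The sketch runs the tile operations one after another for clarity. The abstract describes multiple ALUs, so a hardware design would presumably issue independent tile operations in parallel.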
Opening claim text (preview).
The invention claimed is:
1. One or more integrated circuits (ICs) comprising: controller circuitry to: receive a command to execute an operation for a plurality of data inputs stored in an external memory or a local memory; and convert the operation into a set of matrix operations, wherein the set of matrix operations are to each operate on respective sub-portions of the plurality of data inputs; and at least one processing circuitry to execute the set of matrix operations, the processing circuitry to include: a plurality of arithmetic logic units (ALUs); a local memory external to the ALUs and accessible by the ALUs; and processing control circuitry to: create a plurality of matrix operands in the local memory from the plurality of data inputs of the operation, wherein each of the plurality of matrix operands respectively comprises one of a scalar, a vector, or a two-dimensional (2D) matrix; and provide a plurality of memory handles to the plurality of ALUs, wherein each of the memory handles corresponds to a respective one of the matrix operands, and the plurality of ALUs are to access the respective matrix operands using the memory handles in association with executing the matrix operations.
2. The one or more ICs of claim 1, wherein the processing control circuitry of the processing circuitry is to further store the output of one of the ALUs in the local memory of the processing circuitry.
3. The one or more ICs of claim 2, wherein the processing control circuitry comprises a plurality of pipeline stages configured to execute operations to create matrix operands, provide memory handles, and store the output of the ALUs substantially in parallel.
4. The one or more ICs of claim 1, wherein the processing control circuitry is to further: create matrix operands by loading data from the data inputs stored in the external memory into memory rows of the local memory; and overwrite a memory row in response to completion of a matrix operation.
5. The one or more ICs of claim 1, wherein the processing control circuitry is to further: identify matrix operations corresponding to an operation that can be executed in parallel by the ALUs of the processing circuitry; and fetch non-contiguous data from the plurality of data inputs of the operation stored in the external memory to be stored contiguously in the local memory for the processing control circuitry to create matrix operands for parallel execution of matrix operations.
6. The one or more ICs of claim 5, wherein the processing control circuitry is to further: ensure the local memory of the processing circuitry includes only data accessed by the processing control circuitry or the ALUs during parallel execution of matrix operations.
7. The one or more ICs of claim 1, wherein the operation comprises a convolution operation, the plurality of inputs comprises image data, one or more filters, or index data, and the at least one matrix operand comprises a first matrix operand comprising data from the image data and a second matrix operand comprising data from the one or more filters or the index data.
8. The one or more ICs of claim 7, wherein the convolution operation comprises a strided convolution operation, and the processing control circuitry is to further: create a first matrix operand from the image data according to a stride value of the strided convolution operation.
9. The one or more ICs of claim 1, wherein the operation comprises at least one of a linear contrast operation, a local response normalization operation, or a max pooling operation.
10. The one or more ICs of claim 1, wherein the processing control circuitry is to further: provide an output of the ALUs to another processing circuitry.
11. The one or more ICs of claim 10, wherein the processing control circuitry is to further: identify an output of an ALU as a partial product of a matrix multiplication operation; and provide the partial product output to another ALU for adding to partial products generated by one or more other ALUs or store the partial product in the external memory for subsequent addition with other partial products.
12. The one or more ICs of claim 1, wherein the processing control circuitry is to further: write-out an output of the ALUs to a data output object stored in the external memory.
13. The one or more ICs of claim 12, wherein the operation comprises a backpropagation operation, the data inputs of the backpropagation operation include a set of generated output values and a set of expected output values, and the processing control circuitry is to further: write-out an output of the ALUs to a sequence of weight values stored in the external memory.
14. The one or more ICs of claim 13, wherein the processing control circuitry is to further: execute matrix operations comprising operands with sub-patterns of zeros by executing them as matrix operations with smaller operands that do not contain the sub-patterns of zeros.
15. The one or more ICs of claim 1, wherein the controller circuitry is to further: convert the operation into a set of matrix operations that operate on at least some non-contiguous or overlapping sub-portions of the plurality of data inputs.
16. The one or more ICs of claim 1, wherein the processing circuitry is to further: bypass the ALUs and execute some of the operations.
17. A system comprising: a host processor; a host memory; an input/output (I/O) interface; a memory separate from the host memory; and one or more integrated circuits (ICs) comprising: controller circuitry to: receive a command to execute an operation for a plurality of data inputs stored in an external memory or a local memory; and convert the operation into a set of matrix operations, wherein the set of matrix operations are to each operate on respective sub-portions of the plurality of data inputs; and at least one processing circuitry to execute the set of matrix operations, the processing circuitry to include: a plurality of arithmetic logic units (ALUs); a local memory external to the ALUs and accessible by the ALUs; and processing control circuitry to: create plurality of matrix operands in the local memory from the plurality of data inputs of the operation, wherein each of the plurality of matrix operands respectively comprises one of a scalar, a vector, or a two-dimensional (2D) matrix; and provide a plurality of memory handles to the plurality of ALUs, wherein each of the memory handles corresponds to a respective one of the matrix operands, and the plurality of ALUs are to access the respective matrix operands using the memory handles in association with executing the matrix operations.
18. The system of claim 17, wherein the host processor, the memory separate from the host memory, and the one or more ICs are included in a self-hosting device.
19. The system of claim 17, wherein the host processor is to further execute a neural network machine learning module.
20. The system of claim 17, wherein the one or more ICs are included in one of a plurality of peripheral apparatuses included in the system, and further comprise: one or more inter-chip interfaces for coupling to one or more other peripheral apparatuses included in the system; wherein the peripheral apparatuses included in the system are interconnected in a multi-dimensional array.
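Claims 7 and 8 describe a convolution in which a first matrix operand is built from the image data according to a stride value and a second operand comes from the filters. One common software analogue of that idea is the "im2col" lowering, sketched below. This is an assumption about how the claimed operands could look, not the patent's stated method: the function names `im2col` and `conv2d_as_matmul` are hypothetical, the example handles a single-channel image with one filter and no padding, and it computes cross-correlation (no kernel flip), as is conventional in neural-network code.

```python
# Hypothetical sketch of a strided convolution lowered to a matrix product.
# Names and layout are illustrative assumptions, not the patent's design.

def im2col(image, kh, kw, stride):
    """Build the first operand: one row per filter position, chosen by stride."""
    h, w = len(image), len(image[0])
    rows = []
    for r in range(0, h - kh + 1, stride):
        for c in range(0, w - kw + 1, stride):
            # Flatten the kh x kw patch whose top-left corner is (r, c).
            rows.append([image[r + dr][c + dc]
                         for dr in range(kh) for dc in range(kw)])
    return rows


def conv2d_as_matmul(image, kernel, stride):
    """Strided 2D cross-correlation computed as (patch matrix) x (flat kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    patches = im2col(image, kh, kw, stride)           # first matrix operand
    flat_kernel = [v for row in kernel for v in row]  # second operand (a vector)
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    out_w = (len(image[0]) - kw) // stride + 1
    # Reshape the flat result back into a 2D output map.
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_as_matmul(image, kernel, stride=2))  # [[7, 11], [23, 27]]
```

With a stride of 2 the patch matrix has four rows instead of the nine a stride of 1 would produce, which is the sense in which the first operand depends on the stride value.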
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Architectures of general purpose stored program computers (with program plugboard G06F15/08; multicomputers G06F15/16) · CPC title
Free address space management · CPC title
Local memory within processor subsystem · CPC title