What technology area does this patent fall under?

Primary CPC classification G06F9/3001. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Apr 14 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Matrix multiplication acceleration of sparse matrices using column folding and squeezing

US10620951B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10620951-B2
Application number	US-201816016278-A
Country	US
Kind code	B2
Filing date	Jun 22, 2018
Priority date	Jun 22, 2018
Publication date	Apr 14, 2020
Grant date	Apr 14, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed embodiments relate to sparse matrix multiplication (SMM) acceleration using column folding and squeezing. In one example, a processor, in response to a SMM instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, uses execution circuitry to pack the second matrix by replacing one or more zero-valued elements with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and, the execution circuitry further to, for each non-zero element at row M and column K of the specified first matrix, generate a product of the element and each corresponding non-zero element at row K, column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the specified output matrix.

First claim

Opening claim text (preview).

What is claimed is: 1. A processor comprising: fetch and decode circuitry to fetch and decode a sparse matrix multiplication (SMM) instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, the fetch circuitry further to fetch and store elements of the first and second matrices into a register file; and execution circuitry, responsive to the SMM instruction, to pack the second matrix stored in the register file by replacing one or more zero-valued elements in a column N of the second matrix with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and the execution circuitry further to generate a product of a non-zero element at column K of the first matrix and a corresponding non-zero element at row K of the packed second matrix for each non-zero element in row M of the first matrix and each non-zero element in column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the output matrix. 2. The processor of claim 1 , wherein the execution circuitry, for each row K of the second matrix, is to determine whether the row contains any zero-valued elements, and, if so, determine whether the row contains any non-zero elements yet to be processed from the zero-valued element, and, if so, for each zero-valued element having a non-zero element yet to be processed, fold the non-zero element into the zero-valued element. 3. The processor of claim 1 , wherein the execution circuitry, for each column N of the second matrix, is to determine whether the column contains any zero-valued elements, and, if so, determine whether any of P elements of a subsequent column is a non-zero value, and, for each zero-valued element having a non-zero element in the subsequent column, squeeze the non-zero element into the zero-valued element. 4. The processor of claim 1 , wherein each of the elements stored in the register file includes a field to specify whether it has a zero value, and wherein the execution circuitry is to use the field when determining whether the element has a zero value. 5. The processor of claim 1 , wherein the execution circuitry is to avoid generating any products of elements having zero values. 6. The processor of claim 1 , wherein the execution circuitry is to comprise a processing array of (X×Y) processing units, wherein X is less than M and Y is less than N, the execution circuitry to use the processing array iteratively over a plurality of clock cycles to perform the same processing as an actual, physical array of (M×N) processing units. 7. The processor of claim 1 , wherein the execution circuitry is to comprise a processing array of (X×Y) processing units, wherein X is less than M and Y is less than N, the execution circuitry to cascade a plurality of instances of the processing array to perform the same processing as an actual, physical array of (M×N) processing units. 8. A method executed by a processor, the method comprising: fetching and decoding, using fetch and decode circuitry, a sparse matrix multiplication (SMM) instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, the fetch circuitry further fetching and storing elements of the first and second matrices into a register file; and responding, using execution circuitry, to the SMM instruction by packing the second matrix stored in the register file by replacing one or more zero-valued elements in a column N of the second matrix with non-zero elements yet to be processed, each replaced element to include a field to identify its logical position within the second matrix, the execution circuitry further to generate a product of a non-zero element at column K of the first matrix and a corresponding non-zero element at row K of the packed second matrix for each non-zero element in row M of the first matrix and each non-zero element in column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the output matrix. 9. The method of claim 8 , wherein the execution circuitry, for each row K of the second matrix, is to determine whether the row contains any zero-valued elements, and, if so, determine whether the row contains any non-zero elements yet to be processed from the zero-valued element, and, if so, for each zero-valued element having a non-zero element yet to be processed, fold the non-zero element into the zero-valued element. 10. The method of claim 8 , wherein the execution circuitry, for each column N of the second matrix, is to determine whether the column contains any zero-valued elements, and, if so, determine whether any of P elements of a subsequent column is a non-zero value, and, for each zero-valued element having a non-zero element in the subsequent column, squeeze the non-zero element into the zero-valued element. 11. The method of claim 8 , wherein each of the elements stored in the register file includes a field to specify whether it has a zero value, and wherein the execution circuitry is to use the field when determining whether the element has a zero value. 12. The method of claim 8 , wherein the execution circuitry is to avoid generating any products of elements having zero values. 13. The method of claim 8 , wherein the execution circuitry comprises a processing array of (X×Y) processing units, X being less than M and Y being less than N, the execution circuitry using the processing array iteratively over a plurality of clock cycles to perform the same processing as an actual, physical array of (M×N) processing units. 14. The method of claim 8 , wherein the execution circuitry comprises a processing array of (X×Y) processing units, X being less than M and Y being less than N, the execution circuitry cascading a plurality of instances of the processing array to perform the same processing as an actual, physical array of (M×N) processing units. 15. A system comprising a memory and a processor, the processor comprising: fetch and decode circuitry to fetch and decode a sparse matrix multiplication (SMM) instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, the fetch circuitry further to fetch and store elements of the first and second matrices into a register file; and execution circuitry, responsive to the SMM instruction, to pack the second matrix stored in the register file by replacing one or more zero-valued elements in a column N of the second matrix with non-zero elements yet to be processed, each replaced element to include a field to identify its logical position within the second matrix, the execution circuitry further to generate a product of a non-zero element at column K of the first matrix and a corresponding non-zero element at row K of the packed second matrix for each non-zero element in row M of the first matrix and each non-zero element in column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the output matrix. 16. The system of claim 15 , wherein the execution circuitry, for each row K of the second matrix, is to determine whether the row contains any zero-valued elements, and, if so, determine whether the row contains any non-zero elements yet to be processed from the zero-valued element, and, if so, for each zero-valued element having a n

Assignees

Intel Corp

Inventors

Classifications

G06N3/08
Learning methods · CPC title
G06F9/3001Primary
Arithmetic instructions · CPC title
G06F9/3016
Decoding the operand specifier, e.g. specifier format · CPC title
G06F17/16
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
G06F9/3802
Instruction prefetching · CPC title

Patent family

Related publications grouped by family.

View patent family 65229528

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10620951B2 cover?: Disclosed embodiments relate to sparse matrix multiplication (SMM) acceleration using column folding and squeezing. In one example, a processor, in response to a SMM instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, uses execution circuitry to pack the second matrix by replacing one or more zero-valued elements with no…
Who is the assignee on this patent?: Intel Corp
What technology area does this patent fall under?: Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Apr 14 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Sparse matrix storage in a database

Accelerator for sparse-dense matrix multiplication

Sparse data set processing

Sparse matrix data structure

Frequently asked questions