Sparse convolutional neural network accelerator

US10891538B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10891538-B2
Application number	US-201715659371-A
Country	US
Kind code	B2
Filing date	Jul 25, 2017
Priority date	Aug 11, 2016
Publication date	Jan 12, 2021
Grant date	Jan 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
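The decode, sum, and linearize steps in the abstract can be illustrated with a small sketch. It is not the patented implementation. It assumes, for illustration only, that each index vector stores the number of zeros preceding each non-zero element (one possible reading of "values that are accumulated in sequence" in claim 5), that the arrays are two-dimensional and stored row-major, and that every first coordinate set is summed with every second coordinate set. The function names `decode_index_vector` and `output_linear_indices` and all parameters are hypothetical.

```python
from itertools import accumulate


def decode_index_vector(zero_runs, width):
    """Turn zero-run counts into (row, col) coordinates of non-zero elements.

    Assumption: zero_runs[k] is the number of zeros that precede the k-th
    non-zero element, and the array is stored row-major with `width` columns.
    """
    # Accumulating the runs and adding k (the non-zeros already passed)
    # gives the flat position of the k-th non-zero element.
    positions = [total + k for k, total in enumerate(accumulate(zero_runs))]
    return [divmod(p, width) for p in positions]


def output_linear_indices(coords_a, coords_b, out_width):
    """Sum each coordinate set of A with each of B, then linearize row-major."""
    return [
        (row_a + row_b) * out_width + (col_a + col_b)
        for row_a, col_a in coords_a
        for row_b, col_b in coords_b
    ]


# Hypothetical 3-column weight array with non-zeros at flat positions 0 and 4.
weight_coords = decode_index_vector([0, 3], width=3)      # [(0, 0), (1, 1)]
# Hypothetical 4-column activation array with non-zeros at positions 1 and 4.
activation_coords = decode_index_vector([1, 2], width=4)  # [(0, 1), (1, 0)]

# Output array assumed to be 6 columns wide (4 + 3 - 1).
indices = output_linear_indices(weight_coords, activation_coords, out_width=6)
print(indices)  # [1, 6, 8, 13]
```

Adding the coordinates of a weight and an activation gives the position in the output array that their product contributes to, which is why the coordinates are summed before being flattened into linear indices.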

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: receiving, by a parallel processing unit, a first instruction including a first index vector operand and a second index vector operand; decoding, by the parallel processing unit, the first index vector operand to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array; decoding, by the parallel processing unit, the second index vector operand to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array; summing, by the parallel processing unit, the first coordinate sets with the second coordinate sets to produce output coordinate sets; and converting, by the parallel processing unit, the output coordinate sets into a set of linear indices.
2. The method of claim 1, wherein the summing comprises summing each first coordinate in the first coordinate sets with a third coordinate in the second coordinate sets to produce the coordinates of the output coordinate sets.
3. The method of claim 1, wherein the first array is a three-dimensional array and the second array is a two-dimensional array.
4. The method of claim 1, wherein the first array stores weight values and the second array stores activation values.
5. The method of claim 1, wherein the first index vector encodes values that are accumulated in sequence to compute the first coordinates for each non-zero element in the first array.
6. The method of claim 1, further comprising: receiving, by the parallel processing unit, a second instruction including a first set of linear addresses operand and a second scalar values operand; and summing, by the parallel processing unit, scalar values in the scalar values operand to values in a third array, each value in the third array corresponding to a linear address in the first set of linear addresses operand.
7. The method of claim 6, wherein the first indices operand is the set of linear indices.
8. The method of claim 1, further comprising: receiving a second instruction including a first non-zero elements vector operand and a second non-zero elements vector operand; and multiplying each one of the non-zero elements in the first non-zero values vector operand by every one of the non-zero elements in the second non-zero elements vector operand to produce a vector of products.
9. The method of claim 8, wherein each of the non-zero elements in the first non-zero elements vector operand corresponds to an index in the first index vector operand and each of the non-zero elements in the second non-zero values vector operand corresponds to an index in the second index vector operand.
10. The method of claim 8, further comprising: receiving, by the parallel processing unit, a third instruction including a first set of linear addresses operand and a second scalar values operand, wherein the first set of linear addresses operand is the set of linear addresses and the second scalar values operand is the vector of products; and summing, by the parallel processing unit, scalar values in the scalar values operand with partial sums in a third array, each partial sum in the third array corresponding to a linear address in the first set of linear addresses operand.
11. The method of claim 1, further comprising, before receiving the first instruction: receiving, by the parallel processing unit, a second instruction including a first scalar values operand; generating a vector of non-zero elements including only values in the first scalar values operand that are not equal to zero; and generating a vector of indices comprising positions within the second array, wherein each index in the vector of indices is associated with a non-zero element in the vector of non-zero elements.
12. The method of claim 11, wherein the second indices operand is the vector of indices.
13. A processor, comprising: parallel processing units configured to: receive a first instruction including a first index vector operand and a second index vector operand; decode the first index vector operand to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array; decode the second index vector operand to produce second coordinate sets for a second array, each second coordinate set including at least the first coordinate and the second coordinate of a position of a non-zero element in the second array; sum the first coordinate sets with the second coordinate sets to produce output coordinate sets; and convert the output coordinate sets into a set of linear indices.
14. The processor of claim 13, wherein the parallel processing units are further configured to sum each first coordinate in the first coordinate sets with a third coordinate in the second coordinate sets to produce the coordinates of the output coordinate sets.
15. The processor of claim 13, wherein the first index vector encodes values that are accumulated in sequence to compute the first coordinates for each non-zero element in the first array.
16. The processor of claim 13, the parallel processing units are further configured to: receive a second instruction including a first set of linear addresses operand and a second scalar values operand; and sum scalar values in the scalar values operand to values in a third array, each value in the third array corresponding to a linear address in the first set of linear addresses operand.
17. The processor of claim 16, wherein the first indices operand is the set of linear indices.
18. The processor of claim 13, the parallel processing units are further configured to: receive a second instruction including a first non-zero elements vector operand and a second non-zero elements vector operand; and multiply each one of the non-zero elements in the first non-zero values vector operand by every one of the non-zero elements in the second non-zero elements vector operand to produce a vector of products.
19. The processor of claim 13, the parallel processing units are further configured to, before receiving the first instruction: receive a second instruction including a first scalar values operand; generate a vector of non-zero elements including only values in the first scalar values operand that are not equal to zero; and generate a vector of indices comprising positions within the second array, wherein each index in the vector of indices is associated with a non-zero element in the vector of non-zero elements.
20. A non-transitory, computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform steps comprising: receiving a first instruction including a first index vector operand and a second index vector operand; decoding the first index vector operand to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array; decoding the second index vector operand to produce second coordinate sets for a second array, each second coordinate set including at least the first coordinate and the second coordinate of a position of a non-zero element in the second array; summing the first coordinate sets wit…
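The compression, multiplication, and accumulation steps recited in claims 6, 8, 10, and 11 can be pictured with a one-dimensional sketch. It is an illustration under stated assumptions, not the claimed hardware: the arrays are short one-dimensional lists, indices are plain flat positions rather than an encoded index vector, and the output address of each product is assumed to be the sum of the two source positions. The function names `compress`, `all_products`, and `scatter_add` are hypothetical.

```python
def compress(values):
    """Keep only the non-zero values and record the position of each one."""
    nonzeros = [v for v in values if v != 0]
    indices = [i for i, v in enumerate(values) if v != 0]
    return nonzeros, indices


def all_products(a_nonzeros, b_nonzeros):
    """Multiply each non-zero element of A by every non-zero element of B."""
    return [a * b for a in a_nonzeros for b in b_nonzeros]


def scatter_add(accumulator, linear_addresses, products):
    """Add each product to the partial sum stored at its linear address."""
    for address, product in zip(linear_addresses, products):
        accumulator[address] += product
    return accumulator


# Hypothetical sparse weight and activation vectors.
w_vals, w_idx = compress([2, 0, 0, 3])  # ([2, 3], [0, 3])
a_vals, a_idx = compress([0, 5, 0, 7])  # ([5, 7], [1, 3])

products = all_products(w_vals, a_vals)            # [10, 14, 15, 21]
# Assumed 1-D addressing: output position = weight position + activation position.
addresses = [i + j for i in w_idx for j in a_idx]  # [1, 3, 4, 6]

partial_sums = scatter_add([0] * 7, addresses, products)
print(partial_sums)  # [0, 10, 0, 14, 15, 0, 21]
```

Because only non-zero values are multiplied, the four products above replace the sixteen multiplications a dense pass over the same two vectors would perform; the addresses are generated in the same nested order as the products so the two lists stay aligned.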

Assignees

Nvidia Corp

Classifications

G06N3/042 (Primary)
Knowledge-based neural networks; Logical representations of neural networks · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/082
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 61159167

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10891538B2 cover?: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a …
Who is the assignee on this patent?: Nvidia Corp
What technology area does this patent fall under?: Primary CPC classification G06N3/042. Mapped technology areas include Physics.
When was this patent published?: Publication date Jan 12, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Accelerator for deep neural networks

Neural network processor

Convolutional neural networks on hardware accelerators
