Sparse convolutional neural network accelerator

US10860922B2 · US · B2

Patent metadata
Publication number: US-10860922-B2
Application number: US-201916686931-A
Country: US
Kind code: B2
Filing date: Nov 18, 2019
Priority date: Aug 11, 2016
Publication date: Dec 8, 2020
Grant date: Dec 8, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
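
The dataflow described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented hardware design. The function names `multiply_and_combine` and `accumulate`, the `(k, r, s)` layout for weight positions, the `(x, y)` layout for activation positions, and the use of coordinate addition to combine positions (as recited in claim 9) are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


def multiply_and_combine(weights, activations):
    """Form every weight-by-activation product and its output position.

    weights:     list of (value, (k, r, s)) for non-zero weights only
                 (k = output channel, r/s = filter offsets; assumed layout).
    activations: list of (value, (x, y)) for non-zero activations only.
    Returns the "third vector" (products) and "fourth vector" (positions).
    """
    products, positions = [], []
    for (w, (k, r, s)), (a, (x, y)) in product(weights, activations):
        products.append(w * a)
        # Assumption: positions are combined by adding coordinates (claim 9).
        positions.append((k, x + r, y + s))
    return products, positions


def accumulate(products, positions):
    """Route each product to the accumulator entry for its position and sum."""
    acc = defaultdict(float)
    for value, pos in zip(products, positions):
        acc[pos] += value
    return dict(acc)


# Hypothetical input: two non-zero weights and two non-zero activations.
weights = [(2.0, (0, 0, 0)), (3.0, (0, 1, 0))]
activations = [(1.0, (0, 0)), (4.0, (1, 0))]
products, positions = multiply_and_combine(weights, activations)
print(accumulate(products, positions))
# {(0, 0, 0): 2.0, (0, 1, 0): 11.0, (0, 2, 0): 12.0}
```

Because only non-zero operands are stored, no multiplication by zero is ever performed. The dictionary stands in for the accumulator array, where products that share a position are summed by the same adder.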

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: receiving a first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a first space; receiving a second vector comprising only non-zero input activation values and second associated positions within a second space; multiplying, within a multiplier array, the non-zero weight values with the non-zero activation values to produce a third vector of products; combining the first associated positions with the second associated positions to produce a fourth vector of positions, wherein each position in the fourth vector is associated with a respective product in the third vector; and transmitting the third vector to an accumulator array, wherein each one of the products in the third vector is transmitted to an adder in the accumulator array that is configured to generate an output activation value at the position associated with the product.
2. The method of claim 1, further comprising: receiving a fifth vector comprising only additional non-zero weight values and fifth associated positions of the additional non-zero weight values within the first space; multiplying, within the multiplier array, the additional non-zero weight values with the non-zero activation values to produce a seventh vector of products; producing an eighth vector of positions, wherein each position in the eighth vector is associated with a respective product in the seventh vector of products; and for each matching position in the fourth vector and the eighth vector, summing the respective products in the third vector and the seventh vector by the accumulator array to produce partial sums.
3. The method of claim 1, wherein the first associated positions are defined by coordinates in a first dimension and a second dimension and the second associated positions are defined by coordinates in the first dimension and the second dimension.
4. The method of claim 1, further comprising transmitting the third vector through an array of buffers in the accumulator array, wherein each one of the buffers is coupled to an input of one of the adders in the accumulator array.
5. The method of claim 1, further comprising compressing the output activation values to produce a set of vectors comprising non-zero output activation values including only the output activation values that are not equal to zero.
6. The method of claim 5, wherein the set of vectors further comprises positions associated with the non-zero output activation values.
7. The method of claim 1, wherein the second vector was generated during processing of a first layer of a neural network and the seventh vector of products is generated during processing of a second layer of the neural network.
8. The method of claim 1, further comprising transmitting a first product in the third vector from a first accumulator entry in the accumulator array to a first adder in the accumulator array, wherein the first product is associated with a first position along an edge of the second space.
9. The method of claim 1, wherein the combining comprises performing a vector addition to sum coordinates of the first associated positions with coordinates of the second associated positions to produce the fourth vector of positions, wherein each position in the fourth vector is associated with a respective product in the third vector.
10. The method of claim 1, wherein the second space is partitioned into two-dimensional tiles and the multiplier array generates products for one of the two-dimensional tiles in parallel with additional multiplier arrays that generate additional products for the remaining two-dimensional tiles.
11. The method of claim 10, wherein each one of the additional multiplier arrays receives an additional vector comprising only non-zero input activation values and additional associated positions within a different tile of the second space.
12. The method of claim 10, wherein the tile extends for a number of input channels into an additional dimension of the first space and the second space, and further comprising receiving additional vectors comprising only non-zero weight values and additional associated positions of the non-zero weight values for each one of the number of input channels.
13. A convolutional neural network accelerator, comprising: an array of processing elements, each processing element comprising a multiplier array that is configured to: receive a first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a first space; receive a second vector comprising only non-zero input activation values and second associated positions within a second space; multiply the non-zero weight values with the non-zero activation values to produce a third vector of products; combine the first associated positions with the second associated positions to produce a fourth vector of positions, wherein each position in the fourth vector is associated with a respective product in the third vector; and transmit the third vector to an accumulator array, wherein each one of the products in the third vector is transmitted to an adder in the accumulator array that is configured to generate an output activation value at the position associated with the product.
14. The convolutional neural network accelerator of claim 13, wherein the multiplier array that is further configured to: receive a fifth vector comprising only additional non-zero weight values and fifth associated positions of the additional non-zero weight values within the first space; multiply, within the multiplier array, the additional non-zero weight values with the non-zero activation values to produce a seventh vector of products; produce an eighth vector of positions, wherein each position in the eighth vector is associated with a respective product in the seventh vector of products; and for each matching position in the fourth vector and the eighth vector, sum the respective products in the third vector and the seventh vector by the accumulator array to produce partial sums.
15. The convolutional neural network accelerator of claim 13, wherein the first associated positions are defined by coordinates in a first dimension and a second dimension and the second associated positions are defined by coordinates in the first dimension and the second dimension.
16. The convolutional neural network accelerator of claim 13, wherein the first vector is broadcast to each processing element in the array of processing elements.
17. The convolutional neural network accelerator of claim 13, wherein the second space is partitioned into two-dimensional tiles and the multiplier array generates products for one of the two-dimensional tiles in parallel with additional multiplier arrays that generate additional products for the remaining two-dimensional tiles.
18. The convolutional neural network accelerator of claim 16, wherein each one of the additional multiplier arrays receives an additional vector comprising only non-zero input activation values and additional associated positions within a different tile of the second space.
19. The convolutional neural network accelerator of claim 16, wherein the tile extends for a number of input channels into an additional dimension of the first space and the second space and further comprising receiving additional vectors comprising only non-zero weight values and additional associated positions of the non-zero weight values for each one of the number of i
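
Claims 5, 6, 10, and 11 describe compressing output activations into non-zero values with positions, and splitting the activation space into two-dimensional tiles that are processed in parallel. A rough Python sketch of both ideas follows. The function names `compress_activations` and `partition_into_tiles`, the row-major dense input, the tile dimensions, and the choice to keep global rather than tile-local coordinates are hypothetical and chosen only for illustration.

```python
def compress_activations(dense):
    """Drop zeros from a 2D grid of activations (row-major list of rows).

    Returns (values, positions): the non-zero values and their (x, y)
    coordinates, in the spirit of claims 5 and 6.
    """
    values, positions = [], []
    for y, row in enumerate(dense):
        for x, v in enumerate(row):
            if v != 0:
                values.append(v)
                positions.append((x, y))
    return values, positions


def partition_into_tiles(values, positions, tile_w, tile_h):
    """Group compressed activations by the 2D tile that contains them.

    Returns {tile_index: (values, positions)}. Each entry could feed a
    separate multiplier array (claims 10 and 11). Global coordinates are
    kept here; a real design might store tile-local offsets instead.
    """
    tiles = {}
    for v, (x, y) in zip(values, positions):
        key = (x // tile_w, y // tile_h)
        tile_values, tile_positions = tiles.setdefault(key, ([], []))
        tile_values.append(v)
        tile_positions.append((x, y))
    return tiles


# Hypothetical 2x3 output plane with three non-zero activations.
dense = [
    [0, 5, 0],
    [7, 0, 9],
]
values, positions = compress_activations(dense)
print(values, positions)
# [5, 7, 9] [(1, 0), (0, 1), (2, 1)]
print(partition_into_tiles(values, positions, tile_w=2, tile_h=2))
# {(0, 0): ([5, 7], [(1, 0), (0, 1)]), (1, 0): ([9], [(2, 1)])}
```

The compressed output of one layer has the same shape as the activation input expected by the next, which is how the sparse representation can be carried from layer to layer.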

Assignees

Nvidia Corp

Inventors

Classifications

  • Combinations of networks · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10860922B2 cover?
A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activat…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 8, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).