Flexible accelerator for sparse tensors in convolutional neural networks

US11462003B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11462003-B2
Application number  US-202016830167-A
Country             US
Kind code           B2
Filing date         Mar 25, 2020
Priority date       Mar 25, 2020
Publication date    Oct 4, 2022
Grant date          Oct 4, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system with a multiplication circuit having a plurality of multipliers is disclosed. Each of the plurality of multipliers is configured to receive a data value and a weight value to generate a product value in a convolution operation of a machine learning application. The system also includes an accumulator configured to receive the product value from each of the plurality of multipliers and a register bank configured to store an output of the convolution operation. The accumulator is further configured to receive a portion of values stored in the register bank and combine the received portion of values with the product values to generate combined values. The register bank is further configured to replace the portion of values with the combined values.
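The read, combine, and replace cycle described in the abstract can be illustrated with a minimal software sketch. This is not the patented circuit: the function name `mac_step`, the `start` offset, and the use of a Python list as the register bank are all assumptions made for illustration.

```python
def mac_step(data, weights, register_bank, start):
    """One multiply-accumulate step over a slice of the register bank.

    Hypothetical software model: `register_bank` is a plain list and
    `start` selects which portion of it the accumulator reads.
    """
    if len(data) != len(weights):
        raise ValueError("each multiplier needs one data value and one weight")
    if start < 0 or start + len(data) > len(register_bank):
        raise ValueError("portion falls outside the register bank")

    # Each multiplier pairs one data value with one weight value.
    products = [d * w for d, w in zip(data, weights)]

    # The accumulator receives a portion of the stored values ...
    portion = register_bank[start:start + len(products)]

    # ... and adds each stored value to its corresponding product.
    combined = [stored + p for stored, p in zip(portion, products)]

    # The register bank replaces that portion with the combined values.
    register_bank[start:start + len(combined)] = combined
    return register_bank


bank = [0, 0, 0, 0, 0, 0]
mac_step([1, 2, 3], [2, 2, 2], bank, 1)
mac_step([1, 1, 1], [3, 3, 3], bank, 1)
print(bank)
```

Calling the function twice on the same portion shows partial sums building up in place, which is the behavior the abstract attributes to the accumulator and register bank.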

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a multiplication circuit comprising a plurality of multipliers, each of the plurality of multipliers configured to receive a data value and a weight value to generate a product value in a convolution operation of a machine learning application, the data value is part of one of a plurality of sub-feature maps that are generated from an input feature map; an accumulator configured to receive the product value from each of the plurality of multipliers; and a register bank configured to store an output of the convolution operation, wherein the accumulator is further configured to receive a portion of values stored in the register bank and combine the received portion of values with the product values to generate combined values; and wherein the register bank is further configured to replace the portion of values with the combined values.

2. The system of claim 1, wherein the register bank comprises a plurality of row registers configured to shift in a row direction and a plurality of column registers configured to shift in a column direction.

3. The system of claim 1, wherein combining the received portion of values with the product values comprises adding each of the received portion of values with a corresponding one of the product values.

4. The system of claim 1, further comprising a reconfigurable tree adder configured to receive the product values and combine groups of the product values.

5. The system of claim 1, wherein the register bank is configured to shift a subset of values from a previous iteration by one position and send the shifted subset of values to the accumulator.

6. A system comprising: a multiplication circuit comprising a plurality of multipliers, each of the plurality of multipliers configured to receive a data value and a weight value to generate a product value in a convolution operation of a machine learning application; an accumulator configured to receive the product value from each of the plurality of multipliers; a register bank configured to store an output of the convolution operation, wherein the accumulator is further configured to receive a portion of values stored in the register bank and combine the received portion of values with the product values to generate combined values; and wherein the register bank is further configured to replace the portion of values with the combined values; and a first multi-stage interconnection network configured to receive the combined values from the accumulator.

7. The system of claim 6, wherein the first multi-stage interconnection network is configured to sort the combined values and write the sorted combined values into a vector accumulator register.

8. The system of claim 7, further comprising a second multi-stage interconnection network configured to read a subset of values from the vector accumulator register and send the subset of values to the accumulator.

9. The system of claim 7, wherein the vector accumulator register is further configured to receive the portion of values from the register bank before sending the portion of values to the accumulator.

10. The system of claim 6, further comprising a third multi-stage interconnection network configured to receive the product values from the plurality of multipliers, and send at least some of the product values to the accumulator based on an index value of each of the product values.

11. A method comprising: inputting, by a processor in a machine learning application, a data value and a weight value into each of a plurality of multipliers to generate a plurality of product values in each iteration of a plurality of iterations of a convolution operation; combining, by the processor in each iteration of the plurality of iterations, each of the plurality of product values with one of a plurality of accumulator values in an accumulator to generate a plurality of combined values, wherein the plurality of accumulator values of a current iteration are received from a register bank and are obtained by shifting a subset of values in the register bank after a previous iteration by one position; and replacing, by the processor in each iteration of the plurality of iterations, the plurality of accumulator values with the plurality of combined values in the register bank.

12. The method of claim 11, wherein values in the register bank after a last iteration of the plurality of iterations provide an output of the convolution operation on an input sub-feature map generated from an input feature map.

13. The method of claim 11, wherein each of the plurality of multipliers receive a same weight value.

14. The method of claim 11, wherein at least one of the plurality of multipliers receive the weight value that is different from the weight value received by a remaining one of the plurality of multipliers.

15. The method of claim 11, further comprising receiving the combined values from the accumulator in a first multi-stage interconnection network.

16. The method of claim 11, further comprising shifting, by the processor, values in the register bank after a last iteration of the plurality of iterations to obtain an output sub-feature map.

17. A non-transitory computer-readable media comprising computer-readable instructions stored thereon that when executed by a processor associated with a machine learning application cause the processor to: partition an input feature map into a plurality of sub-feature maps; input each of the plurality of sub-feature maps into a tensor compute unit of a plurality of tensor compute units to generate an output sub-feature map, wherein generating the output sub-feature map for a first sub-feature map of the plurality of sub-feature maps comprises: inputting a plurality of data values of the first sub-feature map into a plurality of multipliers of a first tensor compute unit of the plurality of tensor compute units; inputting a weight value into the plurality of multipliers for generating a plurality of product values; combining each of the plurality of product values with one of a previously computed product value to obtain a plurality of combined values; and shifting the plurality of combined values to obtain the output sub-feature map for the first sub-feature map; and combine the output sub-feature map from each of the plurality of tensor compute units to obtain an output feature map.

18. The non-transitory computer-readable media of claim 17, further comprising performing a non-linear Rectified Linear Unit operation and a pooling operation on the shifted plurality of combined values to obtain the output sub-feature map.

19. The non-transitory computer-readable media of claim 17, further comprising compressing the output sub-feature map before combining to obtain the output feature map.

20. The non-transitory computer-readable media of claim 18, wherein each of the plurality of data values that are input into the plurality of multipliers is a non-zero value, and wherein the weight value is a non-zero value.
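The shift-by-one accumulation in the method claim and the partition-then-combine flow in the computer-readable media claim can be sketched in one dimension. The sketch rests on several assumptions that the claims do not state: the register bank is modeled as a list, every multiplier receives the same weight in a given iteration, zero weights are skipped as a simple stand-in for sparsity handling, and neighboring sub-feature maps overlap by `len(weights) - 1` values so that the tiled result matches the untiled one. The function names are hypothetical.

```python
def shift_accumulate_conv1d(data, weights):
    """Valid 1D cross-correlation using a shift-then-accumulate register bank."""
    n, k = len(data), len(weights)
    if k == 0 or k > n:
        raise ValueError("need 1 <= len(weights) <= len(data)")

    bank = [0] * n  # software stand-in for the register bank
    for w in weights:
        # Shift the values left by the previous iteration by one position.
        bank = [0] + bank[:-1]
        # A zero weight contributes nothing, so the multiplies are skipped
        # (assumed here as a simple model of sparsity).
        if w == 0:
            continue
        # Every multiplier gets the same weight and one data value.
        products = [w * x for x in data]
        # Combine products with the shifted accumulator values and
        # write the combined values back to the bank.
        bank = [acc + p for acc, p in zip(bank, products)]

    # The first k - 1 positions hold incomplete sums; drop them.
    return bank[k - 1:]


def conv1d_by_tiles(data, weights, tile):
    """Partition the input, convolve each sub-map, and join the outputs."""
    if tile < 1:
        raise ValueError("tile must be at least 1")
    k = len(weights)
    out = []
    for start in range(0, len(data) - k + 1, tile):
        # Each sub-feature map carries k - 1 extra values (assumed halo).
        sub = data[start:start + tile + k - 1]
        out.extend(shift_accumulate_conv1d(sub, weights))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(shift_accumulate_conv1d(signal, kernel))
print(conv1d_by_tiles(signal, kernel, tile=2))
```

Both calls produce the same output here, which is the property that lets independent compute units each process one sub-feature map before the results are combined.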

Assignees

Western Digital Tech Inc

Inventors

Classifications

  • G06F17/16 (Primary)

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • using specific electronic processors · CPC title

  • using neural networks · CPC title

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title

  • using classification, e.g. of video objects · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11462003B2 cover?
A system with a multiplication circuit having a plurality of multipliers is disclosed. Each of the plurality of multipliers is configured to receive a data value and a weight value to generate a product value in a convolution operation of a machine learning application. The system also includes an accumulator configured to receive the product value from each of the plurality of multipliers and …
Who is the assignee on this patent?
Western Digital Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 4, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).