Multi-mode low-precision inner-product computation circuits for massively parallel neural inference engine

US11270196B2 · US · B2

Patent metadata
Publication number: US-11270196-B2
Application number: US-201916653366-A
Country: US
Kind code: B2
Filing date: Oct 15, 2019
Priority date: Oct 15, 2019
Publication date: Mar 8, 2022
Grant date: Mar 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Neural inference chips for computing neural activations are provided. In various embodiments, the neural inference chip is adapted to: receive an input activation tensor comprising a plurality of input activations; receive a weight tensor comprising a plurality of weights; Booth recode each of the plurality of weights into a plurality of Booth-coded weights, each Booth coded value having an order; multiply the input activation tensor by the Booth coded weights, yielding a plurality of results for each input activation, each of the plurality of results corresponding to the orders of the Booth-coded weights; for each order of the Booth-coded weights, sum the corresponding results, yielding a plurality of partial sums, one for each order; and compute a neural activation from a sum of the plurality of partial sums.
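
The flow described in the abstract can be sketched in software. The following is a minimal illustration, not the patented circuit: it assumes radix-4 (modified) Booth recoding, signed two's-complement weights of a hypothetical width `weight_bits`, and plain Python integers as activations. The abstract does not fix any of these choices, and the function names `booth_radix4_digits` and `booth_inner_product` are made up for this example.

```python
def booth_radix4_digits(value, bits):
    """Recode a signed two's-complement integer into radix-4 Booth digits.

    Returns digits d[0..n-1], each in {-2, -1, 0, 1, 2}, such that
    value == sum(d[k] * 4**k). The index k plays the role of the "order".
    Radix 4 is an assumption of this sketch.
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width so bits pair up cleanly
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    u = value & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit 0 to the right of the least significant bit
    for k in range(bits // 2):
        b0 = (u >> (2 * k)) & 1
        b1 = (u >> (2 * k + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_inner_product(activations, weights, weight_bits=4):
    """Inner product computed through per-order partial sums.

    Each activation is multiplied by every Booth digit of its weight;
    results are summed per order, then each partial sum is shifted by
    its order (2 bits per order for radix 4) and the shifted sums added.
    """
    n_orders = (weight_bits + 1) // 2
    partial_sums = [0] * n_orders
    for a, w in zip(activations, weights):
        for order, digit in enumerate(booth_radix4_digits(w, weight_bits)):
            partial_sums[order] += a * digit
    total = sum(ps << (2 * order) for order, ps in enumerate(partial_sums))
    return partial_sums, total


partial_sums, total = booth_inner_product([1, 2, 3], [6, -3, 5])
print(partial_sums, total)  # [3, 3] 15
activation = max(0, total)  # ReLU, chosen here only as an example nonlinearity
```

Keeping one partial sum per order defers the shifts until after the accumulation, so each order needs only one shift per inner product rather than one per multiplication.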

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for computing neural activations, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a neural inference chip to cause the neural inference chip to perform a method comprising: receiving an input activation tensor comprising a plurality of input activations, the input activation tensor representing an image, each of the plurality of input activations corresponding to a value at a location in the image; receiving a weight tensor comprising a plurality of weights; Booth recoding each of the plurality of weights into a plurality of Booth-coded weights, each Booth coded value having an order; multiplying the input activation tensor by the Booth coded weights, yielding a plurality of results for each input activation, each of the plurality of results corresponding to the orders of the Booth-coded weights; for each order of the Booth-coded weights, summing the corresponding results, yielding a plurality of partial sums, one for each order; and computing a neural activation from a sum of the plurality of partial sums.

2. The computer program product of claim 1, wherein the input activation tensor has a dimension of one.

3. The computer program product of claim 1, wherein the weight tensor has a dimension of two.

4. The computer program product of claim 1, wherein computing the neural activation comprises shifting each of the plurality of partial sums according to its corresponding order.

5. The computer program product of claim 1, wherein computing the neural activation comprises shifting each of the plurality of partial sums according to a precision of the input activations.

6. The computer program product of claim 1, wherein computing the neural activation comprises applying a nonlinear activation function to the sum of the plurality of partial sums.

7. The computer program product of claim 1, wherein summing said corresponding results comprises applying a plurality of carry-save adders.

8. A computer program product for computing neural activations, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a neural inference chip to cause the neural inference chip to perform a method comprising: receiving an input activation tensor comprising a plurality of input activations, the input activation tensor representing an image, each of the plurality of input activations corresponding to a value at a location in the image; receiving a weight tensor comprising a plurality of weights; Booth recoding each of the plurality of input activations into a plurality of Booth-coded input activations, each Booth coded value having an order; multiplying the weight tensor by the Booth coded input activations, yielding a plurality of results for each weight, each of the plurality of results corresponding to the orders of the Booth-coded input activations; for each order of the Booth-coded input activations, summing the corresponding results, yielding a plurality of partial sums, one for each order; and computing a neural activation from a sum of the plurality of partial sums.

9. The computer program product of claim 8, wherein the input activation tensor has a dimension of one.

10. The computer program product of claim 8, wherein the weight tensor has a dimension of two.

11. The computer program product of claim 8, wherein computing the neural activation comprises shifting each of the plurality of partial sums according to its corresponding order.

12. The computer program product of claim 8, wherein computing the neural activation comprises shifting each of the plurality of partial sums according to a precision of the input activations.

13. The computer program product of claim 8, wherein computing the neural activation comprises applying a nonlinear activation function to the sum of the plurality of partial sums.

14. The computer program product of claim 8, wherein summing said corresponding results comprises applying a plurality of carry-save adders.

15. A neural inference chip for computing neural activations, the neural inference chip adapted to: receive an input activation tensor comprising a plurality of input activations, the input activation tensor representing an image, each of the plurality of input activations corresponding to a value at a location in the image; receive a weight tensor comprising a plurality of weights; Booth recode each of the plurality of weights into a plurality of Booth-coded weights, each Booth coded value having an order; multiply the input activation tensor by the Booth coded weights, yielding a plurality of results for each input activation, each of the plurality of results corresponding to the orders of the Booth-coded weights; for each order of the Booth-coded weights, sum the corresponding results, yielding a plurality of partial sums, one for each order; compute a neural activation from a sum of the plurality of partial sums.

16. The neural inference chip of claim 15, wherein computing the neural activation comprises shifting each of the plurality of partial sums according to its corresponding order.

17. The neural inference chip of claim 15, wherein computing the neural activation comprises shifting each of the plurality of partial sums according to a precision of the input activations.

18. The neural inference chip of claim 15, wherein computing the neural activation comprises applying a nonlinear activation function to the sum of the plurality of partial sums.

19. The neural inference chip of claim 15, wherein summing said corresponding results comprises applying a plurality of carry-save adders.

20. A neural inference chip for computing neural activations, the neural inference chip adapted to: receive an input activation tensor comprising a plurality of input activations, the input activation tensor representing an image, each of the plurality of input activations corresponding to a value at a location in the image; receive a weight tensor comprising a plurality of weights; Booth recode each of the plurality of input activations into a plurality of Booth-coded input activations, each Booth coded value having an order; multiply the weight tensor by the Booth coded input activations, yielding a plurality of results for each weight, each of the plurality of results corresponding to the orders of the Booth-coded input activations; for each order of the Booth-coded input activations, sum the corresponding results, yielding a plurality of partial sums, one for each order; compute a neural activation from a sum of the plurality of partial sums.
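
Claims 7, 14, and 19 recite summing the per-order results with a plurality of carry-save adders. As a rough software analogy only, the sketch below shows a 3:2 carry-save compressor and a reduction that repeatedly applies it before one final ordinary addition. The function names `carry_save_add` and `csa_sum` are hypothetical, Python's unbounded integers stand in for fixed-width hardware words, and the claims do not specify how the adders are arranged.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: reduce three addends to two with x + y + z == s + c.

    The sum word is the bitwise XOR; the carry word is the bitwise
    majority shifted left by one. No carry ripples between bit positions.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def csa_sum(values):
    """Sum integers by repeated 3:2 compression, then one final add.

    Each pass replaces three pending addends with two, so the list
    shrinks by one per iteration until at most two words remain.
    """
    pending = list(values)
    while len(pending) > 2:
        x, y, z = pending.pop(), pending.pop(), pending.pop()
        pending.extend(carry_save_add(x, y, z))
    return sum(pending)  # single carry-propagate addition at the end


# Example: three signed products belonging to one Booth order.
print(csa_sum([-2, 2, 3]))  # 3
```

In hardware the appeal of this structure is that only the last addition propagates carries across the word, so many addends can be reduced with short, fixed delay per stage.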

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11270196B2 cover?
Neural inference chips for computing neural activations are provided. In various embodiments, the neural inference chip is adapted to: receive an input activation tensor comprising a plurality of input activations; receive a weight tensor comprising a plurality of weights; Booth recode each of the plurality of weights into a plurality of Booth-coded weights, each Booth coded value having an ord…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 8, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).