What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 27, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Flexible precision neural inference processing unit

US11537859B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11537859-B2
Application number	US-201916705565-A
Country	US
Kind code	B2
Filing date	Dec 6, 2019
Priority date	Dec 6, 2019
Publication date	Dec 27, 2022
Grant date	Dec 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Neural inference chips are provided. A neural core of the neural inference chip comprises a vector-matrix multiplier; a vector processor; and an activation unit operatively coupled to the vector processor. The vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.
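
As a rough illustration of the data flow the abstract describes, the following Python sketch models the three stages in software. It is not taken from the patent: the function names, the signed two's-complement integer ranges, the bias-add vector function, and the shift-then-saturate re-ranging are all assumptions chosen for illustration.

```python
def int_range(bits):
    """Inclusive (min, max) of a signed two's-complement integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def vector_matrix_multiply(activations, weights):
    """Multiply a length-N activation vector by an N x M weight matrix.

    Returns a length-M partial sum vector. Python integers are unbounded, so
    the partial sums here implicitly carry more precision than the inputs.
    """
    n_out = len(weights[0])
    return [sum(a * row[j] for a, row in zip(activations, weights))
            for j in range(n_out)]


def vector_processor(partial_sums, bias):
    """Hypothetical vector function: element-wise bias add at partial-sum precision."""
    return [p + b for p, b in zip(partial_sums, bias)]


def activation_unit(values, out_bits, shift=0):
    """Re-range to `out_bits`: drop `shift` least significant bits, then saturate."""
    lo, hi = int_range(out_bits)
    return [max(lo, min(hi, v >> shift)) for v in values]


def neural_core(activations, weights, bias, out_bits, shift=0):
    """Chain the three stages; `out_bits` and `shift` can differ per call (per layer)."""
    partial_sums = vector_matrix_multiply(activations, weights)
    processed = vector_processor(partial_sums, bias)
    return activation_unit(processed, out_bits, shift)


# Example with 4-bit inputs (all values within -8..7).
example_weights = [[1, -4], [5, 2], [-3, 6]]
example_activations = [3, -2, 7]
saturated = neural_core(example_activations, example_weights, [0, 0], out_bits=4)
rescaled = neural_core(example_activations, example_weights, [0, 0], out_bits=4, shift=2)
```

In this example the partial sums are -28 and 26, which do not fit in 4 bits. Without a shift they saturate to the 4-bit bounds; with a 2-bit shift they are scaled down into range first. Choosing `out_bits` per call mirrors the idea of selecting precision per layer, under the assumptions stated above.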

First claim

Opening claim text (preview).

What is claimed is:
1. A neural inference chip comprising a neural core, the neural core comprising: a vector-matrix multiplier adapted to receive a weight matrix having a weight matrix precision, receive an input activation vector having an input activation vector precision, and compute a partial sum vector by multiplying the input activation vector by the weight matrix, the partial sum vector having a partial sum vector precision; a vector processor adapted to receive one or more partial sum vector from one or more vector source, the one or more vector source including the vector-matrix multiplier, and perform one or more vector function on the one or more partial sum vector to yield a vector processor output vector, the vector processor output vector having a precision equal to the partial sum vector precision; and an activation unit operatively coupled to the vector processor and adapted to apply an activation function to the vector processor output vector, yielding an output activation vector having an output activation precision, wherein the vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.
2. The neural inference chip of claim 1, further comprising: at least one network interconnecting the neural core with at least one additional neural core, the at least one network adapted to deliver synaptic weights and/or input activations to the neural cores at variable precision.
3. The neural inference chip of claim 2, wherein the at least one network is further adapted to vary the weight matrix precision and dimension, input activation vector precision and dimension, and/or the output activation vector precision and dimension while maintaining constant bandwidth.
4. The neural inference chip of claim 1, wherein the neural core further comprises: at least one memory, the at least one memory being adapted to store weight matrices, input activation vectors, and/or output activation vectors at variable precision.
5. The neural inference chip of claim 4, wherein the at least one memory is further adapted to vary the weight matrix precision and dimension, input activation vector precision and dimension, and/or the output activation vector precision and dimension while maintaining constant storage utilization.
6. The neural inference chip of claim 1, wherein the vector-matrix multiplier is further adapted to vary the weight matrix precision and dimension and/or the input activation vector precision and dimension while maintaining constant bandwidth.
7. The neural inference chip of claim 6, wherein the vector-matrix multiplier is further adapted to compute a variable number of multiplications per cycle at variable precision, wherein the variable number of multiplications per cycle and variable precision are inversely proportional.
8. The neural inference chip of claim 1, wherein the activation function is adapted to re-range the vector processor output vector.
9. The neural inference chip of claim 8, wherein applying the activation function comprises applying a saturating function.
10. The neural inference chip of claim 9, wherein the saturating function has as least one bound corresponding to the output activation precision.
11. The neural inference chip of claim 8, wherein applying the activation function comprises truncating one or more least significant bits.
12. The neural inference chip of claim 1, wherein the variable precision is selected from 2 bit, 4 bit, 8 bit, 16 bit, and 32 bit.
13. The neural inference chip of claim 1, wherein the variable precision is selectable at runtime.
14. The neural inference chip of claim 1, wherein the variable precision is selectable for each layer of a neural network.
15. The neural inference chip of claim 1, wherein the weight matrix precision is equal to the activation vector precision.
16. The neural inference chip of claim 15, wherein the partial sum vector precision is not equal to the output activation precision.
17. The neural inference chip of claim 1, wherein the partial sum vector precision is higher than the weight matrix precision and/or the activation vector precision.
18. The neural inference chip of claim 15, wherein the output activation precision is equal to the weight matrix precision.
19. A method comprising: receiving a weight matrix having a first precision; receiving an activation vector having the first precision; computing a vector-matrix multiplication of the weight matrix and the activation vector, yielding a partial sum vector a second precision; performing one or more vector functions on the partial sum vector to yield a vector processor output vector having the second precision; and applying an activation function to the vector processor output vector, yielding an output activation vector having a third precision, wherein at least one of the first, second, and third precision is varied at runtime.
20. The method of claim 19, further comprising: varying at least one of the first, second, and third precision for computation of each layer of a neural network.
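
The trade-off in claims 3, 5, and 7 (precision and dimension varying while bandwidth or storage stays constant, with multiplications per cycle inversely proportional to precision) can be sketched in a few lines of Python. This is an illustrative model only: the 64-bit word width, the unsigned packing layout, and the helper names are hypothetical and do not come from the patent. Only the list of precisions is taken from claim 12.

```python
SUPPORTED_PRECISIONS = (2, 4, 8, 16, 32)  # bit widths listed in claim 12


def elements_per_word(word_bits, precision_bits):
    """Number of elements that fit in one fixed-width word at a given precision.

    With `word_bits` held constant, count * precision is constant, so the
    element count (and, assuming one multiply per element per cycle, the
    number of multiplications) is inversely proportional to precision.
    """
    if precision_bits not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision_bits}")
    if word_bits % precision_bits:
        raise ValueError("word width must be a multiple of the precision")
    return word_bits // precision_bits


def pack(values, precision_bits):
    """Pack unsigned values into one integer, index 0 in the least significant bits."""
    mask = (1 << precision_bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= mask:
            raise ValueError(f"value {v} does not fit in {precision_bits} bits")
        word |= v << (i * precision_bits)
    return word


def unpack(word, precision_bits, count):
    """Inverse of pack: extract `count` unsigned fields of `precision_bits` each."""
    mask = (1 << precision_bits) - 1
    return [(word >> (i * precision_bits)) & mask for i in range(count)]


WORD_BITS = 64  # hypothetical datapath / memory word width
lanes = {p: elements_per_word(WORD_BITS, p) for p in SUPPORTED_PRECISIONS}
packed = pack([1, 2, 3, 4], 4)
```

Under these assumptions, halving the precision doubles the number of elements carried by the same word, which is one way to read "constant bandwidth" and "constant storage utilization" in the claims.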

Assignees

Inventors

Classifications

G06N3/063 · Primary
using electronic means · CPC title
G06F17/16
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
G06N3/048
Activation functions · CPC title
G06F15/8053
Vector processors · CPC title
G06N3/0481
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 72709370

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11537859B2 cover?: Neural inference chips are provided. A neural core of the neural inference chip comprises a vector-matrix multiplier; a vector processor; and an activation unit operatively coupled to the vector processor. The vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.
Who is the assignee on this patent?: IBM