Massively parallel neural inference computing elements

US11010662B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11010662-B2
Application number: US-202016808900-A
Country: US
Kind code: B2
Filing date: Mar 4, 2020
Priority date: Mar 30, 2018
Publication date: May 18, 2021
Grant date: May 18, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Massively parallel neural inference computing elements are provided. A plurality of multipliers is arranged in a plurality of equal-sized groups. Each of the plurality of multipliers is adapted to, in parallel, apply a weight to an input activation to generate an output. A plurality of adders is operatively coupled to one of the groups of multipliers. Each of the plurality of adders is adapted to, in parallel, add the outputs of the multipliers within its associated group to generate a partial sum. A plurality of function blocks is operatively coupled to one of the plurality of adders. Each of the plurality of function blocks is adapted to, in parallel, apply a function to the partial sum of its associated adder to generate an output value.
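The datapath in the abstract can be read as a matrix-vector product followed by an element-wise function. The following is a minimal software sketch of that reading, not the patented hardware: it assumes one row of a weight matrix per group of multipliers, the same activation vector supplied to every group, and ReLU as the applied function. The names `inference_step` and `relu` are hypothetical and do not come from the patent. The loop runs sequentially, whereas the hardware described performs these steps in parallel.

```python
from typing import Callable, List, Sequence


def relu(x: float) -> float:
    """Example function block (an assumption; the abstract does not fix the function)."""
    return max(0.0, x)


def inference_step(
    weights: Sequence[Sequence[float]],
    activations: Sequence[float],
    function: Callable[[float], float] = relu,
) -> List[float]:
    """Model one pass through multipliers, adders, and function blocks.

    Each row of `weights` stands for one equal-sized group of multipliers.
    """
    outputs = []
    for row in weights:
        if len(row) != len(activations):
            raise ValueError("every group must have one multiplier per activation")
        # Multipliers: apply each weight to its input activation.
        products = [w * a for w, a in zip(row, activations)]
        # Adder: sum the outputs of the group into a partial sum.
        partial_sum = sum(products)
        # Function block: apply the function to the partial sum.
        outputs.append(function(partial_sum))
    return outputs


weights = [[1, -1, 0], [2, 0, 1]]   # two groups of three multipliers
activations = [3, 4, 5]
result = inference_step(weights, activations)
print(result)
```

Here the first group produces a partial sum of -1, which ReLU clamps to zero, and the second group produces 11.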

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a plurality of multipliers, the plurality of multipliers arranged in a plurality of equal-sized groups, each of the plurality of multipliers being adapted to, in parallel, apply a weight to an input activation to generate an output; a plurality of adders, each of the plurality of adders being operatively coupled to one of the groups of multipliers, each of the plurality of adders being adapted to, in parallel, add the outputs of the multipliers within its associated group to generate a partial sum; and a first plurality of function blocks, each of the first plurality of function blocks being operatively coupled to one of the plurality of adders, each of the first plurality of function blocks being adapted to, in parallel, apply a function to the partial sum of its associated adder to generate an output value, wherein the first plurality of function blocks is adapted to combine the output values with subsequently computed output values of the first plurality of function blocks.
2. The system of claim 1, adapted to receive a matrix of weights and a vector of activations.
3. The system of claim 1, wherein each of the plurality of adders comprises a tree of adders.
4. The system of claim 3, wherein the tree of adders is a binary tree.
5. The system of claim 3, wherein the tree of adders comprises a plurality of carry-save adders.
6. The system of claim 2, wherein each activation of the vector of activations is broadcast to all of the groups of multipliers.
7. The system of claim 2, further comprising a systolic pipeline operatively coupled to each of the groups of multipliers.
8. The system of claim 1, wherein the groups of multipliers are pipelined.
9. The system of claim 1, wherein the weights are balanced ternary values.
10. The system of claim 1, wherein each of the plurality of multipliers comprises a multiplexor.
11. The system of claim 2, wherein the matrix of weights is compressed, and wherein the system is adapted to decompress the compressed matrix of weights.
12. The system of claim 1, further comprising: a plurality of shifters, each shifter operatively connected to one of the first plurality of function blocks, each shifter adapted to, in parallel, shift the output value of its corresponding function block, and wherein combining the output values with subsequently computed output values comprises combining the shifted values with the subsequently computed output values.
13. The system of claim 1, wherein the function of each of the first plurality of function blocks is an activation function.
14. The system of claim 1, wherein the function of each of the first plurality of function blocks is programmable.
15. The system of claim 1, wherein the function of each of the first plurality of function blocks is addition.
16. The system of claim 1, wherein the function of each of the first plurality of function blocks is multiplication.
17. The system of claim 1, wherein the function of each of the first plurality of function blocks is an identity function.
18. The system of claim 1, further comprising a lookup table, the function of each of the first plurality of function blocks comprising a lookup from the lookup table.
19. The system of claim 18, wherein the lookup table is programmable.
20. The system of claim 1, wherein the function of each of the first plurality of function blocks is a max function.
21. The system of claim 1, wherein the function of each of the first plurality of function blocks is a min function.
22. A method comprising: applying by a plurality of equal-sized groups of multipliers, in parallel, a plurality of weights to a plurality of input activations to generate a plurality of outputs for each group of multipliers; adding by a plurality of adders, in parallel, the plurality of outputs from each group of multipliers to generate a partial sum from each group of multipliers; and applying by a first plurality of function blocks, each of the first plurality of function blocks being operatively coupled to one of the plurality of adders, in parallel, a function to the partial sum of its associated adder to generate an output value, wherein the first plurality of function blocks is adapted to combine the output values with subsequently computed output values of the first plurality of function blocks.
23. A system comprising: a plurality of multipliers, the plurality of multipliers arranged in a plurality of equal-sized groups; a plurality of adders, each of the plurality of adders being operatively coupled to one of the groups of multipliers; a first plurality of function blocks, each of the first plurality of function blocks being operatively coupled to one of the plurality of adders; a computer readable storage medium having program instructions embodied therewith, the program instructions executable to perform a method comprising: by each of the plurality of multipliers, in parallel, applying a weight to an input activation to generate an output; by each of the plurality of adders, in parallel, adding the outputs of the multipliers within its associated group to generate a partial sum; and by each of the first plurality of function blocks, in parallel, applying a function to the partial sum of its associated adder to generate an output value, wherein the first plurality of function blocks is adapted to combine the output values with subsequently computed output values of the first plurality of function blocks.
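Several dependent claims name concrete building blocks: a binary tree of adders (claims 3 and 4), balanced ternary weights with a multiplexor as the multiplier (claims 9 and 10), and shifters that combine an output value with subsequently computed ones (claim 12). The sketch below illustrates one way these could fit together in software. It rests on assumptions the claims do not state: activations are 4-bit unsigned integers processed as two 2-bit slices, most significant slice first, and the function block is the identity function. All function names are hypothetical.

```python
from typing import List, Sequence


def ternary_mux(weight: int, activation: int) -> int:
    """Apply a balanced ternary weight (-1, 0, or +1) by selection.

    Picking one of (-a, 0, a) stands in for a multiplexor; no multiply is needed.
    """
    if weight not in (-1, 0, 1):
        raise ValueError("weight must be -1, 0, or 1")
    return (-activation, 0, activation)[weight + 1]


def adder_tree(values: Sequence[int]) -> int:
    """Sum values with a binary tree: pairwise additions, level by level."""
    level: List[int] = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-sized levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def partial_sum(weights: Sequence[int], activations: Sequence[int]) -> int:
    """One group of multipliers feeding one adder tree."""
    return adder_tree([ternary_mux(w, a) for w, a in zip(weights, activations)])


def shift_and_combine(previous: int, current: int, shift: int) -> int:
    """Shift an earlier output value left, then add a later one."""
    return (previous << shift) + current


weights = [1, -1, 0, 1]
activations = [5, 3, 7, 2]                 # assumed 4-bit unsigned values

high = [a >> 2 for a in activations]       # upper 2-bit slices
low = [a & 0b11 for a in activations]      # lower 2-bit slices

first = partial_sum(weights, high)         # computed first
second = partial_sum(weights, low)         # computed subsequently
combined = shift_and_combine(first, second, 2)

direct = partial_sum(weights, activations) # reference: full-precision pass
print(combined, direct)
```

Under these assumptions the shifted combination of the two slice passes equals the single full-precision pass, which suggests how narrow datapaths might be reused for wider operands. Whether the patent uses shifters for exactly this purpose is not stated in the claims shown here.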

Assignees

Inventors

Classifications

  • Activation functions

  • Recurrent networks, e.g. Hopfield networks

  • Combinations of networks

  • Quantised networks; Sparse networks; Compressed networks

  • Convolutional networks [CNN, ConvNet]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11010662B2 cover?
Massively parallel neural inference computing elements are provided. A plurality of multipliers is arranged in a plurality of equal-sized groups. Each of the plurality of multipliers is adapted to, in parallel, apply a weight to an input activation to generate an output. A plurality of adders is operatively coupled to one of the groups of multipliers. Each of the plurality of adders is adapted …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 18, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).