Compressed weight distribution in networks of neural processors

US12443830B2 · US · B2

Patent metadata
Publication number: US-12443830-B2
Application number: US-202016733393-A
Country: US
Kind code: B2
Filing date: Jan 3, 2020
Priority date: Jan 3, 2020
Publication date: Oct 14, 2025
Grant date: Oct 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural inference chip includes a global weight memory; a neural core; and a network connecting the global weight memory to the at least one neural core. The neural core comprises a local weight memory. The local weight memory comprises a plurality of memory banks. Each of the plurality of memory banks is uniquely addressable by at least one index. The neural inference chip is adapted to store in the global weight memory a compressed weight block comprising at least one compressed weight matrix. The neural inference chip is adapted to transmit the compressed weight block from the global weight memory to the core via the network. The core is adapted to decode the at least one compressed weight matrix into a decoded weight matrix and store the decoded weight matrix in its local weight memory. The core is adapted to apply the decoded weight matrix to a plurality of input activations to produce a plurality of output activations.
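The decode-and-apply flow described in the abstract can be illustrated with a minimal software sketch. It assumes a row-wise encoding in which each row lists (column index, weight) pairs for its non-zero entries, and it assumes each matrix row produces one output activation; the function names `decode_matrix` and `apply_weights` are hypothetical, and the patent describes hardware rather than this code.

```python
def decode_row(compressed_row, num_columns):
    """Expand (column_index, weight) pairs into a dense, zero-filled row."""
    dense = [0] * num_columns
    for column_index, weight in compressed_row:
        dense[column_index] = weight
    return dense


def decode_matrix(compressed_matrix, num_columns):
    """Decode every compressed row; rows with no pairs become all zeros."""
    return [decode_row(row, num_columns) for row in compressed_matrix]


def apply_weights(weights, activations):
    """Matrix-vector product: one output activation per weight row (assumed layout)."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


# Hypothetical 3x4 sparse weight matrix stored as (column, value) pairs per row.
compressed = [
    [(0, 2), (3, -1)],
    [],
    [(1, 4)],
]
decoded = decode_matrix(compressed, num_columns=4)
outputs = apply_weights(decoded, [1, 2, 3, 4])
print(decoded)   # [[2, 0, 0, -1], [0, 0, 0, 0], [0, 4, 0, 0]]
print(outputs)   # [-2, 0, 8]
```

Only three (index, value) pairs travel in this example instead of twelve dense entries, which is the bandwidth saving that compressed distribution is aiming for when the decoded matrix is sparse.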

First claim

Opening claim text (preview).

What is claimed is:

1. A neural inference chip comprising: a global weight memory; at least one neural core, the at least one neural core comprising a local weight memory, the local weight memory comprising a plurality of memory banks, each of the plurality of memory banks being uniquely addressable by at least one index, wherein the at least one index identifies a column of a compressed weight matrix and a first memory bank of the plurality of memory banks, each of the plurality of memory banks comprising a comparator, a value mux, and an index mux such that the first memory bank comprises a first comparator, a first value mux, and a first index mux, wherein the comparator of each memory bank is adapted to compare the at least one index to the index of that memory bank of the plurality of memory banks, that comparator provides a control line to the value mux of that memory bank, the value mux is configured to select between zero and a weight value based on the control line, the index mux is configured to select between the weight value and an index value based on the at least one index, and the index value is an index of the weight value in an uncompressed weight matrix; a network-on-chip connecting the global weight memory to the at least one neural core, wherein the neural inference chip is adapted to store in the global weight memory a compressed weight block comprising at least one compressed weight matrix, the neural inference chip is adapted to transmit the compressed weight block from the global weight memory to the at least one neural core via the network-on-chip, the at least one core is adapted to decode the at least one compressed weight matrix into a decoded weight matrix and store the decoded weight matrix in its local weight memory, the at least one neural core is adapted to apply the decoded weight matrix to a plurality of input activations to produce a plurality of output activations.

2. The neural inference chip of claim 1, wherein: the at least one compressed weight matrix comprises a plurality of column indices and associated weight values, the plurality of column indices corresponding to each position within the decoded weight matrix containing a non-zero value, the associated weight values comprising the weight value.

3. The neural inference chip of claim 1, wherein: each of the plurality of memory banks is adapted to selectively store elements of the decoded weight matrix according to its at least one index.

4. The neural inference chip of claim 2, wherein each memory bank is adapted to selectively store elements of the decoded weight matrix by comparing the plurality of column indices to the at least one index.

5. The neural inference chip of claim 1, wherein: the at least one compressed weight matrix comprises a plurality of rows such that the at least one compressed weight matrix comprises a first row, each of the plurality of rows comprising a column index and associated value for each position within that row of the decoded weight matrix containing a non-zero value such that the first row comprises the index value and the weight value.

6. The neural inference chip of claim 1, wherein the decoded weight matrix is sparse.

7. The neural inference chip of claim 1, wherein: the at least one compressed weight matrix contains fewer zero values than the decoded weight matrix; and the decoded weight matrix comprises at least one zero value.

8. The neural inference chip of claim 7, wherein decoding the at least one compressed weight matrix comprises inserting each value of the at least one compressed weight matrix into a zero-filled matrix.

9. The neural inference chip of claim 1, wherein: the compressed weight block comprises a plurality of compressed weight matrices; the at least one core is adapted to decode the compressed weight block into a plurality of decoded weight matrices and store the plurality of decoded weight matrices in its local weight memory; and the at least one neural core is adapted to apply the plurality of decoded weight matrices to a plurality of input activations to produce a plurality of output activations.

10. The neural inference chip of claim 1, wherein: the compressed weight block comprises a matrix index associated with each of the plurality of compressed weight matrices; and each of the compressed weight matrices comprises a plurality of column indices and associated values such that the plurality of column indices comprises the index value and the associated values comprises the weight value, the plurality of column indices corresponding to each position within the associated decoded weight matrix containing a non-zero value.

11. The neural inference chip of claim 10, wherein: each of the plurality of memory banks is adapted to selectively store elements of the decoded weight matrix according to its associated matrix index and column index.

12. The neural inference chip of claim 1, wherein: the neural inference chip is adapted to store in the global weight memory an uncompressed weight matrix; the neural inference chip is adapted to transmit the uncompressed weight matrix from the global weight memory to the at least one neural core via the network-on-chip; the at least one core is adapted to store the uncompressed weight matrix in its memory; and the at least one neural core is adapted to apply the uncompressed weight matrix to a plurality of input activations to produce a plurality of output activations.

13. The neural inference chip of claim 12, wherein: the neural inference chip is operable to switch between a compressed and an uncompressed mode at runtime, when in compressed mode, the compressed weight block being transmitted, and when in uncompressed mode the uncompressed weight matrix being transmitted.

14. The neural inference chip of claim 1, the network-on-chip interconnecting each of the plurality of memory banks having a common row index.

15. The neural inference chip of claim 1, wherein the global weight memory is external to the at least one neural core.

16. The neural inference chip of claim 1, wherein the global weight memory is distributed among the at least one neural core.

17. A method comprising: storing a compressed weight block comprising at least one weight matrix in a global weight memory of a neural inference chip; transmitting the compressed weight block from the global weight memory to at least one neural core on the neural inference chip via a network-on-chip, the at least one neural core comprising a local weight memory, the local weight memory comprising a plurality of memory banks, each of the plurality of memory banks being uniquely addressable by at least one index, wherein the at least one index identifies a column of a compressed weight matrix and a first memory bank of the plurality of memory banks, each of the plurality of memory banks comprising a comparator, a value mux, and an index mux such that the first memory bank comprises a first comparator, a first value mux and a first index mux, wherein the comparator of each memory bank is adapted to compare the at least one index to an index of that memory bank of the plurality of memory banks, that comparator provides a control line to the value mux of that memory bank, the value mux is configured to select between zero and a weight value based on the control line, the index mux is configured to select between the weight value and an index value based on the at least one index, and the index value is an index of the weight value in an uncompressed weight matrix, th
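The per-bank comparator and value mux recited in claim 1 can be modelled in software as a rough behavioural sketch. The assumptions here are that each memory bank owns one column index, that every (index, weight) pair of a compressed row is broadcast to all banks, and that a bank with no matching pair holds zero. The index mux is deliberately left out because the claim preview does not give enough detail to model it, and the names `comparator`, `value_mux`, and `load_row` are hypothetical.

```python
def comparator(incoming_index, bank_index):
    """Control line: true when the broadcast index matches this bank's index."""
    return incoming_index == bank_index


def value_mux(control, weight):
    """Select the weight value when the control line is set, otherwise zero."""
    return weight if control else 0


def load_row(compressed_row, num_banks):
    """Broadcast each (index, weight) pair to every bank; return bank contents."""
    banks = [0] * num_banks  # a bank that never matches keeps zero
    for bank_index in range(num_banks):
        for incoming_index, weight in compressed_row:
            control = comparator(incoming_index, bank_index)
            selected = value_mux(control, weight)
            if control:
                banks[bank_index] = selected
    return banks


# Hypothetical compressed row: non-zero weights at columns 0 and 3.
row = load_row([(0, 2), (3, -1)], num_banks=4)
print(row)  # [2, 0, 0, -1]
```

In this model the decode step of claim 8 (inserting values into a zero-filled matrix) falls out of the bank behaviour: zeros are never transmitted, they are simply what a bank holds when its comparator never fires.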

Assignees

IBM

Inventors

Classifications

  • Inference or reasoning models · CPC title

  • Activation functions · CPC title

  • G06N3/063 (Primary)

    using electronic means · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12443830B2 cover?
A neural inference chip includes a global weight memory; a neural core; and a network connecting the global weight memory to the at least one neural core. The neural core comprises a local weight memory. The local weight memory comprises a plurality of memory banks. Each of the plurality of memory banks is uniquely addressable by at least one index. The neural inference chip is adapted to store…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Published Oct 14, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).