Partial sum management and reconfigurable systolic flow architectures for in-memory computation

US12340304B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12340304-B2
Application number  US-202117398791-A
Country             US
Kind code           B2
Filing date         Aug 10, 2021
Priority date       Aug 10, 2021
Publication date    Jun 24, 2025
Grant date          Jun 24, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus for performing machine learning tasks, and in particular, to a neural-network-processing architecture and circuits for improved handling of partial accumulation results in weight-stationary operations, such as operations occurring in compute-in-memory (CIM) processing elements (PEs). One example PE circuit for machine learning generally includes an accumulator circuit, a flip-flop array having an input coupled to an output of the accumulator circuit, a write register, and a first multiplexer having a first input coupled to an output of the write register, having a second input coupled to an output of the flip-flop array, and having an output coupled to a first input of the first accumulator circuit.
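The data path named in the abstract can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the patented circuit: the function names `mux` and `accumulate_step`, the use of Python lists to stand for register contents, and the lane-by-lane integer addition are all hypothetical choices made for illustration.

```python
from typing import List


def mux(select_write_register: bool,
        write_register: List[int],
        flip_flop_array: List[int]) -> List[int]:
    """Two-input multiplexer: forward the write register or the flip-flop array."""
    return write_register if select_write_register else flip_flop_array


def accumulate_step(selected: List[int], incoming: List[int]) -> List[int]:
    """Accumulator: add the multiplexer output to new partial sums, lane by lane."""
    if len(selected) != len(incoming):
        raise ValueError("lane counts must match")
    return [a + b for a, b in zip(selected, incoming)]


# Assumed scenario: partial sums from an earlier pass sit in the write register.
write_register = [10, 20, 30]
flip_flop_array = [0, 0, 0]

# First step resumes from the write register; the result lands in the flip-flop array.
flip_flop_array = accumulate_step(mux(True, write_register, flip_flop_array), [1, 2, 3])

# Later steps keep accumulating from the flip-flop array itself.
flip_flop_array = accumulate_step(mux(False, write_register, flip_flop_array), [1, 1, 1])

print(flip_flop_array)
```

In this model the select signal is what lets an accumulation either resume from previously stored partial sums or continue from the running value held locally.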

First claim

Opening claim text (preview).

What is claimed is:

1. A processing element (PE) circuit comprising: a first accumulator circuit; a flip-flop array having an input coupled to an output of the first accumulator circuit; a write register; a first multiplexer having a first input coupled to an output of the write register, having a second input coupled to an output of the flip-flop array, and having an output coupled to a first input of the first accumulator circuit; an adder circuit; and an accumulator-and-shifter circuit having an input coupled to an output of the adder circuit and having an output coupled to a second input of the first accumulator circuit.

2. The PE circuit of claim 1, further comprising a read register having an input coupled to the output of the flip-flop array.

3. The PE circuit of claim 2, further comprising a write bus coupled to an output of the read register.

4. The PE circuit of claim 3, further comprising a read bus coupled to an input of the write register.

5. A neural network circuit comprising a plurality of PE circuits, wherein at least one of the plurality of PE circuits comprises the PE circuit of claim 4, the neural network circuit further comprising: a memory coupled to the write bus and to the read bus; and a global memory coupled to the read bus, wherein another one of the plurality of PE circuits has an output coupled to a second input of the first accumulator circuit.

6. The neural network circuit of claim 5, wherein the other one of the plurality of PE circuits does not include a write register.

7. The PE circuit of claim 1, further comprising a read bus coupled to an input of the write register, wherein the read bus is configured to couple to at least one of a tightly coupled memory or a global memory, external to the PE circuit.

8. The PE circuit of claim 1, further comprising: a second accumulator circuit; and a second multiplexer having a first input coupled to an output of the second accumulator circuit and having an output coupled to the first input of the first accumulator circuit.

9. The PE circuit of claim 1, wherein the PE circuit is a digital compute-in-memory (DCIM) PE circuit and wherein the PE circuit further comprises: a DCIM array; a bit-column adder tree circuit coupled to the DCIM array; and a weight-shift adder tree circuit coupled to the bit-column adder tree circuit.

10. The PE circuit of claim 9, wherein the DCIM array comprises a plurality of compute-in-memory cells and wherein at least one of the compute-in-memory cells comprises an eight-transistor (8T) static random-access memory (SRAM) cell.

11. A method of neural network processing, comprising: receiving, at a first input of a multiplexer, first data from a write register; receiving, at a second input of the multiplexer, second data from a flip-flop array; receiving, at an accumulator circuit, third data from a processing element (PE) circuit; selecting, with the multiplexer, data to output to the accumulator circuit between the first data and the second data; and accumulating, with the accumulator circuit, the selected output data from the multiplexer and the third data received from the PE circuit to generate accumulated data, wherein the PE circuit comprises: an adder circuit; and an accumulator-and-shifter circuit having an input coupled to an output of the adder circuit and having an output coupled to an input of the accumulator circuit.

12. The method of claim 11, further comprising: outputting the accumulated data to the flip-flop array; shifting, with the flip-flop array, the accumulated data to a read register; and writing the accumulated data from the read register to a memory via a write bus.

13. The method of claim 11, further comprising: outputting the accumulated data to the flip-flop array; shifting, with the flip-flop array, the accumulated data to a read register; processing the accumulated data from the read register with digital post-processing logic; and writing the processed, accumulated data to a memory via a write bus coupled between the digital post-processing logic and the memory.
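The steps of method claims 11 to 13 can be traced in a short behavioural sketch. Everything named here is hypothetical: the function `run_partial_sum_flow`, the modelling of the flip-flop array as a list shifted out one entry at a time, the list standing in for the memory behind the write bus, and the ReLU used as an example of digital post-processing are assumptions for illustration and are not taken from the patent.

```python
from typing import Callable, List, Optional


def run_partial_sum_flow(
    write_register: List[int],
    flip_flop_array: List[int],
    third_data: List[int],
    select_write_register: bool,
    post_process: Optional[Callable[[int], int]] = None,
) -> List[int]:
    """Model the claimed steps; returns the values written to memory."""
    # Claim 11: the multiplexer selects between the first data (write register)
    # and the second data (flip-flop array).
    selected = write_register if select_write_register else flip_flop_array
    if len(selected) != len(third_data):
        raise ValueError("lane counts must match")

    # Claim 11: the accumulator adds the selected data to the third data
    # received from the PE circuit.
    accumulated = [s + t for s, t in zip(selected, third_data)]

    # Claims 12 and 13: the accumulated data is output to the flip-flop array.
    flip_flops = list(accumulated)

    memory: List[int] = []
    while flip_flops:
        # Shift one entry from the flip-flop array into the read register.
        read_register = flip_flops.pop(0)
        # Claim 13 only: optional digital post-processing before the write.
        if post_process is not None:
            read_register = post_process(read_register)
        # Write to memory via the write bus.
        memory.append(read_register)
    return memory


# Claim 12 style flow: write the accumulated data back unchanged.
plain = run_partial_sum_flow([5, -7, 2], [0, 0, 0], [1, 1, 1], True)

# Claim 13 style flow: apply an assumed ReLU before writing.
activated = run_partial_sum_flow(
    [5, -7, 2], [0, 0, 0], [1, 1, 1], True, post_process=lambda x: max(x, 0)
)
```

The only difference between the two calls is the optional post-processing stage between the read register and the write bus, which mirrors how claim 13 differs from claim 12.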

Assignees

Qualcomm Inc

Inventors

Classifications

  • using electronic means · CPC title

  • G06F7/5443 · Primary

    Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • Multiplying only · CPC title

  • Adding; Subtracting (G06F7/483 - G06F7/491, G06F7/544 - G06F7/556 take precedence) · CPC title

  • Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12340304B2 cover?
It covers methods and apparatus for performing machine learning tasks, in particular a neural-network-processing architecture and circuits for improved handling of partial accumulation results in weight-stationary operations, such as operations occurring in compute-in-memory (CIM) processing elements (PEs). The Abstract section gives the full text, including the example PE circuit.
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 24, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).