Leveraging processing-in-memory (PIM) resources to expedite non-PIM instructions executed on a host
US-2023205693-A1 · Jun 29, 2023 · US
US12493577B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12493577-B2 |
| Application number | US-202318512788-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 17, 2023 |
| Priority date | Jun 14, 2023 |
| Publication date | Dec 9, 2025 |
| Grant date | Dec 9, 2025 |
Provided are a digital signal processor (DSP) and an electronic device using the same. The DSP includes: a first function unit (FU) having a non-IMC (in-memory computing) operation architecture using an operation unit; a second FU having an IMC architecture using a memory cell array; and a register file used by the first FU and the second FU.
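As a rough illustration of the arrangement in the abstract, here is a minimal behavioral sketch in Python. It assumes a software model rather than hardware: a first FU computes with an ordinary operation unit, a second FU keeps weight data resident in its memory cell array and performs a multiply-accumulate there, and both read and write one shared register file. The class names (`RegisterFile`, `NonImcFU`, `ImcFU`) and the `vadd`/`mac` operations are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterFile:
    """Hypothetical shared register file: register name -> list of integers."""
    regs: dict = field(default_factory=dict)

    def read(self, name):
        return self.regs[name]

    def write(self, name, value):
        self.regs[name] = list(value)


class NonImcFU:
    """First FU: computes with a conventional operation unit (modeled as plain arithmetic)."""

    def __init__(self, rf):
        self.rf = rf

    def vadd(self, dst, a, b):
        # Element-wise add of two registers; the result goes back to the shared file.
        self.rf.write(dst, [x + y for x, y in zip(self.rf.read(a), self.rf.read(b))])


class ImcFU:
    """Second FU: weights stay resident in its memory cell array (modeled as rows)."""

    def __init__(self, rf, weights):
        self.rf = rf
        self.cells = [list(row) for row in weights]  # one stored weight row per output

    def mac(self, dst, src):
        # Multiply-accumulate of the input register against every stored weight row.
        x = self.rf.read(src)
        self.rf.write(dst, [sum(w * v for w, v in zip(row, x)) for row in self.cells])


rf = RegisterFile()
rf.write("r0", [1, 2, 3])
rf.write("r1", [1, 1, 1])
alu = NonImcFU(rf)
imc = ImcFU(rf, weights=[[1, 0, 2], [0, 1, 1]])

alu.vadd("r2", "r0", "r1")  # non-IMC FU result lands in the shared register file
imc.mac("r3", "r2")         # IMC FU consumes it from the same register file
print(rf.read("r2"), rf.read("r3"))  # [2, 3, 4] [10, 7]
```

Because both FUs go through the same register file, one FU's result (`r2` here) can feed the other without an extra copy step; the memory cell array is reduced to a list of rows purely for illustration.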
What is claimed is:

1. A processor comprising: a first function unit (FU) having a non-IMC (in-memory computing) operation architecture using an operation unit; a second FU having an IMC architecture using a memory cell array; and a register file is directly connected to the first FU and used by the first FU and the second FU, where the first FU directly accesses the register file.
2. The processor of claim 1, wherein the first FU is configured to perform non-IMC operation using a logic gate of the operation unit, and the second FU is configured to perform an IMC operation using a bit cell in the memory cell array.
3. The processor of claim 1, wherein the processor is a digital signal processor (DSP) and wherein the first FU comprises a scalar FU configured to perform a scalar operation or a vector FU configured to perform a vector operation.
4. The processor of claim 1, wherein the first FU and the second FU belong to different lanes of a set of lanes that are configured to process instructions independently of each other and in parallel.
5. The processor of claim 1, further comprising: a very long instruction word (VLIW) packetizer configured to generate a VLIW packet by packetizing independent instructions into the VLIW packet, wherein the first FU is configured to process a first instruction of the VLIW packet, and the second FU is configured to process a second instruction of the VLIW packet.
6. The processor of claim 1, further comprising: a buffer block disposed between the second FU and the register file and configured to perform data transmission between the second FU and the register file.
7. The processor of claim 6, wherein the buffer block comprises: an input first-in-first-out (FIFO) buffer configured to transmit input data stored in the register file to the memory cell array of the second FU; and an output FIFO buffer configured to transmit, to the register file, output data generated through the memory cell array of the second FU.
8. The processor of claim 6, wherein the register file comprises: a first register file used by the first FU; and a second register file used by the second FU, wherein when input data is loaded into the second register file, the input data is stored into the second FU via the buffer block.
9. The processor of claim 6, wherein the first FU is configured to load input data into the register file according to a load instruction, when the input data is loaded into the register file, the input data is transmitted to the buffer block, and the second FU is configured to store the input data stored in the buffer block into the memory cell array according to a pop instruction.
10. The processor of claim 6, wherein the first FU is configured to load input data into the register file according to a load instruction, when the input data is loaded into the register file, the input data is transmitted to the buffer block, and the second FU is configured to, in a buffer mode, store the input data stored in the buffer block into the memory cell array without an explicit instruction.
11. The processor of claim 6, wherein the second FU is configured to, when neural network input data is input into the memory cell array of the second FU in a state in which neural network weight data is stored in the memory cell array of the second FU, generate neural network output data by performing a multiply-accumulate (MAC) operation between the neural network weight data and the neural network input data.
12. The processor of claim 11, wherein the second FU is configured to store the network output data stored in the memory cell array into the register file through the buffer block according to a push instruction.
13. The processor of claim 1, wherein one of the first FU and the second FU is configured to perform a first operation and store a first operation result into the register file, and the other one of the first FU and the second FU is configured to perform a second operation based on the first operation result.
14. A digital signal processor (DSP) comprising: a very long instruction word (VLIW) packetizer configured to generate a VLIW packet by packetizing a plurality of independent instructions; and a first lane configured to process a first instruction of the VLIW packet based on a non-IMC architecture using an operation unit; and a second lane configured to process a second instruction of the VLIW packet based on an IMC operation architecture using a memory cell array, wherein the first instruction and the second instruction are processed in the first lane and the second lane in parallel.
15. The DSP of claim 14, wherein a first function unit (FU) of the first lane is configured to perform a non-IMC operation using a logic gate of the operation unit, and a second FU of the second lane is configured to perform an IMC operation using a bit cell in the memory cell array.
16. The DSP of claim 14, wherein the first lane comprises a first FU having the non-IMC operation architecture and a first register file used by the first FU, the second lane comprises a second FU having the memory cell array and a second register file used by the second FU, wherein the DSP further comprises a buffer block disposed between the second FU and the second register file and configured to perform data transmission between the second FU and the second register file.
17. The DSP of claim 16, wherein one of the first FU and the second FU is configured to perform a first operation to process the first instruction and store a corresponding first operation result in one of the first register file and the second register file, and the other one of the first FU and the second FU is configured to perform a second operation to process the second instruction based on the first operation result.
18. A method comprising: fetching, by a processor, an instruction comprising a bundle of instructions including a first instruction, a second instruction, and a third instruction; executing the fetched first instruction by a first non-IMC FU of a first lane of the processor; executing the fetched second instruction by a second non-IMC FU of a second lane of the processor; and executing the fetched third instruction by an IMC FU of a first lane of the processor, wherein the first lane and the second lane perform operations independently and in parallel.
19. The method of claim 18, wherein the processor comprises a single chip, wherein the first non-IMC FU comprises a vector FU, the second non-IMC FU comprises a scalar FU, and the IMC FU is configured to perform a multiply-and-accumulate on data retained in memory of the IMC FU.
20. The method of claim 18, further comprising placing the processor in a buffer mode and based thereon configuring the processor to cause the IMC FU to function as a buffer for the first non-IMC FU and the second non-IMC FU.
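Claims 6 through 12 describe the data path between the register file and the IMC FU: an input FIFO and an output FIFO in a buffer block, a pop instruction that moves buffered input into the memory cell array, a buffer mode (claim 10) that does so without an explicit instruction, and a push instruction that returns the MAC result to the register file. The Python sketch below walks through that sequence under stated assumptions: integer vectors, a MAC computed as soon as input reaches the array, and a register file collapsed into a dict. `ImcFUWithBuffers` and its `load`/`pop`/`push` methods are hypothetical names chosen to mirror the claim language, not a real instruction set.

```python
from collections import deque


class ImcFUWithBuffers:
    """Hypothetical model of claims 6-12: an IMC FU linked to a register file by an
    input FIFO and an output FIFO. Method names mirror the claims' load/pop/push
    instructions but are illustrative only."""

    def __init__(self, weights, buffer_mode=False):
        self.weights = [list(row) for row in weights]  # weight data kept in the cell array
        self.cell_output = None                        # MAC result held in the array
        self.in_fifo = deque()                         # register file -> cell array
        self.out_fifo = deque()                        # cell array -> register file
        self.buffer_mode = buffer_mode
        self.regs = {}                                 # the register file side

    def load(self, reg, data):
        # Load (issued to the non-IMC FU): data enters the register file and is
        # forwarded to the input FIFO.
        self.regs[reg] = list(data)
        self.in_fifo.append(list(data))
        if self.buffer_mode:
            self.pop()  # buffer mode: moved into the array without an explicit pop

    def pop(self):
        # Pop: input FIFO -> memory cell array; the array then computes the MAC of
        # the input against each stored weight row.
        x = self.in_fifo.popleft()
        self.cell_output = [sum(w * v for w, v in zip(row, x)) for row in self.weights]

    def push(self, reg):
        # Push: array output -> output FIFO -> register file.
        self.out_fifo.append(self.cell_output)
        self.regs[reg] = self.out_fifo.popleft()


explicit = ImcFUWithBuffers([[1, 2], [3, 4]])
explicit.load("r0", [1, 1])
explicit.pop()
explicit.push("r1")

auto = ImcFUWithBuffers([[1, 2], [3, 4]], buffer_mode=True)
auto.load("r0", [2, 0])  # no pop needed in buffer mode
auto.push("r1")
print(explicit.regs["r1"], auto.regs["r1"])  # [3, 7] [2, 6]
```

The only behavioral difference between the two instances is who triggers the move from the input FIFO into the array: an explicit `pop()` call, or the load itself when buffer mode is on.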
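Claims 5, 14, and 18 rely on a VLIW packetizer that bundles independent instructions so that non-IMC and IMC lanes can execute them in parallel. The claims do not specify a packetizing policy, so the Python sketch below assumes the simplest one: a greedy, in-order packetizer with one slot per FU type, a conservative hazard check (no read-after-write, write-after-read, or write-after-write conflicts inside a packet), and lanes that all read one register snapshot before committing results. `Instr`, `independent`, `packetize`, `execute_packet`, and the unit names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    unit: str    # FU that executes it; "scalar", "vector", "imc" are assumed names
    dst: str     # destination register
    srcs: tuple  # source registers


def independent(instr, packet):
    """No RAW, WAR, or WAW hazard between instr and anything already in the packet."""
    return all(
        o.dst not in instr.srcs and instr.dst not in o.srcs and instr.dst != o.dst
        for o in packet
    )


def packetize(program):
    """Greedy, in-order packetizer with one slot per FU type (an assumed policy)."""
    packets, current = [], []
    for instr in program:
        slot_taken = any(o.unit == instr.unit for o in current)
        if current and (slot_taken or not independent(instr, current)):
            packets.append(current)
            current = []
        current.append(instr)
    if current:
        packets.append(current)
    return packets


def execute_packet(packet, regs, ops):
    # Every lane reads operands from the same register snapshot before any result
    # is committed, which models the lanes executing the packet in parallel.
    results = {i.dst: ops[i.unit](*(regs[s] for s in i.srcs)) for i in packet}
    regs.update(results)


WEIGHTS = [2, 0, 1]  # hypothetical weights held in the IMC FU's memory cell array
ops = {
    "vector": lambda a, b: [x + y for x, y in zip(a, b)],
    "scalar": lambda a: a * 2,
    "imc": lambda x: sum(w * v for w, v in zip(WEIGHTS, x)),
}
program = [
    Instr("vector", "r2", ("r0", "r1")),
    Instr("scalar", "r4", ("r3",)),
    Instr("imc", "r5", ("r2",)),     # needs r2, so it starts a new packet
    Instr("scalar", "r6", ("r4",)),  # independent of the IMC op, shares its packet
]
regs = {"r0": [1, 2, 3], "r1": [1, 1, 1], "r3": 5}
packets = packetize(program)
for p in packets:
    execute_packet(p, regs, ops)
print([[i.dst for i in p] for p in packets])  # [['r2', 'r4'], ['r5', 'r6']]
print(regs["r5"], regs["r6"])                 # 8 20
```

Keeping the packetizer greedy and in-order means it never moves an instruction ahead of one it depends on; a production scheduler would likely reorder more aggressively to fill slots, which this sketch deliberately avoids.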
CPC titles:
- Instruction prefetching
- using electronic means
- Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00)
- Implementation provisions of register files, e.g. ports
- Vector processors