Zero latency prefetching in caches
US-2019114263-A1 · Apr 18, 2019 · US
US11023391B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11023391-B2 |
| Application number | US-201916506151-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 9, 2019 |
| Priority date | Aug 10, 2018 |
| Publication date | Jun 1, 2021 |
| Grant date | Jun 1, 2021 |
Abstract
Disclosed are an apparatus for data processing, an artificial intelligence chip, and an electronic device. The apparatus for data processing includes: at least one input memory, at least one data conveying component, at least one multiplexed arbitration component, and at least one output memory. The input memory is connected to the data conveying component, the data conveying component is connected to the multiplexed arbitration component, and the multiplexed arbitration component is connected to the output memory.
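The abstract gives only the connectivity. Claims 7 and 8 add that each arbitrating unit pairs an arbiter with a selector, and that the output memory commits data only under a write enable signal. The following is a minimal behavioral sketch in Python under stated assumptions, not the patented circuit. Each requesting source is modeled as a queue of pending (write data address, output data) requests. The round-robin policy, the class names `RoundRobinArbiter` and `OutputMemory`, and the `run` helper are all hypothetical; the document does not name an arbitration policy.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# (write_data_address, output_data), as produced by one requesting source
WriteRequest = Tuple[int, object]


class RoundRobinArbiter:
    """Grants one pending write request per cycle.

    Round-robin is an illustrative assumption, not taken from the document.
    """

    def __init__(self, n_sources: int) -> None:
        self.n = n_sources
        self.next_idx = 0  # source checked first on the next cycle

    def arbitrate(self, requests: List[Deque[WriteRequest]]) -> Optional[int]:
        for i in range(self.n):
            idx = (self.next_idx + i) % self.n
            if requests[idx]:  # this source has a write data request pending
                self.next_idx = (idx + 1) % self.n
                return idx
        return None  # no request pending this cycle


class OutputMemory:
    def __init__(self) -> None:
        self.cells: Dict[int, object] = {}

    def clock(self, write_enable: bool, addr: int, data: object) -> None:
        # Data is committed only when the arbitration stage asserts write enable.
        if write_enable:
            self.cells[addr] = data


def run(requests: List[Deque[WriteRequest]], out: OutputMemory) -> List[int]:
    """Drain all pending requests; return the granted source for each cycle."""
    arbiter = RoundRobinArbiter(len(requests))
    grants: List[int] = []
    while (winner := arbiter.arbitrate(requests)) is not None:
        # Selector: forward the winner's output data and write data address.
        addr, data = requests[winner].popleft()
        out.clock(write_enable=True, addr=addr, data=data)
        grants.append(winner)
    return grants


# Usage: two sources contend for one output memory.
pending = [deque([(0, "A0"), (1, "A1")]), deque([(10, "B0")])]
mem = OutputMemory()
print(run(pending, mem))  # [0, 1, 0]
print(mem.cells)          # {0: 'A0', 10: 'B0', 1: 'A1'}
```

Round-robin is used here only because it is simple and does not starve any source. Any policy that grants one source per cycle fits the arbiter-plus-selector structure the claims describe.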
Claims (preview)
The invention claimed is:

1. An apparatus for data processing, comprising: at least one input memory, the input memory configured to store to-be-processed data; at least one data conveying component, the data conveying component configured to read an external processing instruction, parse the processing instruction to acquire a read data address, a write data address, and an operating command, read the to-be-processed data from the at least one input memory based on the read data address, process the to-be-processed data based on the operating command to obtain output data and a corresponding write data address after multiplexed processing, and send a write data request, wherein the operating command comprises a data transposition command and a data conveying command, and the write data address corresponding to the output data is obtained by using the write data address obtained from parsing the processing instruction as an initial write address, and migrating the initial write address backward sequentially based on output time sequence of the output data to obtain the write data address corresponding to the output data; at least one multiplexed arbitration component, the multiplexed arbitration component configured to receive, in response to receiving the write data request of the at least one data conveying component, the output data and the corresponding write data address of the at least one data conveying component, select output data and a corresponding write data address of one of the at least one data conveying component from the received output data and write data address, output the selected output data and corresponding write data address, and send a write enable signal; and at least one output memory, configured to receive, in response to receiving the write enable signal sent by the multiplexed arbitration component, the output data and the corresponding write data address from the multiplexed arbitration component, and write the received output data into the corresponding write data address.

2. The apparatus according to claim 1, wherein the data conveying component comprises: a front-end decoding component, configured to parse the read processing instruction, and execute following parsing: extracting the read data address, the write data address, and the operating command from the processing instruction, sending a read data request to the at least one input memory, caching the to-be-processed data sent by the at least one input memory in response to receiving the read data request into a data queue, and caching the extracted operating command into a command queue; and at least one processing component, each of the at least one processing component configured to process the to-be-processed data in the data queue based on the operating command in the command queue, to obtain a piece of output data.

3. The apparatus according to claim 2, wherein the parsing executed by the front-end decoding component further comprises: determining whether the operating command is the data conveying command or the data transposition command, broadcasting, by the front-end decoding component, the to-be-processed data sent by the at least one input memory to the each of the at least one processing components if the operating command is the data conveying command; or sending, by the front-end decoding component, the to-be-processed data sent by the at least one input memory to corresponding at least one processing component if the operating command is the data transposition command, wherein each of the processing components is preconfigured with a corresponding read data address offset.

4. The apparatus according to claim 3, wherein the front-end decoding component determines whether the read processing instruction is a single-step execution instruction or a batch instruction after parsing the processing instruction; executes the parsing if the processing instruction is the single-step instruction; or repeatedly executes the parsing a preset number of times if the processing instruction is the batch instruction, and adjusts the read data address and the write data address based on a preset address offset step length after executing the parsing each time.

5. The apparatus according to claim 2, wherein the processing component comprises: a data register, configured to read the to-be-processed data from the data queue; a command register, configured to read the operating command from the command queue; a state machine, configured to perform state control based on a command of the command register; and a multiplexer, configured to select to-be-processed data from the data register based on control of the state machine, and output the selected to-be-processed data.

6. The apparatus according to claim 5, wherein the state machine is further configured to receive the write data address obtained by parsing the processing instruction from the command register, calculate the write data address of the output data based on the received write data address and the write address offset preconfigured in the at least one processing component, and send the write data request and the write data address of the output data to the at least one multiplexed arbitration component.

7. The apparatus according to claim 5, wherein the multiplexed arbitration component comprises at least one arbitrating unit, each of the at least one arbitrating unit comprises an arbiter and a selector, and the arbiter is configured to arbitrate the output data of one of the at least one processing component in the data conveying components, control the selector to select output data of one of the at least one processing component and a corresponding write data address based on a arbitrating result, output the selected output data and corresponding write data address, and send the write enable signal to the at least one output memory.

8. The apparatus according to claim 7, wherein the output memory is configured to receive the write enable signal, the output data, and the corresponding write data address outputted by the multiplexed arbitration component, and write the output data into the corresponding write data address under the control of the write enable signal.

9. The apparatus according to claim 1, wherein the input memory and the output memory are on-chip memories.

10. An artificial intelligent chip, comprising an apparatus for data processing, the apparatus comprising: at least one input memory, the input memory configured to store to-be-processed data; at least one data conveying component, the data conveying component configured to read an external processing instruction, parse the processing instruction to acquire a read data address, a write data address, and an operating command, read the to-be-processed data from the at least one input memory based on the read data address, process the to-be-processed data based on the operating command to obtain output data and a corresponding write data address after multiplexed processing, and send a write data request, wherein the operating command comprises a data transposition command and a data conveying command, and the write data address corresponding to the output data is obtained by using the write data address obtained from parsing the processing instruction as an initial write address, and migrating the initial write address backward sequentially based on output time sequence of the output data to obtain the write data address corresponding to the output data; at least one multiplexed arbitration component, the multiplexed arbitration component configured to receive, in response to receiving the write data request of the at least one data conveying component, the output data and the corresponding write
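The claims tie three pieces of address arithmetic together. The parsed write address is the initial write address, and later outputs move along in output time order (claim 1). A data conveying command broadcasts data to every processing component, while a transposition command routes data by each component's preconfigured read data address offset (claim 3). A batch instruction repeats the parsing a preset number of times, stepping the addresses after each pass (claim 4). Below is a minimal Python sketch of that arithmetic under stated assumptions. The `Instruction` fields and the `execute` helper are hypothetical. Reading "migrating backward" as "toward higher addresses" is an interpretation. The separate `read_step` and `write_step` fields are also an assumption: claim 4 names a single step length, but a transpose needs different strides on the two sides.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Instruction:
    """Hypothetical decoded processing instruction (field layout is assumed)."""
    read_addr: int
    write_addr: int
    command: str          # "convey" (data conveying) or "transpose" (data transposition)
    batch_count: int = 1  # 1 models a single-step instruction
    read_step: int = 0    # assumed per-pass read address step for batch instructions
    write_step: int = 0   # assumed per-pass write address step for batch instructions


def execute(instr: Instruction, input_mem: Sequence[object],
            read_offsets: Sequence[int]) -> List[Tuple[int, object]]:
    """Return (write_addr, data) pairs in output time order.

    read_offsets[i] is the read address offset preconfigured in processing
    component i; its length is the number of processing components.
    """
    n = len(read_offsets)
    writes: List[Tuple[int, object]] = []
    read_addr, write_addr = instr.read_addr, instr.write_addr
    for _ in range(instr.batch_count):  # a batch repeats the same parsing
        if instr.command == "convey":
            # One block is broadcast to every component; component i's
            # multiplexer selects word i of that block.
            block = input_mem[read_addr:read_addr + n]
            outputs = [block[i] for i in range(n)]
        elif instr.command == "transpose":
            # Component i receives the word at its own preconfigured offset.
            outputs = [input_mem[read_addr + off] for off in read_offsets]
        else:
            raise ValueError(f"unknown operating command: {instr.command!r}")
        # The parsed write address is the initial write address; each later
        # output lands one address further on, in output time order.
        for k, data in enumerate(outputs):
            writes.append((write_addr + k, data))
        # Batch instruction: step both addresses after each pass.
        read_addr += instr.read_step
        write_addr += instr.write_step
    return writes


# Usage: transpose a 2x3 row-major matrix using two processing components
# preconfigured with read offsets 0 and 3 (the row length).
matrix = ["a00", "a01", "a02", "a10", "a11", "a12"]
t = Instruction(read_addr=0, write_addr=100, command="transpose",
                batch_count=3, read_step=1, write_step=2)
print([d for _, d in execute(t, matrix, read_offsets=[0, 3])])
# ['a00', 'a10', 'a01', 'a11', 'a02', 'a12']
```

Each pass of the transpose gathers one column of the input into consecutive output addresses. Three passes then produce the 3x2 transpose in row-major order at addresses 100 to 105.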
CPC classification titles:
- from multiple instruction streams, e.g. multistreaming
- on one IC chip (single chip microcontrollers)
- Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
- of multiple operands or results {(addressing multiple banks G06F12/06)}
- Instruction analysis, e.g. decoding, instruction word fields