US2016202986A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016202986-A1 |
| Application number | US-201514595635-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 13, 2015 |
| Priority date | Jan 13, 2015 |
| Publication date | Jul 14, 2016 |
| Grant date | — |
Official abstract text for this publication.
An execution unit circuit for use in a processor core provides efficient use of area and energy by reducing the per-entry storage requirement of a load-store unit issue queue. The execution unit circuit includes a recirculation queue that stores the effective address of the load and store operations and the values to be stored by the store operations. A queue control logic controls the recirculation queue and issue queue so that after the effective address of a load or store operation has been computed, the effective address of the load operation or the store operation is written to the recirculation queue and the operation is removed from the issue queue, so that address operands and other values that were in the issue queue entry no longer require storage. When a load or store operation is rejected by the cache unit, it is subsequently reissued from the recirculation queue.
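The queue handoff described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit disclosed in the publication: the class and field names (`IssueEntry`, `RecircEntry`, `LoadStoreSlice`, `base`, `offset`) are hypothetical, the effective address is assumed to be a simple base-plus-offset sum, and the cache is stood in for by a callback that returns whether it accepts an operation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IssueEntry:
    """Full issue-queue entry, including address operands (hypothetical fields)."""
    op: str                             # "load" or "store"
    base: int                           # address operand
    offset: int                         # address operand
    store_value: Optional[int] = None   # only meaningful for stores


@dataclass
class RecircEntry:
    """Slim recirculation entry: effective address, plus data for stores."""
    op: str
    ea: int
    store_value: Optional[int] = None


class LoadStoreSlice:
    """Behavioral model of the issue queue / recirculation queue handoff."""

    def __init__(self, cache_accepts: Callable[[RecircEntry], bool]):
        self.issue_queue = deque()      # holds IssueEntry objects
        self.recirc_queue = deque()     # holds RecircEntry objects
        self.cache_accepts = cache_accepts
        self.completed = []

    def issue_next(self) -> None:
        """Compute the EA, free the issue-queue entry, then try the cache."""
        entry = self.issue_queue.popleft()   # operands need no storage after this
        ea = entry.base + entry.offset       # assumed addressing mode
        recirc = RecircEntry(entry.op, ea, entry.store_value)
        self.recirc_queue.append(recirc)
        self._try_cache(recirc)

    def reissue_rejected(self) -> None:
        """Retry every operation still held in the recirculation queue."""
        for recirc in list(self.recirc_queue):
            self._try_cache(recirc)

    def _try_cache(self, recirc: RecircEntry) -> None:
        # An accepted operation leaves the recirculation queue;
        # a rejected one simply stays there until the next reissue.
        if self.cache_accepts(recirc):
            self.recirc_queue.remove(recirc)
            self.completed.append(recirc)


# Usage: a stand-in cache that rejects the first access to address 0x1008.
attempts = {}

def flaky_cache(entry: RecircEntry) -> bool:
    attempts[entry.ea] = attempts.get(entry.ea, 0) + 1
    return not (entry.ea == 0x1008 and attempts[entry.ea] == 1)

lsu = LoadStoreSlice(flaky_cache)
lsu.issue_queue.append(IssueEntry("load", 0x1000, 0x8))
lsu.issue_queue.append(IssueEntry("store", 0x2000, 0x10, store_value=42))
lsu.issue_next()        # load is rejected and waits in the recirculation queue
lsu.issue_next()        # store is accepted on its first attempt
lsu.reissue_rejected()  # load is retried from the recirculation queue
```

In this model the rejected load is retried without ever returning to the issue queue, which is the storage saving the abstract points to: only the effective address (and store data) survives past address generation.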
Opening claim text (preview).
1. An execution unit circuit for a processor core, comprising: an issue queue for receiving a stream of instructions including functional operations and load-store operations; a plurality of internal execution pipelines, including a load-store pipeline for computing effective addresses of load operations and store operations and issuing the load operations and store operations to a cache unit; a recirculation queue for storing entries corresponding to the load operations and the store operations; and control logic for controlling the issue queue, the load-store pipeline and the recirculation queue so that after the load-store pipeline has computed the effective address of a load operation or a store operation, the effective address of the load operation or the store operation is written to the recirculation queue and the load operation or the store operation is removed from the issue queue, wherein if one of the load operations or store operations is rejected by the cache unit, the rejected load operation or store operation is subsequently reissued to the cache unit from the recirculation queue.
2. The execution unit circuit of claim 1, wherein the recirculation queue stores only the effective address of the load operations and store operations and for store operations, the value to be stored by the store operation.
3. The execution unit circuit of claim 2, wherein the control logic removes load operations from the issue queue once the effective address is written to the recirculation queue and removes store operations from the issue queue once the effective address and the values to be stored by the store operations are written to the recirculation queue.
4. The execution unit circuit of claim 1, wherein the control logic removes load operations from the issue queue once the effective address is written to the recirculation queue, and wherein the control logic issues the store operations and the values to be stored by the store operations to the cache unit before removing the store data from the issue queue.
5. The execution unit circuit of claim 1, wherein the control logic issues the load operations and store operations to the cache unit in the same processor cycle as the effective address of the load operations and the store operations are written to the recirculation queue.
6. The execution unit circuit of claim 1, wherein the cache unit is implemented as a plurality of cache slices to which the load operations and the store operations are routed via a bus, and wherein the reissue of the rejected load operation or store operations is directed to a different cache slice than another cache slice that has previously rejected the rejected load operation or store operation.
7. The execution unit circuit of claim 1, wherein the control logic halts the issue of load instructions and store instructions from the issue queue when the recirculation queue is full.
8. A processor core, comprising: a plurality of dispatch queues for receiving instructions of a corresponding plurality of instruction streams; a dispatch routing network for routing the output of the dispatch queues to the instruction execution slices; a dispatch control logic that dispatches the instructions of the plurality of instruction streams via the dispatch routing network to issue queues of the plurality of parallel instruction execution slices; and a plurality of parallel instruction execution slices for executing the plurality of instruction streams in parallel, wherein the instruction execution slices comprise an issue queue for receiving a stream of instructions including functional operations and load-store operations, a plurality of internal execution pipelines, including a load-store pipeline for computing effective addresses of load operations and store operations and issuing the load operations and store operations to a cache unit, a recirculation queue for storing entries corresponding to the load operations and the store operations, and queue control logic for controlling the issue queue, the load-store pipeline and the recirculation queue so that after the load-store pipeline has computed the effective address of a load operation or a store operation, the effective address of the load operation or the store operation is written to the recirculation queue and the load operation or the store operation is removed from the issue queue, wherein if one of the load operations or store operations is rejected by the cache unit, the rejected load operation or store operation is subsequently reissued to the cache unit from the recirculation queue.
9. The processor core of claim 8, wherein the recirculation queue stores only the effective addresses of the load operations or store operations and for store operations, the values to be stored by the store operations.
10. The processor core of claim 9, wherein the queue control logic removes load operations from the issue queue once the effective address is written to the recirculation queue and removes store operations from the issue queue once the effective address and the values to be stored by the store operations are written to the recirculation queue.
11. The processor core of claim 8, wherein the queue control logic removes load operations from the issue queue once the effective address is written to the recirculation queue, and wherein the queue control logic issues the store operations and the values to be stored by the store operations to the cache unit before removing the store data from the issue queue.
12. The processor core of claim 8, wherein the queue control logic issues the load operations or store operations to the cache unit in the same processor cycle as the effective address of the load operations and store operations are written to the recirculation queue.
13. The processor core of claim 8, wherein the processor core further comprises a plurality of cache slices to which the load and store operations are routed via a bus and that implements the cache unit, and wherein the reissue of the rejected load operation or store operation is directed to a different cache slice than another cache slice that has previously rejected the rejected load operation or store operation.
14. The processor core of claim 8, wherein the queue control logic halts the issue of load instructions and store instructions from the issue queue when the recirculation queue is full.
15-20. (canceled)
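Two of the dependent claims describe control decisions that are easy to state in code: halting issue when the recirculation queue is full (claims 7 and 14) and steering a reissue away from a cache slice that already rejected the operation (claims 6 and 13). The following is a minimal sketch under assumptions, not the claimed logic itself: the names `PendingOp`, `may_issue` and `choose_reissue_slice` are hypothetical, and the "lowest-numbered slice that has not yet rejected the operation" policy is an illustrative choice the claims do not specify.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PendingOp:
    """A load or store waiting in the recirculation queue (hypothetical model)."""
    ea: int
    rejected_by: set = field(default_factory=set)  # slice IDs that rejected it


def may_issue(recirc_occupancy: int, recirc_capacity: int) -> bool:
    """Load/store issue from the issue queue halts when the recirculation queue is full."""
    return recirc_occupancy < recirc_capacity


def choose_reissue_slice(op: PendingOp, num_slices: int) -> Optional[int]:
    """Pick a slice that has not previously rejected this operation.

    Assumed policy: take the lowest-numbered eligible slice.
    Returns None when every slice has already rejected the operation.
    """
    for slice_id in range(num_slices):
        if slice_id not in op.rejected_by:
            return slice_id
    return None


# Usage: an operation rejected by slice 0 is steered to slice 1 on reissue.
op = PendingOp(ea=0x40, rejected_by={0})
next_slice = choose_reissue_slice(op, num_slices=4)
issue_allowed = may_issue(recirc_occupancy=8, recirc_capacity=8)
```

A real design would also need a rule for the case where every slice has rejected the operation; the sketch only reports it by returning `None`.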
Hit rate improvement · CPC title
Details relating to cache mapping · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title