Parallel slice processor having a recirculating load-store queue for fast deallocation of issue queue entries

US2016202986A1 · US · A1

Patent metadata
Publication number: US-2016202986-A1
Application number: US-201514595635-A
Country: US
Kind code: A1
Filing date: Jan 13, 2015
Priority date: Jan 13, 2015
Publication date: Jul 14, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An execution unit circuit for use in a processor core provides efficient use of area and energy by reducing the per-entry storage requirement of a load-store unit issue queue. The execution unit circuit includes a recirculation queue that stores the effective address of the load and store operations and the values to be stored by the store operations. A queue control logic controls the recirculation queue and issue queue so that after the effective address of a load or store operation has been computed, the effective address of the load operation or the store operation is written to the recirculation queue and the operation is removed from the issue queue, so that address operands and other values that were in the issue queue entry no longer require storage. When a load or store operation is rejected by the cache unit, it is subsequently reissued from the recirculation queue.
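
The flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class and field names (`IssueEntry`, `RecircEntry`, `LoadStoreModel`, `base`, `offset`), the one-operation-per-step timing, the base-plus-offset address calculation, and the `make_reject_once` cache stand-in are all hypothetical. It shows a wide issue-queue entry being replaced by a slim recirculation-queue entry once the effective address is known, and a rejected operation being reissued from the recirculation queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional, Set


@dataclass
class IssueEntry:
    """Wide issue-queue entry; field names are hypothetical."""
    op: str                            # "load" or "store"
    base: int                          # address operand
    offset: int                        # address operand
    store_value: Optional[int] = None  # only meaningful for stores


@dataclass
class RecircEntry:
    """Slim recirculation-queue entry: effective address, plus the value for stores."""
    op: str
    effective_address: int
    store_value: Optional[int] = None


class LoadStoreModel:
    """Toy model of the issue queue / recirculation queue hand-off."""

    def __init__(self, cache_accepts: Callable[[RecircEntry], bool]) -> None:
        self.issue_queue: Deque[IssueEntry] = deque()
        self.recirc_queue: Deque[RecircEntry] = deque()
        self.completed: List[RecircEntry] = []
        self.cache_accepts = cache_accepts  # stand-in for the cache unit

    def step(self) -> None:
        """Advance the model by one simplified cycle."""
        if self.issue_queue:
            # Compute the effective address, write the slim entry to the
            # recirculation queue, and free the issue-queue entry.
            full = self.issue_queue.popleft()
            ea = full.base + full.offset  # assumed addressing mode
            self.recirc_queue.append(RecircEntry(full.op, ea, full.store_value))
        if self.recirc_queue:
            # Present the oldest recirculation entry to the cache.
            entry = self.recirc_queue.popleft()
            if self.cache_accepts(entry):
                self.completed.append(entry)
            else:
                # Rejected: the entry stays queued and is reissued later.
                self.recirc_queue.append(entry)


def make_reject_once(addresses: Set[int]) -> Callable[[RecircEntry], bool]:
    """Hypothetical cache stub that rejects each listed address exactly once."""
    already_rejected: Set[int] = set()

    def accepts(entry: RecircEntry) -> bool:
        ea = entry.effective_address
        if ea in addresses and ea not in already_rejected:
            already_rejected.add(ea)
            return False
        return True

    return accepts


def demo() -> List[int]:
    """Run one load (rejected once) and one store; return accepted addresses in order."""
    model = LoadStoreModel(make_reject_once({0x108}))
    model.issue_queue.append(IssueEntry("load", base=0x100, offset=8))
    model.issue_queue.append(IssueEntry("store", base=0x200, offset=0, store_value=7))
    while model.issue_queue or model.recirc_queue:
        model.step()
    return [entry.effective_address for entry in model.completed]
```

In this model the rejected load has already left the issue queue by the time the cache rejects it, so the retry costs only a slim recirculation entry rather than a full issue-queue entry with its operands.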

First claim

Opening claim text (preview).

1. An execution unit circuit for a processor core, comprising: an issue queue for receiving a stream of instructions including functional operations and load-store operations; a plurality of internal execution pipelines, including a load-store pipeline for computing effective addresses of load operations and store operations and issuing the load operations and store operations to a cache unit; a recirculation queue for storing entries corresponding to the load operations and the store operations; and control logic for controlling the issue queue, the load-store pipeline and the recirculation queue so that after the load-store pipeline has computed the effective address of a load operation or a store operation, the effective address of the load operation or the store operation is written to the recirculation queue and the load operation or the store operation is removed from the issue queue, the rejected load operation or store operation is subsequently reissued to the cache unit from the recirculation queue.

2. The execution unit circuit of claim 1, wherein the recirculation queue stores only the effective address of the load operations and store operations and for store operations, the value to be stored by the store operation.

3. The execution unit circuit of claim 2, wherein the control logic removes load operations from the issue queue once the effective address is written to the recirculation queue and removes store operations from the issue queue once the effective address and the values to be stored by the store operations are written to the recirculation queue.

4. The execution unit circuit of claim 1, wherein the control logic removes load operations from the issue queue once the effective address is written to the recirculation queue, and wherein the control logic issues the store operations and the values to be stored by the store operations to the cache unit before removing the store data from the issue queue.

5. The execution unit circuit of claim 1, wherein the control logic issues the load operations and store operations to the cache unit in the same processor cycle as the effective address of the load operations and the store operations are written to the recirculation queue.

6. The execution unit circuit of claim 1, wherein the cache unit is implemented as a plurality of cache slices to which the load operations and the store operations are routed via a bus, and wherein the reissue of the rejected load operation or store operations is directed to a different cache slice than another cache slice that has previously rejected the rejected load operation or store operation.

7. The execution unit circuit of claim 1, wherein the control logic halts the issue of load instructions and store instructions from the issue queue when the recirculation queue is full.

8. A processor core, comprising: a plurality of dispatch queues for receiving instructions of a corresponding plurality of instruction streams; a dispatch routing network for routing the output of the dispatch queues to the instruction execution slices; a dispatch control logic that dispatches the instructions of the plurality of instruction streams via the dispatch routing network to issue queues of the plurality of parallel instruction execution slices; and a plurality of parallel instruction execution slices for executing the plurality of instruction streams in parallel, wherein the instruction execution slices comprise an issue queue for receiving a stream of instructions including functional operations and load-store operations, a plurality of internal execution pipelines, including a load-store pipeline for computing effective addresses of load operations and store operations and issuing the load operations and store operations to a cache unit, a recirculation queue for storing entries corresponding to the load operations and the store operations, and queue control logic for controlling the issue queue, the load-store pipeline and the recirculation queue so that after the load-store pipeline has computed the effective address of a load operation or a store operation, the effective address of the load operation or the store operation is written to the recirculation queue and the load operation or the store operation is removed from the issue queue, wherein if one of the load operations or store operations is rejected by the cache unit, the rejected load operation or store operation is subsequently reissued to the cache unit from the recirculation queue.

9. The processor core of claim 8, wherein the recirculation queue stores only the effective addresses of the load operations or store operations and for store operations, the values to be stored by the store operations.

10. The processor core of claim 9, wherein the queue control logic removes load operations from the issue queue once the effective address is written to the recirculation queue and removes store operations from the issue queue once the effective address and the values to be stored by the store operations are written to the recirculation queue.

11. The processor core of claim 8, wherein the queue control logic removes load operations from the issue queue once the effective address is written to the recirculation queue, and wherein the queue control logic issues the store operations and the values to be stored by the store operations to the cache unit before removing the store data from the issue queue.

12. The processor core of claim 8, wherein the queue control logic issues the load operations or store operations to the cache unit in the same processor cycle as the effective address of the load operations and store operations are written to the recirculation queue.

13. The processor core of claim 8, wherein the processor core further comprises a plurality of cache slices to which the load and store operations are routed via a bus and that implements the cache unit, and wherein the reissue of the rejected load operation or store operation is directed to a different cache slice than another cache slice that has previously rejected the rejected load operation or store operation.

14. The processor core of claim 8, wherein the queue control logic halts the issue of load instructions and store instructions from the issue queue when the recirculation queue is full.

15-20. (canceled)
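
Three of the dependent-claim conditions can be written as small predicates, which may help when comparing the claims side by side. This is a hedged sketch, not the claimed control logic: the function names and parameters are hypothetical, and stepping to the next slice index is just one assumed way to satisfy the "different cache slice" requirement of claims 6 and 13.

```python
def can_free_issue_entry(op: str, ea_written: bool,
                         store_value_written: bool = False) -> bool:
    """Claims 3 and 10: a load frees its issue-queue entry once its effective
    address is in the recirculation queue; a store also needs its value there."""
    if op == "load":
        return ea_written
    return ea_written and store_value_written


def may_issue_load_store(recirc_occupancy: int, recirc_capacity: int) -> bool:
    """Claims 7 and 14: halt load/store issue while the recirculation queue is full."""
    return recirc_occupancy < recirc_capacity


def pick_reissue_slice(rejecting_slice: int, num_slices: int) -> int:
    """Claims 6 and 13: reissue to a cache slice other than the one that rejected.

    Choosing the next index is an assumption; the claims only require that the
    target differ from the rejecting slice.
    """
    if num_slices < 2:
        raise ValueError("need at least two cache slices to pick a different one")
    return (rejecting_slice + 1) % num_slices
```

For example, under these assumptions `pick_reissue_slice(3, 4)` wraps around to slice 0, and `may_issue_load_store(2, 2)` returns `False` because the recirculation queue has no free entry.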

Assignees

IBM

Inventors

Classifications

  • Hit rate improvement · CPC title

  • Details relating to cache mapping · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016202986A1 cover?
An execution unit circuit for use in a processor core provides efficient use of area and energy by reducing the per-entry storage requirement of a load-store unit issue queue. The execution unit circuit includes a recirculation queue that stores the effective address of the load and store operations and the values to be stored by the store operations. A queue control logic controls the recircul…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F9/3802. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 14, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).