Reconfigurable parallel execution and load-store slice processor

US2016202989A1 · US · A1

Patent metadata

Publication number: US-2016202989-A1
Application number: US-201514594716-A
Country: US
Kind code: A1
Filing date: Jan 12, 2015
Priority date: Jan 12, 2015
Publication date: Jul 14, 2016
Grant date: not listed

Abstract

Official abstract text for this publication.

A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands and/or vector operations, according to one or more mode control signal that also serves as a configuration control signal. The mode control signal is also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to a number of hardware threads that are active.
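The reconfiguration described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not IBM's implementation. The two-state `Mode` enum, the `SliceGroup` type, the `configure` function, the even split of slices into one contiguous cluster per active hardware thread, and the pairing of adjacent slices into super-slices are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical two-state mode control signal."""
    INDEPENDENT = 0  # first state: each slice executes on its own
    SUPER_SLICE = 1  # second state: adjacent slices are linked in pairs


@dataclass(frozen=True)
class SliceGroup:
    slices: tuple  # execution-slice indices that act as one unit
    thread: int    # hardware thread (instruction stream) the unit serves


def configure(num_slices: int, active_threads: int, mode: Mode) -> list:
    """Assign execution slices to active hardware threads.

    Assumptions (illustrative only): slices are split into equal,
    contiguous clusters, one cluster per active thread, and in
    SUPER_SLICE mode slices (base + 2k, base + 2k + 1) are paired.
    """
    if active_threads < 1 or num_slices % active_threads:
        raise ValueError("slices must divide evenly among active threads")
    per_cluster = num_slices // active_threads
    width = 2 if mode is Mode.SUPER_SLICE else 1
    if per_cluster % width:
        raise ValueError("cluster too small to form super-slices")

    groups = []
    for thread in range(active_threads):
        base = thread * per_cluster  # first slice of this thread's cluster
        for offset in range(0, per_cluster, width):
            first = base + offset
            groups.append(SliceGroup(tuple(range(first, first + width)), thread))
    return groups
```

For example, `configure(8, 2, Mode.SUPER_SLICE)` yields super-slices (0, 1) and (2, 3) for thread 0 and (4, 5) and (6, 7) for thread 1, while `configure(8, 8, Mode.INDEPENDENT)` gives each of the eight slices its own thread. The slice count of 8 is likewise only an assumption for the example.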

First claim

Opening claim text (preview).

1. A processor core, comprising: a plurality of dispatch queues for receiving instructions of a corresponding plurality of instruction streams; a plurality of parallel instruction execution slices for executing the plurality of instruction streams in parallel; a dispatch routing network for routing the output of the dispatch queues to the instruction execution slices; a dispatch control logic that dispatches the instructions of the plurality of instruction streams via the dispatch routing network to issue queues of the plurality of parallel instruction execution slices; and a mode control logic, responsive to a mode control signal for reconfiguring a relationship between the plurality of parallel instruction execution slices such that in a first configuration corresponding to a first state of the mode control signal, at least two of the plurality of parallel instruction execution slices are independently operable for executing at least two of the plurality of instruction streams, and wherein in a second configuration corresponding to a second state of the mode control signal the at least two parallel instruction execution slices are linked for executing a single one of the plurality of instruction streams.

2. The processor core of claim 1, further comprising: a plurality of cache slices containing mutually-exclusive segments of a lowest-order level of cache memory; and a plurality of load-store slices coupling the plurality of cache slices to the plurality of parallel execution slices for controlling access by the plurality of parallel instruction execution slices to the cache slices, wherein individual ones of the load-store slices are coupled to the at least two parallel execution slices to exchange data with the at least two parallel execution slices, independent of whether the at least two parallel execution slices are in the first configuration or in the second configuration.

3. The processor core of claim 2, wherein the load-store units and the cache slices are responsive to the mode control signal such that in the first configuration corresponding to the first state of the mode control signal, at least two of the cache slices are separately partitioned between the at least two parallel instruction execution slices to appear as multiple smaller cache memories with contiguous cache lines, and wherein in the second configuration corresponding to the second state of the mode control signal, the cache slices are combined to appear as larger cache memory that are shared by the at least two parallel instruction execution slices.

4. The processor core of claim 1, wherein in the first configuration corresponding to the first state of the mode control signal, the at least two parallel instruction execution slices separately execute first instructions of a first operand width and first operator width of the at least two instruction streams, and wherein in the second configuration corresponding to the second state of the mode control signal the at least two parallel instruction execution slices are linked for executing second instructions of a second operand width that is a multiple of the first operand width or second operator width that is a multiple of the second operator width, the second instructions being instructions of the single instruction stream.

5. The processor core of claim 4, wherein the dispatch control logic is responsive to the mode control signal to, in the first configuration corresponding to the first state of the mode control signal, dispatch the first instructions of a first one of the at least two instruction streams to a first one of the at least two parallel instruction execution slices and dispatch the first instructions of a second one of the at least two instruction streams to a second one of the at least two parallel instruction execution slices, and wherein in the second configuration corresponding to the second state of the mode control signal, dispatch the second instructions to one or both of the at least two parallel instruction execution slices as a combined super-slice.

6. The processor core of claim 1, further comprising: one or more networks coupling the plurality of execution slices for exchanging results of execution of the plurality of instruction streams; and one or more switches for isolating individual ones of the one or more networks to partition the one or more networks into segments corresponding to sub-groups of the plurality of execution slices.

7. The processor core of claim 2, wherein the plurality of parallel instruction execution slices are organized into two or more clusters, and wherein the cache slices are interleave mapped to corresponding different ones of the two or more clusters.

8-14. (canceled)

15. A computer system, comprising: at least one processor core for executing program instructions of a corresponding plurality of instruction streams; and a memory coupled to the processor core for storing the program instructions, wherein the at least one processor core comprises a plurality of parallel instruction execution slices for executing the plurality of instruction streams in parallel, a dispatch routing network for routing the output of the dispatch queues to the instruction execution slices, a dispatch control logic that dispatches the instructions of the plurality of instruction streams via the dispatch routing network to issue queues of the plurality of parallel instruction execution slices, and a mode control logic, responsive to a mode control signal for reconfiguring a relationship between the plurality of parallel instruction execution slices such that in a first configuration corresponding to a first state of the mode control signal, at least two of the plurality of parallel instruction execution slices are independently operable for executing at least two of the plurality of instruction streams, and wherein in a second configuration corresponding to a second state of the mode control signal the at least two parallel instruction execution slices are linked for executing a single one of the plurality of instruction streams.

16. The computer system of claim 15, wherein the processor core further comprises: a plurality of cache slices containing mutually-exclusive segments of a lowest-order level of cache memory; and a plurality of load-store slices coupling the plurality of cache slices to the plurality of parallel execution slices for controlling access by the plurality of parallel instruction execution slices to the cache slices, wherein individual ones of the load-store slices are coupled to the at least two parallel execution slices to exchange data with the at least two parallel execution slices, independent of whether the at least two parallel execution slices are in the first configuration or in the second configuration.

17. The computer system of claim 15, wherein in the first configuration corresponding to the first state of the mode control signal, the at least two parallel instruction execution slices separately execute first instructions of a first operand width of the at least two instruction streams, and wherein in the second configuration corresponding to the second state of the mode control signal the at least two parallel instruction execution slices are linked for executing second instructions of a second operand width that is a multiple of the first operand width or second operator width that is a multiple of the second operator width, the second instructions being instructions of the single instruction stream.

18. The computer system of claim 17, wherein the dispatch control logic is responsive to the mode control signal to, in the first configuration corresponding
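Claims 2, 3 and 7 describe cache slices that hold mutually exclusive segments of the lowest-order cache and that are either partitioned between execution slices or combined into one shared cache. The sketch below models only the address-to-cache-slice mapping, under hypothetical assumptions: a 64-byte cache line, interleaving on the low-order cache-line index, and equal partitions of contiguous cache slices. The function `cache_slice_for` and its parameters are illustrative names, not terms from the patent, and this is one plausible reading of the claims rather than the claimed hardware.

```python
LINE_BYTES = 64  # hypothetical cache-line size in bytes


def cache_slice_for(addr: int, num_cache_slices: int,
                    partitions: int, partition: int = 0) -> int:
    """Return the index of the cache slice holding the line at `addr`.

    partitions == 1: combined configuration; all cache slices form one
    larger cache, with consecutive lines interleaved across every slice.
    partitions > 1: partitioned configuration; each partition owns an
    equal, contiguous block of cache slices and interleaves its lines
    only across that block.
    """
    if partitions < 1 or num_cache_slices % partitions:
        raise ValueError("cache slices must divide evenly among partitions")
    if not 0 <= partition < partitions:
        raise ValueError("partition index out of range")
    per_partition = num_cache_slices // partitions
    line = addr // LINE_BYTES  # cache-line number of the address
    return partition * per_partition + line % per_partition
```

With four cache slices in the combined configuration, lines at addresses 0, 64, 128 and 192 map to slices 0, 1, 2 and 3. With two partitions, the first two lines requested by partition 1 map to slices 2 and 3, so neither partition touches the other's cache slices.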

Assignees

IBM

Inventors

Not shown on this page.

Classifications

  • Instruction code · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

  • G06F9/3851 (primary)

    from multiple instruction streams, e.g. multistreaming · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016202989A1 cover?
A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 14, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).