Thread-aware cache memory management
US-9223709-B1 · Dec 29, 2015 · US
US2016202989A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016202989-A1 |
| Application number | US-201514594716-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 12, 2015 |
| Priority date | Jan 12, 2015 |
| Publication date | Jul 14, 2016 |
| Grant date | — |
Official abstract text for this publication.
A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands and/or vector operations, according to one or more mode control signals that also serve as configuration control signals. The mode control signal is also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to a number of hardware threads that are active.
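The reconfiguration described in the abstract can be illustrated with a minimal sketch. It is not the patented logic: the names `Mode`, `Unit`, and `configure`, the pairing of adjacent slices, and the 64-bit slice width used in the usage note are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical states of the mode control signal."""
    INDEPENDENT = 0  # each slice executes its own instruction stream
    LINKED = 1       # adjacent slices are paired into super-slices


@dataclass(frozen=True)
class Unit:
    """One schedulable unit: a single slice or a super-slice."""
    slices: tuple    # indices of the execution slices in this unit
    width_bits: int  # operand width the unit can handle


def configure(num_slices: int, slice_width: int, mode: Mode) -> list:
    """Group execution slices according to the mode control signal."""
    if mode is Mode.INDEPENDENT:
        # One unit per slice, each at the native slice width.
        return [Unit((i,), slice_width) for i in range(num_slices)]
    if num_slices % 2:
        raise ValueError("pairing into super-slices needs an even slice count")
    # Assumption: a super-slice is two neighbouring slices with doubled width.
    return [Unit((i, i + 1), 2 * slice_width) for i in range(0, num_slices, 2)]
```

Under these assumptions, `configure(4, 64, Mode.INDEPENDENT)` yields four single-slice units of width 64, while `configure(4, 64, Mode.LINKED)` yields two super-slices of width 128.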
Opening claim text (preview).
1. A processor core, comprising: a plurality of dispatch queues for receiving instructions of a corresponding plurality of instruction streams; a plurality of parallel instruction execution slices for executing the plurality of instruction streams in parallel; a dispatch routing network for routing the output of the dispatch queues to the instruction execution slices; a dispatch control logic that dispatches the instructions of the plurality of instruction streams via the dispatch routing network to issue queues of the plurality of parallel instruction execution slices; and a mode control logic, responsive to a mode control signal for reconfiguring a relationship between the plurality of parallel instruction execution slices such that in a first configuration corresponding to a first state of the mode control signal, at least two of the plurality of parallel instruction execution slices are independently operable for executing at least two of the plurality of instruction streams, and wherein in a second configuration corresponding to a second state of the mode control signal the at least two parallel instruction execution slices are linked for executing a single one of the plurality of instruction streams.

2. The processor core of claim 1, further comprising: a plurality of cache slices containing mutually-exclusive segments of a lowest-order level of cache memory; and a plurality of load-store slices coupling the plurality of cache slices to the plurality of parallel execution slices for controlling access by the plurality of parallel instruction execution slices to the cache slices, wherein individual ones of the load-store slices are coupled to the at least two parallel execution slices to exchange data with the at least two parallel execution slices, independent of whether the at least two parallel execution slices are in the first configuration or in the second configuration.

3. The processor core of claim 2, wherein the load-store units and the cache slices are responsive to the mode control signal such that in the first configuration corresponding to the first state of the mode control signal, at least two of the cache slices are separately partitioned between the at least two parallel instruction execution slices to appear as multiple smaller cache memories with contiguous cache lines, and wherein in the second configuration corresponding to the second state of the mode control signal, the cache slices are combined to appear as larger cache memory that are shared by the at least two parallel instruction execution slices.

4. The processor core of claim 1, wherein in the first configuration corresponding to the first state of the mode control signal, the at least two parallel instruction execution slices separately execute first instructions of a first operand width and first operator width of the at least two instruction streams, and wherein in the second configuration corresponding to the second state of the mode control signal the at least two parallel instruction execution slices are linked for executing second instructions of a second operand width that is a multiple of the first operand width or second operator width that is a multiple of the second operator width, the second instructions being instructions of the single instruction stream.

5. The processor core of claim 4, wherein the dispatch control logic is responsive to the mode control signal to, in the first configuration corresponding to the first state of the mode control signal, dispatch the first instructions of a first one of the at least two instruction streams to a first one of the at least two parallel instruction execution slices and dispatch the first instructions of a second one of the at least two instruction streams to a second one of the at least two parallel instruction execution slices, and wherein in the second configuration corresponding to the second state of the mode control signal, dispatch the second instructions to one or both of the at least two parallel instruction execution slices as a combined super-slice.

6. The processor core of claim 1, further comprising: one or more networks coupling the plurality of execution slices for exchanging results of execution of the plurality of instruction streams; and one or more switches for isolating individual ones of the one or more networks to partition the one or more networks into segments corresponding to sub-groups of the plurality of execution slices.

7. The processor core of claim 2, wherein the plurality of parallel instruction execution slices are organized into two or more clusters, and wherein the cache slices are interleave mapped to corresponding different ones of the two or more clusters.

8-14. (canceled)

15. A computer system, comprising: at least one processor core for executing program instructions of a corresponding plurality of instruction streams; and a memory coupled to the processor core for storing the program instructions, wherein the at least one processor core comprises a plurality of parallel instruction execution slices for executing the plurality of instruction streams in parallel, a dispatch routing network for routing the output of the dispatch queues to the instruction execution slices, a dispatch control logic that dispatches the instructions of the plurality of instruction streams via the dispatch routing network to issue queues of the plurality of parallel instruction execution slices, and a mode control logic, responsive to a mode control signal for reconfiguring a relationship between the plurality of parallel instruction execution slices such that in a first configuration corresponding to a first state of the mode control signal, at least two of the plurality of parallel instruction execution slices are independently operable for executing at least two of the plurality of instruction streams, and wherein in a second configuration corresponding to a second state of the mode control signal the at least two parallel instruction execution slices are linked for executing a single one of the plurality of instruction streams.

16. The computer system of claim 15, wherein the processor core further comprises: a plurality of cache slices containing mutually-exclusive segments of a lowest-order level of cache memory; and a plurality of load-store slices coupling the plurality of cache slices to the plurality of parallel execution slices for controlling access by the plurality of parallel instruction execution slices to the cache slices, wherein individual ones of the load-store slices are coupled to the at least two parallel execution slices to exchange data with the at least two parallel execution slices, independent of whether the at least two parallel execution slices are in the first configuration or in the second configuration.

17. The computer system of claim 15, wherein in the first configuration corresponding to the first state of the mode control signal, the at least two parallel instruction execution slices separately execute first instructions of a first operand width of the at least two instruction streams, and wherein in the second configuration corresponding to the second state of the mode control signal the at least two parallel instruction execution slices are linked for executing second instructions of a second operand width that is a multiple of the first operand width or second operator width that is a multiple of the second operator width, the second instructions being instructions of the single instruction stream.

18. The computer system of claim 17, wherein the dispatch control logic is responsive to the mode control signal to, in the first configuration corresponding
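
The cache behaviour recited in claims 2 and 3 (cache slices partitioned among execution slices in the first configuration, combined into one larger shared cache in the second) can be sketched as an address-to-slice mapping. This is only an illustration: the function name `cache_slice_for`, its parameters, and the modulo interleaving are assumptions and are not taken from the claims.

```python
def cache_slice_for(line_addr: int, exec_slice: int, num_cache_slices: int,
                    num_exec_slices: int, linked: bool) -> int:
    """Pick the cache slice that holds a given cache line (illustrative only)."""
    if linked:
        # Second configuration: all cache slices act as one larger shared
        # cache. Assumption: lines are spread across slices by simple modulo.
        return line_addr % num_cache_slices
    # First configuration: each execution slice owns a private, equal-sized
    # group of cache slices, so it sees a smaller cache of its own.
    per_owner = num_cache_slices // num_exec_slices
    if per_owner == 0:
        raise ValueError("need at least one cache slice per execution slice")
    base = exec_slice * per_owner
    return base + line_addr % per_owner
```

With four cache slices and two execution slices, line 5 maps to cache slice 1 when the slices are linked, and to cache slice 3 when partitioned and requested by execution slice 1, since that slice owns cache slices 2 and 3 in this sketch.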
Instruction code · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title