Multi-channel data path circuitry

US11422822B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11422822-B2
Application number	US-202016870330-A
Country	US
Kind code	B2
Filing date	May 8, 2020
Priority date	May 8, 2020
Publication date	Aug 23, 2022
Grant date	Aug 23, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry may include shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle. The arbitration circuitry may select an instruction based on one or more of: stall counts, whether available instructions are being speculatively executed, whether ones of available instructions target a particular portion of the shared execution circuitry, numbers of execution cycles, and SIMD group ages.
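The structure described in the abstract can be pictured with a small cycle-level model: two front ends each hold decoded instructions for their own SIMD group, and one instruction per cycle is handed to a shared multi-stage execution pipeline. The sketch below is only an illustration, not the patented design. The function name `simulate`, the stage count, and the round-robin choice (a stand-in for the arbitration circuitry) are all assumptions introduced here.

```python
from collections import deque


def simulate(front_end_a, front_end_b, num_stages=3, cycles=4):
    """Toy model: two front-end queues feed one shared execution pipeline.

    Returns one tuple per cycle holding the contents of the shared stages
    (index 0 is the newest stage). Round-robin selection is an assumption
    standing in for the arbitration circuitry.
    """
    queues = [deque(front_end_a), deque(front_end_b)]
    stages = [None] * num_stages
    history = []
    last = 1  # so that front end 0 is tried first in the first cycle
    for _ in range(cycles):
        # Prefer the front end that was not selected in the previous cycle.
        order = [1 - last, last]
        chosen = next((i for i in order if queues[i]), None)
        issued = None
        if chosen is not None:
            issued = queues[chosen].popleft()
            last = chosen
        # Advance the shared pipeline by one stage and insert the new instruction.
        stages = [issued] + stages[:-1]
        history.append(tuple(stages))
    return history


if __name__ == "__main__":
    for cycle, snapshot in enumerate(simulate(["A0", "A1"], ["B0", "B1"]), 1):
        print(cycle, snapshot)
```

With the inputs above, the third cycle shows `("A1", "B0", "A0")`, so instructions from both SIMD groups occupy different stages of the shared pipeline at the same time, which is the behavior the abstract describes.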

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus, comprising: pipeline circuitry configured to perform operations specified by instructions of first and second single-instruction multiple-data (SIMD) groups assigned to the pipeline circuitry, wherein the pipeline circuitry includes: first front-end circuitry configured to decode instructions of the first assigned SIMD group; second front-end circuitry configured to decode instructions of the second assigned SIMD group, wherein the first and second front-end circuitry are configured to decode an instruction of the first SIMD group and an instruction of the second SIMD group in parallel in a given cycle; shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups; and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle, such that instructions from both the first SIMD group and the second SIMD group occupy different stages of the shared execution circuitry at a given time, wherein the selection is based on at least the following inputs: stall counts for instructions from the first and second front-end circuitry; whether execution units targeted by instructions from the first and second front-end circuitry are saturated; and whether instructions from the first and second front-end circuitry are speculatively executed.

2. The apparatus of claim 1, wherein the arbitration circuitry is further configured to select an instruction based on one or more of the following types of information: numbers of execution cycles for available instructions; and ages of available instructions.

3. The apparatus of claim 1, wherein the arbitration circuitry includes first control circuitry configured to select, from among at least a first instruction from the first front-end circuitry and a second instruction from the second front-end circuitry, an instruction that has a smaller stall count.

4. The apparatus of claim 3, wherein the arbitration circuitry includes second control circuitry configured to, in the absence of a selection by the first control circuitry: based on a determination that the first instruction targets a particular execution unit and the second instruction does not target the particular execution unit, select from among the first and second instructions based on whether the particular execution unit is saturated.

5. The apparatus of claim 4, wherein the arbitration circuitry includes third control circuitry configured to, in the absence of a selection by the first and second control circuitry: select the first instruction based on a determination that the first instruction is not a speculative instruction and the second instruction is a speculative instruction.

6. The apparatus of claim 5, wherein the arbitration circuitry is configured to, in the absence of a selection by the first, second, and third control circuitry, select an instruction from an older SIMD group from among the first and second instructions.

7. The apparatus of claim 1, wherein the first and second front-end circuitry include respective hazard detection stages configured to generate stall counts based on detected hazards.

8. The apparatus of claim 1, wherein the first and second front-end circuitry include respective operand cache allocation stages; wherein the shared execution circuitry includes an operand cache load stage; and wherein the shared execution circuitry includes an issue stage and a plurality of execution stages.

9. The apparatus of claim 1, wherein the shared execution circuitry includes a plurality of execution pipelines configured to execute different respective sets of instruction types.

10. The apparatus of claim 1, further comprising: a central processing unit; a graphics processor; and network interface circuitry; wherein the pipeline circuitry is included in at least one of the central processing unit or the graphics processor.

11. A non-transitory computer readable storage medium having stored thereon design information that specifies a design of at least a portion of a hardware integrated circuit in a format recognized by a semiconductor fabrication system that is configured to use the design information to produce the circuit according to the design, wherein the design information specifies that the circuit includes: pipeline circuitry configured to perform operations specified by instructions of first and second single-instruction multiple-data (SIMD) groups assigned to the pipeline circuitry, wherein the pipeline circuitry includes: first front-end circuitry configured to decode instructions of the first assigned SIMD group; second front-end circuitry configured to decode instructions of the second assigned SIMD group, wherein the first and second front-end circuitry are configured to decode an instruction of the first SIMD group and an instruction of the second SIMD group in parallel in a given cycle; shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups; and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle, such that instructions from both the first SIMD group and the second SIMD group occupy different stages of the shared execution circuitry at a given time, wherein the selection is based on at least the following inputs: stall counts for instructions from the first and second front-end circuitry; whether execution units targeted by instructions from the first and second front-end circuitry are saturated; and whether instructions from the first and second front-end circuitry are speculatively executed.

12. The non-transitory computer readable storage medium of claim 11, wherein the arbitration circuitry is further configured to select an instruction based on: whether ones of available instructions target a particular portion of the shared execution circuitry; numbers of execution cycles for available instructions; and ages of available instructions.

13. The non-transitory computer readable storage medium of claim 11, wherein the arbitration circuitry includes first control circuitry configured to select, from among at least a first instruction from the first front-end circuitry and a second instruction from the second front-end circuitry, an instruction that has a smaller stall count; and wherein the arbitration circuitry includes second control circuitry configured to, in the absence of a selection by the first control circuitry and based on a determination that the first instruction targets a particular execution unit and the second instruction does not target the particular execution unit, select from among the first and second instructions based on whether the particular execution unit is saturated.

14. The non-transitory computer readable storage medium of claim 13, wherein the arbitration circuitry includes third control circuitry configured to, in the absence of a selection by the first and second control circuitry, select the first instruction based on a determination that the first instruction is not a speculative instruction and the second instruction is a speculative instruction; and wherein the arbitration circuitry is configured to, in the absence of a selection by the first, second, and third control circuitry, select an instruction from an older SIMD group from among the first and second instructions.

15. The non-transitory computer
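Claims 3 through 6 describe a cascade of tie-breakers: smaller stall count first, then execution-unit saturation, then non-speculative over speculative, then the older SIMD group. A minimal software sketch of that ordering follows. It is a reading aid, not the claimed circuitry: the `Candidate` fields, the `arbitrate` function, and the unit names are hypothetical, and the saturation step assumes that when exactly one candidate targets a saturated unit, the other candidate is chosen, which is one possible interpretation of claim 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One decoded instruction offered by a front end (field names are illustrative)."""
    simd_group: str
    stall_count: int
    target_unit: str
    speculative: bool
    group_age: int  # assumption: a larger value means an older SIMD group


def arbitrate(a: Candidate, b: Candidate, saturated_units: frozenset) -> Candidate:
    """Pick one of two candidates using the ordering of claims 3 to 6."""
    # Claim 3: prefer the instruction with the smaller stall count.
    if a.stall_count != b.stall_count:
        return a if a.stall_count < b.stall_count else b

    # Claim 4 (assumed interpretation): if the candidates target different
    # units and exactly one of those units is saturated, pick the other one.
    if a.target_unit != b.target_unit:
        a_saturated = a.target_unit in saturated_units
        b_saturated = b.target_unit in saturated_units
        if a_saturated != b_saturated:
            return b if a_saturated else a

    # Claim 5: prefer a non-speculative instruction over a speculative one.
    if a.speculative != b.speculative:
        return b if a.speculative else a

    # Claim 6: otherwise pick the instruction from the older SIMD group.
    return a if a.group_age >= b.group_age else b


if __name__ == "__main__":
    first = Candidate("G0", stall_count=1, target_unit="sfu", speculative=False, group_age=5)
    second = Candidate("G1", stall_count=1, target_unit="fma", speculative=True, group_age=9)
    # Stall counts tie and "sfu" is saturated, so the second candidate wins.
    print(arbitrate(first, second, frozenset({"sfu"})).simd_group)
```

Each step returns only when it can separate the two candidates, mirroring the "in the absence of a selection by" wording in the claims. Other inputs named in claim 2, such as numbers of execution cycles, are left out of this sketch.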

Assignees

Apple Inc

Inventors

Classifications

G06F9/3822
Parallel decoding, e.g. parallel decode units · CPC title
G06T1/20
Processor architectures; Processor configuration, e.g. pipelining · CPC title
G06F9/3869 (Primary)
Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking · CPC title
G06F9/544
Buffers; Shared memory; Pipes · CPC title
G06F15/8007
Single instruction multiple data [SIMD] multiprocessors · CPC title

Patent family

Related publications grouped by family.

View patent family 78412638

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11422822B2 cover?: Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry m…
Who is the assignee on this patent?: Apple Inc
What technology area does this patent fall under?: Primary CPC classification G06F9/3869. Mapped technology areas include Physics.
When was this patent published?: Publication date Aug 23, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).