What technology area does this patent fall under?

Primary CPC classification G06F9/30018. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 28, 2017 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Compressing execution cycles for divergent execution in a single instruction multiple data (SIMD) processor

US9606797B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9606797-B2
Application number	US-201213724633-A
Country	US
Kind code	B2
Filing date	Dec 21, 2012
Priority date	Dec 21, 2012
Publication date	Mar 28, 2017
Grant date	Mar 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.
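The cycle compression described in the abstract can be pictured with a small sketch. It assumes, purely for illustration, that the vector's elements are issued to the narrower execution unit in consecutive groups of `unit_width`, and that a group in which no element is enabled can be skipped. The function name `cycles_needed` and the list-of-booleans mask are hypothetical and do not come from the patent.

```python
def cycles_needed(exec_mask, unit_width):
    """Return (uncompressed, compressed) cycle counts for one vector instruction.

    exec_mask  : list of bools, one per data element (True = operate on it).
    unit_width : number of elements the execution unit handles per cycle.
    Assumption: elements are issued in consecutive groups of unit_width,
    and a group with no enabled element needs no execution cycle.
    """
    if unit_width <= 0 or len(exec_mask) % unit_width != 0:
        raise ValueError("vector width must be a positive multiple of unit_width")

    # Split the vector into the groups that would each occupy one cycle.
    groups = [exec_mask[i:i + unit_width]
              for i in range(0, len(exec_mask), unit_width)]

    uncompressed = len(groups)                       # one cycle per group
    compressed = sum(1 for g in groups if any(g))    # skip all-disabled groups
    return uncompressed, compressed


# A 16-element instruction on a 4-wide unit, with elements 0-3 and 8-11 enabled.
mask = [True] * 4 + [False] * 4 + [True] * 4 + [False] * 4
print(cycles_needed(mask, 4))
```

In this example two of the four groups contain no enabled element, so the compressed count is half the uncompressed one.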

First claim

Opening claim text (preview).

What is claimed is:
1. A processor comprising: an execution unit having a data path including a plurality of lanes, each of the lanes to execute an operation on at least one channel of a plurality of channels of a single instruction multiple data (SIMD) instruction responsive to the SIMD instruction, the execution unit having a plurality of quadrants and to perform the SIMD instruction in a number of execution cycles; and a decode logic including compaction circuitry to calculate a minimum number of execution cycles to execute the SIMD instruction based on an active lane count, compare the minimum number of execution cycles to an active quadrant value, and based on the comparison, compact the number of execution cycles, including permutation of at least some of the plurality of channels of the SIMD instruction, wherein a number of permutations between the quadrants is minimized by the compaction circuitry, to reduce the number of execution cycles for execution of the SIMD instruction based at least in part on the calculation and an execution mask associated with the SIMD instruction, the execution mask based at least in part on an instruction predicate mask, a dispatch mask and a conditional mask.
2. The processor of claim 1, wherein the compaction circuitry is to reduce the number of execution cycles for execution of the SIMD instruction by at least one execution cycle when the execution mask indicates that a set of channels of the SIMD instruction to be issued to the execution unit during the at least one execution cycle are to be unused.
3. The processor of claim 2, wherein the compaction circuitry is to cause a next set of channels of the SIMD instruction to be inserted into the at least one execution cycle.
4. The processor of claim 2, wherein the execution unit is to execute the SIMD instruction in a first number of execution cycles less than the number of execution cycles as a result of reduction of the number of execution cycles by the at least one execution cycle.
5. The processor of claim 1, further comprising permute circuitry coupled to the execution unit to permute at least some of the plurality of channels of the SIMD instruction prior to input to the execution unit, responsive to control information from the compaction circuitry.
6. The processor of claim 5, wherein a first portion of the plurality of channels obtained from the permutation are to be sent to the execution unit, and a second portion of the plurality of channels obtained from the permutation are not to be sent to the execution unit.
7. The processor of claim 1, wherein the SIMD instruction is of a first path of a conditional block.
8. The processor of claim 1, wherein the SIMD instruction is of a variable width SIMD instruction set architecture.
9. The processor of claim 1, further comprising a split register file having a first set of half registers each to store a first plurality of channels of a SIMD instruction and a second set of half registers each to store a second plurality of channels of the SIMD instruction.
10. The processor of claim 1, further comprising: a register file having a plurality of registers each to store a plurality of channels of a SIMD instruction; a latch to receive an operand from a register of the register file; permute circuitry coupled to the latch to receive the operand from the latch and control information from the decode logic and to permute at least portions of the operand; and an output logic coupled to the permute circuitry and including a plurality of switches, wherein a corresponding switch is to be enabled by the compaction circuitry to provide a corresponding portion of the permuted operand to the execution unit.
11. A non-transitory machine-readable medium having stored thereon instructions, which when performed by a machine cause the machine to perform a method comprising: receiving a single instruction multiple data (SIMD) instruction and information associated with the SIMD instruction in a SIMD execution unit of a processor, the SIMD instruction having a plurality of channels that are to consume a first plurality of execution cycles, the SIMD execution unit having a plurality of quadrants; identifying a first portion of the plurality of channels of the SIMD instruction that are to be disabled; calculating a minimum number of execution cycles to execute the SIMD instruction based on an active lane count, comparing the minimum number of execution cycles to an active quadrant value, and based on the comparing, compacting the first plurality of execution cycles, including permuting at least some of the plurality of channels of the SIMD instruction, wherein a number of permutations between the quadrants is minimized; removing one or more execution cycles of the first plurality of execution cycles for executing the SIMD instruction based on the calculating; and after the removing, executing the SIMD instruction in fewer execution cycles than the first plurality of execution cycles.
12. The non-transitory machine-readable medium of claim 11, wherein the method further comprises inserting a second portion of the plurality of channels of the SIMD instruction into a first removed execution cycle.
13. The non-transitory machine-readable medium of claim 11, wherein the method further comprises inserting a second portion of a plurality of channels of a second SIMD instruction into a first removed execution cycle.
14. The non-transitory machine-readable medium of claim 13, wherein the SIMD instruction is of a first branch of a conditional operation and the second SIMD instruction is of a second branch of the conditional operation.
15. The non-transitory machine-readable medium of claim 11, wherein the method further comprises permuting the at least some of the plurality of channels of the SIMD instruction, and thereafter identifying the first portion of the plurality of channels of the SIMD instruction that are to be disabled.
16. A system comprising: a processor comprising: a core domain including a plurality of cores to independently execute instructions; and a graphics domain including a plurality of graphics processors to perform general purpose workloads offloaded by the core domain, each of the graphics processors having a vector execution unit including a plurality of lanes each to execute an operation on at least one data element of a plurality of data elements identified by a vector instruction, the vector execution unit to perform the vector instruction on the plurality of data elements in a first number of execution cycles, and cycle compression circuitry coupled to the vector execution unit to reduce the first number of execution cycles based at least in part on an execution mask associated with the vector instruction, the execution mask based at least in part on an instruction predicate mask, a dispatch mask and a conditional mask, permute circuitry having an output coupled to an input to the vector execution unit to permute at least some of the plurality of data elements prior to input to the vector execution unit, responsive to control information from the cycle compression circuitry, and unpermute circuitry having an input coupled to an output of the vector execution unit to unpermute at least some of the plurality of data elements after output from the vector execution unit, responsive to control information from the cycle compression circuitry; and a dynamic random access memory (DRAM) coupled to the processor.
17. The system of claim 16, wherein the cycle compression circuitry is to cause permutation of a first data element in a
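The decision recited in claim 1 (compute a minimum cycle count from the active lane count, compare it with the number of active quadrants, and permute channels only when that helps) might be sketched as follows. This is an illustration under stated assumptions, not the claimed circuitry: the three masks are assumed to combine by bitwise AND (the claim only says the execution mask is "based at least in part on" them), a quadrant is assumed to be a group of `lanes` adjacent channels issued in one cycle, and the names `execution_mask` and `plan_cycles` are hypothetical. The sketch does not model how the number of cross-quadrant permutations is minimized.

```python
from math import ceil


def execution_mask(predicate, dispatch, conditional):
    # Assumption: a channel executes only if all three masks enable it.
    return predicate & dispatch & conditional


def plan_cycles(exec_mask, simd_width=16, lanes=4):
    """Decide how many cycles an instruction needs and whether to permute.

    exec_mask  : int bitmask, bit i set means channel i is active.
    simd_width : channels in the SIMD instruction (assumed 16 here).
    lanes      : channels the execution unit processes per cycle (assumed 4).
    """
    exec_mask &= (1 << simd_width) - 1
    lane_bits = (1 << lanes) - 1

    # Slice the mask into quadrants of `lanes` adjacent channels.
    quadrants = [(exec_mask >> (q * lanes)) & lane_bits
                 for q in range(simd_width // lanes)]

    active_lanes = bin(exec_mask).count("1")
    min_cycles = ceil(active_lanes / lanes)          # best case if fully packed
    active_quadrants = sum(1 for q in quadrants if q)

    # Skipping empty quadrants already reaches the minimum unless active
    # channels are spread thinly; only then is permutation worthwhile.
    permute = min_cycles < active_quadrants
    cycles = min_cycles if permute else active_quadrants
    return {"min_cycles": min_cycles, "active_quadrants": active_quadrants,
            "permute": permute, "cycles": cycles}


# One active channel in each of four quadrants: packing them saves three cycles.
sparse = execution_mask(0xFFFF, 0xFFFF, 0x1111)
print(plan_cycles(sparse))

# Two fully active quadrants: skipping the empty ones is enough, no permutation.
print(plan_cycles(0x00FF))
```

The comparison matters because permutation costs extra routing: when the active channels already fill whole quadrants, dropping the empty quadrants achieves the minimum without moving any data.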

Assignees

Intel Corp

Inventors

Classifications

G06F9/30018 · Primary
Bit or string instructions · CPC title
G06F9/30058
Conditional branch instructions · CPC title
G06F9/30032
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
G06F9/30072
to perform conditional operations, e.g. using predicates or guards · CPC title
G06F9/3887
controlled by a single instruction for multiple data lanes [SIMD] · CPC title

Patent family

Related publications grouped by family.

View patent family 50976101

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9606797B2 cover?: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of exec…
Who is the assignee on this patent?: Intel Corp