Bit shuffle processors, methods, systems, and instructions

US10713044B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10713044-B2
Application number: US-201515508284-A
Country: US
Kind code: B2
Filing date: Sep 4, 2015
Priority date: Sep 25, 2014
Publication date: Jul 14, 2020
Grant date: Jul 14, 2020

Abstract

Official abstract text for this publication.

A processor includes packed data registers and a decode unit to decode an instruction. The instruction is to indicate a first source operand having at least one lane of bits, and a second source packed data operand having a number of sub-lane sized bit selection elements. An execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the instruction, stores a result operand in a destination storage location. The result operand includes, a different corresponding bit for each of the number of sub-lane sized bit selection elements. A value of each bit of the result operand corresponding to a sub-lane sized bit selection element is that of a bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element.
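The selection rule in the abstract can be sketched in software. The following is a minimal Python model, not the patented hardware. It assumes a single 64-bit lane and 8-bit selection elements whose low 6 bits index into that lane (the sizes mentioned in claims 13 and 14). The function name `bit_shuffle_single_lane` and the choice to ignore the upper selector bits are illustrative assumptions.

```python
from typing import Sequence


def bit_shuffle_single_lane(lane: int, selectors: Sequence[int], lane_bits: int = 64) -> int:
    """Result bit i takes the value of the lane bit indicated by selector i."""
    # Assumption: lane_bits is a power of two, so a 64-bit lane needs 6 index bits.
    index_mask = lane_bits - 1
    result = 0
    for i, sel in enumerate(selectors):
        bit_index = sel & index_mask       # keep only the index bits of the element
        bit = (lane >> bit_index) & 1      # read the selected bit of the lane
        result |= bit << i                 # store it at the selector's own position
    return result


# Lane 0b1010 has bits 1 and 3 set; selectors 0,1,2,3,3,1 pick 0,1,0,1,1,1.
example = bit_shuffle_single_lane(0b1010, [0, 1, 2, 3, 3, 1])
assert example == 0b111010
```

Each selector produces exactly one result bit, so a selector value may be repeated or omitted freely; the operation is a gather of bits rather than a strict permutation.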

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a plurality of packed data registers; a decode unit to decode an instruction, the instruction to indicate a first source operand that is to have at least one lane of bits, and the instruction to indicate a packed data register that is to store a second source packed data operand that is to have a number of sub-lane sized bit selection elements; and an execution unit coupled with the packed data registers and the decode unit, the execution unit, in response to the instruction, to store a result operand in a destination storage location that is to be indicated by the instruction, the result operand to include, a different corresponding single bit for each of the number of sub-lane sized bit selection elements, a value of each single bit of the result operand corresponding to a sub-lane sized bit selection element to be that of a single bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element, wherein the result operand is to include a plurality of the single bits for each of the at least one lane of bits, and wherein the plurality of the single bits for each of the at least one lane of bits are to be stored in adjacent bit positions.

2. The processor of claim 1, wherein the number of sub-lane sized bit selection elements include a plurality of subsets that each correspond to a different one of a plurality of lanes of bits, and wherein the execution unit, in response to the instruction, is to use each subset of the sub-lane sized bit selection elements to select bits from within only a corresponding lane of bits.

3. The processor of claim 2, wherein the execution unit, in response to the instruction, is to store the result operand in a packed data register having the plurality of lanes of bits.

4. The processor of claim 3, wherein the execution unit, in response to the instruction, is to store the bits selected by each subset of the sub-lane sized bit selection elements in a corresponding lane of bits of the packed data register.

5. The processor of claim 4, wherein the execution unit, in response to the instruction, is to store at least one replica of the bits selected by each subset of the sub-lane sized bit selection elements in the corresponding lane of bits of the packed data register.

6. The processor of claim 5, wherein the decode unit is to decode the instruction that is to indicate a source predicate mask operand, and wherein the execution unit, in response to the instruction, is to use the source predicate mask operand to predicate storage of the bits selected by each subset of the sub-lane sized bit selection elements and replicas thereof in the corresponding lane of bits of the packed data register.

7. The processor of claim 1, wherein each sub-lane sized bit selection element corresponds to a single bit of the result operand in a same relative position, and wherein the second source packed data operand has at least sixteen sub-lane sized bit selection elements.

8. The processor of claim 1, wherein the execution unit, in response to the instruction, is to store the result operand in the destination storage location which is a packed data operation mask register.

9. The processor of claim 1, wherein the execution unit, in response to the instruction, is to store the result operand in the destination storage location which is a general-purpose register.

10. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have a single lane of bits, wherein all of the number of sub-lane sized bit selection elements are to correspond to the single lane of bits, and wherein the execution unit, in response to the instruction, is to store a single bit of the single lane of bits to the result operand for each of the number of sub-lane sized bit selection elements.

11. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand is to have a plurality of lanes of bits.

12. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have a single lane of bits, and wherein the processor, in response to the instruction, is to replicate the single lane of bits of the first source operand a plurality of times to create a plurality of lanes of bits.

13. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have at least one 64-bit lane of bits, and is to indicate the second source packed data operand that is to have the number of at least 6-bit sized bit selection elements.

14. The processor of claim 13, wherein each at least 6-bit bit selection element is in a different corresponding 8-bit byte of the second source packed data operand, and wherein the second source packed data operand has at least sixteen bit selection elements.

15. A processor comprising: a plurality of packed data registers; a decode unit to decode an instruction, the instruction to indicate a first source operand that is to have at least one lane of bits, and the instruction to indicate a packed data register that is to store a second source packed data operand that is to have a same number of sub-lane sized bit selection elements as a number of bits in each of the at least one lane of bits of the first source operand; and an execution unit coupled with the packed data registers and the decode unit, the execution unit, in response to the instruction, to store a result operand in a destination storage location that is to be indicated by the instruction, the result operand to include, a different corresponding single bit for each of the number of sub-lane sized bit selection elements, a value of each single bit of the result operand corresponding to a sub-lane sized bit selection element to be that of a single bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element.

16. A method in a processor comprising: receiving an instruction, the instruction indicating a first source operand having at least one lane of bits, and the instruction having a field specifying a packed data register storing a second source packed data operand having a number of sub-lane sized bit selection elements; and storing a result operand in a destination storage location indicated by the instruction in response to the instruction, the result operand including a different corresponding single bit for each of the number of sub-lane sized bit selection elements, a value of each single bit of the result operand that corresponds to a sub-lane sized bit selection element being that of a single bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, indicated by the corresponding sub-lane sized bit selection element, wherein the result operand includes a plurality of the single bits for each of the at least one lane of bits, and wherein the plurality of the single bits for each of the at least one lane of bits are stored in adjacent bit positions.

17. The method of claim 16, wherein storing comprises storing the result operand in the destination storage location which is a predicate mask register, and wherein each single bit of the result operand corresponds to a sub-lane sized bit selection element in a same relative position.
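The multi-lane behavior in claims 2, 16, and 17 can be illustrated with a short Python model. It is a sketch under stated assumptions, not the claimed implementation: each lane is 64 bits, each lane has its own subset of selection elements, and the bits selected for a lane are packed into adjacent positions of a mask-style result, lowest lane first. The function name `bit_shuffle_lanes` and the low-to-high packing order are hypothetical; the claims do not fix a bit ordering.

```python
from typing import Sequence


def bit_shuffle_lanes(
    lanes: Sequence[int],
    selector_groups: Sequence[Sequence[int]],
    lane_bits: int = 64,
) -> int:
    """Each selector subset picks bits only from its own lane (claim 2)."""
    index_mask = lane_bits - 1  # assumption: lane_bits is a power of two
    result = 0
    position = 0  # next free bit of the mask-style result
    for lane, group in zip(lanes, selector_groups):
        for sel in group:
            bit = (lane >> (sel & index_mask)) & 1
            result |= bit << position  # bits for one lane land in adjacent positions
            position += 1
    return result


# Two lanes, eight selectors each: lane 0 has only bit 0 set, lane 1 only bit 63.
mask = bit_shuffle_lanes(
    [0x1, 1 << 63],
    [[0, 1] * 4, [63, 0] * 4],
)
assert mask == 0x5555
```

In this model, result bit k corresponds to the selector in the same relative position k, which matches the correspondence described in claim 17. A selector in one group can never read from another lane.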

Assignees

Intel Corp

Inventors

Classifications

  • of compound instructions · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • using a mask · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Bit or string instructions · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10713044B2 cover?
A processor includes packed data registers and a decode unit to decode an instruction. The instruction is to indicate a first source operand having at least one lane of bits, and a second source packed data operand having a number of sub-lane sized bit selection elements. An execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the inst…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30038. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 14, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).