US2016266902A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016266902-A1 |
| Application number | US-201113977736-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 16, 2011 |
| Priority date | Dec 16, 2011 |
| Publication date | Sep 15, 2016 |
| Grant date | — |
Official abstract text for this publication.
Instructions and logic provide vector linear interpolation functionality. In some embodiments, responsive to an instruction specifying: a first operand from a set of vector registers, a size of each of the vector elements, a portion of the vector elements upon which to compute linear interpolations, a second operand from a set of vector registers, and a third operand; an execution unit reads a first, a second and a third value of the size of vector elements from corresponding data fields in the first, the second and the third operand respectively, and computes an interpolated value as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value.
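To illustrate the computation the abstract describes, here is a minimal Python sketch of the per-element arithmetic. It is not the patented hardware. The function name `vector_lerp`, the `dest` argument, and the choice to leave masked-off lanes unchanged are assumptions made for illustration; the abstract says only that a portion of the vector elements is selected. Note that a·b − b·c + c equals c + b·(a − c), so the second operand acts as the interpolation weight between the third and first operands.

```python
def vector_lerp(a, b, c, mask=None, dest=None):
    """Per-lane a*b - b*c + c, as stated in the abstract.

    a, b, c : equal-length sequences (first, second, third operand).
    mask    : optional booleans selecting the lanes to compute (assumption).
    dest    : optional prior destination values; masked-off lanes keep
              these values (an assumed policy, not stated in the source).
    """
    n = len(a)
    if not (len(b) == n and len(c) == n):
        raise ValueError("operands must have the same number of lanes")
    if mask is None:
        mask = [True] * n
    if dest is None:
        dest = [0.0] * n
    out = list(dest)
    for i in range(n):
        if mask[i]:
            # first*second - second*third + third
            out[i] = a[i] * b[i] - b[i] * c[i] + c[i]
    return out


result = vector_lerp([10.0, 20.0], [0.25, 0.5], [2.0, 4.0])
masked = vector_lerp([10.0, 20.0], [0.25, 0.5], [2.0, 4.0],
                     mask=[True, False], dest=[0.0, -1.0])
```

With a weight of 0 a lane yields the third operand's value, and with a weight of 1 it yields the first operand's value, which is the usual behavior of linear interpolation.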
Opening claim text (preview).
What is claimed is:
1. A processor comprising: one or more vector registers each comprising a first plurality of data fields to store values of vector elements; a decode stage to decode a first instruction specifying: a first operand of the one or more vector registers, a size of the vector elements, a portion of the first plurality of data fields, a second operand of the one or more vector registers, and a third operand; and an execution unit, responsive to the decoded first instruction and for each data field of the portion of the first plurality of data fields, to: read a first, a second and a third value respectively from corresponding data fields of the size of vector elements in the first, the second and the third operand; and compute in parallel an interpolated value as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value.
2. The processor of claim 1, wherein the execution unit, responsive to the decoded first instruction and for each data field of the portion of the first plurality of data fields, is further to store each computed interpolated value respectively in corresponding data fields of the size of vector elements in a destination operand specified by the first instruction.
3. The processor of claim 1, wherein the specified third operand is one of the one or more vector registers.
4. The processor of claim 1, wherein the specified third operand is a memory location.
5. The processor of claim 1, the first instruction specifying a mask identifying the portion of the first plurality of data fields, wherein said values read from data fields in the source operands corresponds to vector elements in the source operands unmasked by the mask specified by the first instruction.
6. The processor of claim 1, wherein the size of vector elements is 64-bits.
7. The processor of claim 1, the execution unit to: read a fourth, a fifth and a sixth value respectively from corresponding data fields of the size of vector elements in the first, the second and the third operand; and in parallel with computing the interpolated value as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value, compute another interpolated value as the fourth value multiplied by the fifth value minus the fifth value multiplied by the sixth value plus the sixth value.
8. The processor of claim 7, wherein the size of vector elements is 32-bits.
9. The processor of claim 7, wherein the size of vector elements is 16-bits.
10. A machine implemented method comprising: reading a first, a second and a third value respectively from corresponding data fields of a size of vector elements specified by the first executable instruction, in a first, a second and a third operand specified by the first executable instruction; and computing a first interpolated value as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value.
11. The machine implemented method of claim 10, wherein the size of vector elements is 64-bits.
12. The machine implemented method of claim 10, comprising: reading a fourth, a fifth and a sixth value respectively from corresponding data fields of the size of vector elements in the first, the second and the third operand; and in parallel with computing the first interpolated value, computing a second interpolated value as the fourth value multiplied by the fifth value minus the fifth value multiplied by the sixth value plus the sixth value.
13. The machine implemented method of claim 12, wherein the size of vector elements is 32-bits.
14. The machine implemented method of claim 12, wherein the size of vector elements is 16-bits.
15. The machine implemented method of claim 10, wherein the specified third operand is one of a set of vector registers.
16. The machine implemented method of claim 10, wherein the specified third operand is a memory location.
17. The machine implemented method of claim 10, the first executable instruction specifying a mask identifying a portion of the data fields of the size of vector elements, wherein said values read from corresponding data fields in the first, second and third operands corresponds to vector elements in the first, second and third operands unmasked by the mask specified by the first executable instruction.
18. An apparatus comprising: a register/memory access logic to read a first, a second and a third value respectively from corresponding data fields of a size of vector elements specified by a first executable instruction, in a first, a second and a third operand specified by the first executable instruction; and a partitionable multiplication/addition logic to compute a first interpolated value by first multiplying in parallel the first value by the second value to generate a first set of partial products and the second value by the third value to generate a second set of negated partial products, then adding together the first set of partial products plus the second set of negated partial products plus the third value.
19. The apparatus of claim 18 to store the first interpolated value to a corresponding data field of the size of vector elements in a destination operand specified by the first executable instruction.
20. The apparatus of claim 19 wherein the destination operand specified by the first executable instruction is a vector register different than the first, the second and the third operand specified by the first executable instruction.
21. The apparatus of claim 18, said register/memory access logic to read a fourth, a fifth and a sixth value respectively from corresponding data fields of the size of vector elements in the first, the second and the third operand specified by the first executable instruction; and said partitionable multiplication/addition logic to compute a second interpolated value by multiplying, in parallel with the first multiplying, the fourth value by the fifth value to generate a third set of partial products and the fifth value by the sixth value to generate a fourth set of negated partial products, then adding together the third set of partial products plus the fourth set of negated partial products plus the sixth value.
22. The apparatus of claim 21 wherein corresponding data fields being read for the third and sixth values were converted up to the size of vector elements specified by the first executable instruction.
23. The apparatus of claim 22 wherein corresponding data fields being read for the third and sixth values were broadcast from the third operand specified by the first executable instruction.
24. The apparatus of claim 22 wherein corresponding data fields being read for the third and sixth values were swizzled from the third operand specified by the first executable instruction.
25. A processing system comprising: a memory; and a first plurality of processors, each of the first plurality of processors comprising: one or more vector registers each comprising a first plurality of data fields to store values of vector elements; a decode stage to decode a first instruction specifying: a first operand of the one or more vector registers, a size of the vector elements, a portion of the first plurality of data fields, a second operand of the one or mor
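The structure recited in claim 18 (two sets of partial products, one of them negated, summed together with the third value) can be modeled in software. The following is a minimal sketch under simplifying assumptions: it uses unbounded non-negative Python integers and a plain shift-and-add decomposition, whereas real hardware would work on fixed-width fields and might use a different partial-product encoding. The names `partial_products` and `lerp_from_partial_products` and the `bits` parameter are hypothetical.

```python
def partial_products(multiplicand, multiplier, bits=8):
    """Shift-and-add partial products for an unsigned `bits`-wide multiplier."""
    if not 0 <= multiplier < (1 << bits):
        raise ValueError("multiplier does not fit in the given width")
    return [(multiplicand << i) if (multiplier >> i) & 1 else 0
            for i in range(bits)]


def lerp_from_partial_products(first, second, third, bits=8):
    """Sum the partial products of first*second, the negated partial
    products of second*third, and the third value (cf. claim 18)."""
    positive = partial_products(first, second, bits)
    negated = [-p for p in partial_products(second, third, bits)]
    # A single summation stands in for the adder stage.
    return sum(positive) + sum(negated) + third


value = lerp_from_partial_products(7, 3, 5)
```

Folding the subtraction into negated partial products lets one accumulation produce the whole result, rather than computing two full products and combining them in separate steps.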
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Decoding the operand specifier, e.g. specifier format · CPC title
using a mask · CPC title
having multiple operands in a single register · CPC title
according to one or more bits in the instruction, e.g. prefix, sub-opcode · CPC title