US2016011870A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016011870-A1 |
| Application number | US-201414327534-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 9, 2014 |
| Priority date | Jul 9, 2014 |
| Publication date | Jan 14, 2016 |
| Grant date | — |
Official abstract text for this publication.
A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction format of the instruction specifies a first input vector, a second input vector and a third input operand. The instruction execution pipeline comprises an instruction decode stage to decode the instruction. The instruction execution pipeline includes a functional unit to execute the instruction. The functional unit includes a routing network to route a first contiguous group of elements from a first end of one of the input vectors to a second end of the instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of the input vectors to a first end of the instruction's resultant vector. The first and second ends are opposite vector ends. The first and second groups of contiguous elements are defined from the third input operand. The instruction is not capable of routing non-contiguous groups of elements from the input vectors to the instruction's resultant vector. A software pipeline that uses the instruction is also described.
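The routing behaviour described in the abstract can be modelled in a few lines. The following is a minimal sketch, not the patented implementation: vectors are modelled as Python lists with index 0 as the "left" end, and the names `route_ends` and `count_from_mask`, the choice of which input feeds which end, and the mask encoding (a run of set bits starting at the left end) are all assumptions made for illustration.

```python
def route_ends(a, b, k):
    """Hypothetical model of the described instruction (names are assumptions).

    Routes the leftmost n-k elements of `a` to the right end of the result
    and the rightmost k elements of `b` to the left end of the result.
    The scalar `k` plays the role of the third input operand.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("input vectors must have the same length")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the vector length")
    # Both groups are contiguous slices, so non-contiguous routing
    # cannot be expressed with this operation.
    return b[n - k:] + a[:n - k]


def count_from_mask(mask):
    """Derive the split point from a mask vector (assumed encoding).

    The mask must be a contiguous run of set bits starting at the left
    end; anything else is rejected, mirroring the restriction that only
    contiguous groups can be routed.
    """
    k = sum(1 for bit in mask if bit)
    if any(mask[k:]):
        raise ValueError("mask must be a contiguous run from the left end")
    return k


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
result = route_ends(a, b, 3)
masked = route_ends(a, b, count_from_mask([1, 1, 1, 0, 0, 0, 0, 0]))
```

With `k = 3`, the three rightmost elements of `b` land at the left end of the result and the five leftmost elements of `a` land at the right end. The mask form yields the same split as the scalar form under the encoding assumed here.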
Opening claim text (preview).
1. A processor, comprising: an instruction execution pipeline comprising: an instruction fetch stage to fetch an instruction, the instruction format of the instruction specifying a first input vector, a second input vector and a third input operand; an instruction decode stage to decode said instruction; a functional unit to execute the instruction, the functional unit including a routing network to route a first contiguous group of elements from a first end of one of said input vectors to a second end of said instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of said input vectors to a first end of said instruction's resultant vector, said first and second ends being opposite vector ends, wherein, said first and second groups of contiguous elements are defined from said third input operand, said instruction not capable of routing non-contiguous groups of elements from said input vectors to said instruction's resultant vector.
2. The processor of claim 1 wherein said third input operand is specified as a scalar.
3. The processor of claim 1 wherein said third input operand is embodied with a mask vector.
4. The processor of claim 1 wherein said first end is a left end and said second end is a right end.
5. The processor of claim 1 wherein said first end is a right end and said second end is a left end.
6. The processor of claim 1 wherein said instruction execution pipeline comprises a second functional unit to execute a second instruction, said second functional unit including a routing network to route a first contiguous group of elements from the second end of a first input vector to the first end of said instruction's resultant vector, and, route a second contiguous group of elements from the first end of a second input vector to the second end of the second instruction's resultant vector, wherein, said first and second groups of contiguous elements are defined from a third input operand of said second instruction.
7. The processor of claim 6 wherein the first and second functional units are the same functional unit.
8. A machine readable medium containing program code stored therein that when processed by a computing system causes a method to be performed by a compiler, said method comprising: detecting processing of an array having misaligned data rows; compiling said processing of said array into a software pipelined loop program code sequence having an instruction whose instruction format specifies a first input vector, a second input vector and a third input operand, said instruction to route a first contiguous group of elements from a first end of one of said input vectors to a second end of said instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of said input vectors to a first end of said instruction's resultant vector, said first and second ends being opposite vector ends, wherein, said first and second groups of contiguous elements are defined from said third input operand, and wherein, said instruction is not capable of routing non-contiguous groups of elements from said input vectors to said instruction's resultant vector.
9. The machine readable medium of claim 8 wherein peeling is not used in formulating said program code sequence.
10. The machine readable medium of claim 8 wherein said instruction's resultant vector is an aligned row of said array.
11. The machine readable medium of claim 10 wherein said program code sequence includes code to process said aligned row.
12. The machine readable medium of claim 10 wherein said instruction's resultant vector includes sections of two different rows of said array.
13. The machine readable medium of claim 10 wherein the software pipelined loop on a per cycle basis accepts as an input a next vector having a leading section of a third row and trailing section of second row in an array and wherein the instruction accepts the next vector and a previous cycle's input vector as said first and second input vectors, the previous cycle's vector having a leading section of said second vector and trailing section of a first row in the array.
14. The machine readable medium of claim 13 wherein memory accesses for the rows are not aligned with row boundaries.
15. The machine readable medium of claim 10 wherein the software pipelined loop on a per cycle basis writes a next output vector having a trailing section of a first row and a leading section of a second row, wherein the instruction accepts as said first and second input vectors results of processing performed on the first and second rows.
16. The machine readable medium of claim 15 wherein memory accesses for the rows are not aligned with row boundaries.
17. A computing system, comprising: a system memory; a processor coupled to said system memory, said processor comprising an instruction execution pipeline comprising: an instruction fetch stage to fetch an instruction, the instruction format of the instruction specifying a first input vector, a second input vector and a third input operand; an instruction decode stage to decode said instruction; a functional unit to execute the instruction, the functional unit including a routing network to route a first contiguous group of elements from a first end of one of said input vectors to a second end of said instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of said input vectors to a first end of said instruction's resultant vector, said first and second ends being opposite vector ends, wherein, said first and second groups of contiguous elements are defined from said third input operand, said instruction not capable of routing non-contiguous groups of elements from said input vectors to said instruction's resultant vector.
18. The computing system of claim 17 wherein said third input operand is specified as a scalar.
19. The computing system of claim 17 wherein said third input operand is embodied with a mask vector.
20. The computing system of claim 17 wherein said first end is a left end and said second end is a right end.
21. The computing system of claim 17 wherein said first end is a right end and said second end is a left end.
22. The computing system of claim 17 wherein said system memory contains compiled code to process an array having misaligned data rows wherein aligned accesses are made to said system memory to process said array's data.
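The software pipelined loop of claims 8 to 16 can be illustrated with a small model. This is a sketch under stated assumptions rather than the compiler output the claims describe: memory is a flat Python list, each row is `n` elements wide and starts `d` elements past a vector-aligned boundary, and the helper names `realign` and `aligned_rows` are hypothetical. Each iteration performs one aligned load and combines it with the previous iteration's load to rebuild one whole row, so no peeled prologue or epilogue iterations are needed.

```python
def realign(prev, nxt, d):
    """Assumed form of the routing step: the rightmost n-d elements of
    `prev` go to the left end of the result and the leftmost d elements
    of `nxt` go to the right end. Both groups are contiguous."""
    return prev[d:] + nxt[:d]


def aligned_rows(memory, n, d, num_rows):
    """Yield each row of a misaligned array using only aligned loads.

    memory   -- flat list whose length is (num_rows + 1) * n
    n        -- vector width, equal to the row width in this sketch
    d        -- misalignment of each row start, with 0 <= d < n
    """
    prev = memory[0:n]                      # first aligned load
    for j in range(1, num_rows + 1):
        nxt = memory[j * n:(j + 1) * n]     # one aligned load per iteration
        yield realign(prev, nxt, d)         # complete row j-1
        prev = nxt                          # carried into the next iteration


# Three rows of width 4, stored one element past an aligned boundary.
PAD = None
rows = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
n, d = 4, 1
memory = [PAD] * d + [x for row in rows for x in row] + [PAD] * (n - d)
recovered = list(aligned_rows(memory, n, d, len(rows)))
```

Each aligned load here holds the trailing section of one row followed by the leading section of the next, which matches the input vectors described in claim 13. The output side described in claim 15 would apply the mirrored routing before aligned stores; it is omitted from this sketch.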
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Instruction prefetching · CPC title
using a mask · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title