Accelerating eight-way parallel keccak execution
US-2024211268-A1 · Jun 27, 2024 · US
US10127042B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10127042-B2 |
| Application number | US-201615396578-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 31, 2016 |
| Priority date | Jun 26, 2013 |
| Publication date | Nov 13, 2018 |
| Grant date | Nov 13, 2018 |
Official abstract text for this publication.
A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
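The abstract describes an instruction that performs one or more SHA-2 rounds on a state operand, using a second operand that holds messages together with round constants. As a rough software reference only, the sketch below applies the standard SHA-256 compression round from FIPS 180-4 to an eight-word state. Each input value is assumed to be a message word already added to its round constant (W[t] + K[t]). The function name `sha256_rounds` and this pre-added input format are illustrative assumptions. This excerpt does not specify the patent's actual operand layout or how many rounds a single instruction performs.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256_rounds(state, wk):
    """Apply one SHA-256 compression round per value in wk (hypothetical helper).

    state: eight 32-bit words [a, b, c, d, e, f, g, h]
    wk:    iterable of precomputed (W[t] + K[t]) mod 2**32 values
    """
    a, b, c, d, e, f, g, h = state
    for w_plus_k in wk:
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)              # "choose": bits of f where e=1, g where e=0
        t1 = (h + big_s1 + ch + w_plus_k) & MASK32
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)    # bitwise majority of a, b, c
        t2 = (big_s0 + maj) & MASK32
        # Shift the working variables down by one position.
        h, g, f, e = g, f, e, (d + t1) & MASK32
        d, c, b, a = c, b, a, (t1 + t2) & MASK32
    return [a, b, c, d, e, f, g, h]
```

Calling it with all 64 W[t] + K[t] values for a message block, then adding the result word-wise (mod 2^32) to the incoming state, reproduces one full SHA-256 block compression. A hardware instruction would typically cover only a few of those rounds per invocation.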
Opening claim text (preview).
What is claimed is:
1. A system comprising: a memory; a processor coupled with the memory, the processor comprising: a plurality of 128-bit single instruction, multiple data (SIMD) registers; a decode unit coupled to the instruction fetch unit, the decode unit to decode instructions, including a Secure Hash Algorithm (SHA) 256 schedule instruction, the SHA256 schedule instruction having: a first field to specify a first 128-bit SIMD source register of the 128-bit SIMD registers, the first 128-bit SIMD source register to store a first operand that is to include a first 32-bit data element in bits [31:0], a second 32-bit data element in bits [63:32], a third 32-bit data element in bits [95:64], and a fourth 32-bit data element in bits [127:96]; a second field to specify a second 128-bit SIMD source register of the 128-bit SIMD registers, the second 128-bit SIMD source register to store a second operand that is to include a fifth 32-bit data element in bits [31:0], a sixth 32-bit data element in bits [63:32], a seventh 32-bit data element in bits [95:64], and an eighth 32-bit data element in bits [127:96]; and a third field to specify a third 128-bit SIMD source register of the 128-bit SIMD registers, the third 128-bit SIMD source register to store a third operand that is to include a ninth 32-bit data element in bits [31:0], a tenth 32-bit data element in bits [63:32], an eleventh 32-bit data element in bits [95:64], and a twelfth 32-bit data element in bits [127:96]; and an execution unit coupled to the decode unit, and coupled to the 128-bit SIMD registers, the execution unit to execute the SHA256 schedule instruction, and to store a result that is to include: a first 32-bit result data element in bits [31:0] that is to be equal to a sum of: (a) a value equal to, the eleventh 32-bit data element rotated right by seventeen bits, and exclusive-ORed with the eleventh 32-bit data element rotated right by nineteen bits, and exclusive-ORed with the eleventh 32-bit data element shifted right by ten bits; (b) the first 32-bit data element; and (c) the sixth 32-bit data element; a second 32-bit result data element in bits [63:32] that is to be equal to a sum of: (a) a value equal to, the twelfth 32-bit data element rotated right by seventeen bits, and exclusive-ORed with the twelfth 32-bit data element rotated right by nineteen bits, and exclusive-ORed with the twelfth 32-bit data element shifted right by ten bits; (b) the second 32-bit data element; and (c) the seventh 32-bit data element; a third 32-bit result data element in bits [95:64], wherein a first value is to be equal to the first 32-bit result data element, the third 32-bit result data element to be equal to a sum of: (a) a value equal to, the first value rotated right by seventeen bits, and exclusive-ORed with the first value rotated right by nineteen bits, and exclusive-ORed with the first value shifted right by ten bits; (b) the third 32-bit data element; and (c) the eighth 32-bit data element; and a fourth 32-bit result data element in bits [127:96], wherein a second value is to be equal to the second 32-bit result data element, the fourth 32-bit result data element to be equal to a sum of: (a) a value equal to, the second value rotated right by seventeen bits, and exclusive-ORed with the second value rotated right by nineteen bits, and exclusive-ORed with the second value shifted right by ten bits; (b) the fourth 32-bit data element; and (c) the ninth 32-bit data element.
2. The system of claim 1, wherein the decode unit is to decode a second SHA 256 schedule instruction to be used to perform another part of SHA 256 scheduling.
3. The system of claim 1, wherein the first 128-bit SIMD source register is also to be used as a destination register to store the result.
4. The system of claim 1, wherein the processor is a reduced instruction set computing (RISC) processor.
5. The system of claim 1, wherein the processor further comprises: a plurality of 64-bit general-purpose registers; a data cache; an instruction cache; a branch prediction unit; an instruction translation lookaside buffer (TLB) coupled to the instruction cache; and an instruction fetch unit coupled to the decode unit.
6. The system of claim 5, further comprising a level 2 (L2) cache coupled to the data cache and coupled to the instruction cache.
7. The system of claim 1, wherein the processor further comprises a reorder buffer.
8. The system of claim 1, wherein the processor further comprises a register rename unit.
9. The system of claim 1, further comprising audio I/O coupled with the processor.
10. The system of claim 1, further comprising a graphics processing unit (GPU) coupled with the processor.
11. The system of claim 1, further comprising a communication processor coupled with the processor.
12. The system of claim 1, further comprising a display coupled with the processor.
13. The system of claim 1, further comprising a Peripheral Component Interconnect (PCI) express interconnect coupled with the processor.
14. The system of claim 1, wherein the system comprises a cell phone.
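The arithmetic recited in claim 1 can be modeled in a few lines of software. The rotate/rotate/shift combination the claim spells out is the SHA-256 message-schedule function σ1(x) = ROTR17(x) XOR ROTR19(x) XOR SHR10(x) from FIPS 180-4. The sketch below is a hypothetical model: the names `small_sigma1` and `claim1_schedule` are illustrative, element numbering follows the claim (index 0 of each list is bits [31:0]), and each sum is assumed to wrap modulo 2^32, since every result element is 32 bits wide. As recited, the fifth and tenth source elements do not enter any of the four sums.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma1(x: int) -> int:
    """SHA-256 sigma1: ROTR17(x) ^ ROTR19(x) ^ SHR10(x)."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def claim1_schedule(src1, src2, src3):
    """Hypothetical model of the result recited in claim 1.

    src1 holds data elements 1-4, src2 elements 5-8, src3 elements 9-12;
    list index 0 corresponds to bits [31:0] of the register.
    """
    e1, e2, e3, e4 = src1
    _, e6, e7, e8 = src2        # fifth element is not used by the claimed sums
    e9, _, e11, e12 = src3      # tenth element is not used by the claimed sums
    r0 = (small_sigma1(e11) + e1 + e6) & MASK32
    r1 = (small_sigma1(e12) + e2 + e7) & MASK32
    # The upper two results depend on the lower two ("first value"/"second value").
    r2 = (small_sigma1(r0) + e3 + e8) & MASK32
    r3 = (small_sigma1(r1) + e4 + e9) & MASK32
    return [r0, r1, r2, r3]
```

As an illustrative assignment (an assumption, not stated in the claim): if the first source holds the partial sums W[t-16] + σ0(W[t-15]) for t = 16..19, and the second and third sources hold message words W[8..11] and W[12..15], the four result elements equal W[16..19] of the FIPS 180-4 message schedule. The serial dependency of r2 and r3 on r0 and r1 mirrors the schedule recurrence, in which W[t] needs W[t-2].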
single instruction multiple data [SIMD] multiprocessors · CPC title
Instruction prefetching · CPC title
Providing cryptographic facilities or services · CPC title
using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence) · CPC title
using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] · CPC title