US9361100B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9361100-B2 |
| Application number | US-201213730845-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2012 |
| Priority date | Dec 2, 1994 |
| Publication date | Jun 7, 2016 |
| Grant date | Jun 7, 2016 |
Official abstract text for this publication.
A processor includes a first register to hold first, second, third, and fourth data elements; a second register to hold fifth, sixth, seventh, and eighth data elements; and a third register. A decoder decodes a packed instruction that identifies the first and second registers as source registers and the third register as a destination register, and also decodes a pack instruction that identifies a fourth and a fifth register, each having 16-bit data elements. At least one functional unit, responsive to the packed instruction, stores a result in the third register that includes only half of all data elements of each of the first and second registers, taken only from corresponding positions in the first and second registers. Responsive to the pack instruction, the functional unit stores a result that includes an 8-bit data element for each 16-bit data element in the fourth and fifth registers.
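The packed instruction described in the abstract can be illustrated with a minimal Python sketch. It models each register as a list of data elements with index 0 as the least significant element. The function name `unpack_low`, the choice of the low-order half, and the interleaved ordering of the result are assumptions made for illustration; the abstract states only that the result holds half of the elements of each source, taken from corresponding positions.

```python
def unpack_low(src1, src2):
    """Sketch of the packed instruction: combine half of each source register.

    Assumptions (not stated in the abstract): the low-order half is selected,
    and elements from the two sources are interleaved in the destination.
    """
    if len(src1) != len(src2) or len(src1) % 2 != 0:
        raise ValueError("sources must have the same even number of elements")

    half = len(src1) // 2
    result = []
    for i in range(half):
        # Take the element at position i from each source register,
        # so only corresponding positions end up in the result.
        result.append(src1[i])
        result.append(src2[i])
    return result


# First register holds elements 1..4, second register holds elements 5..8.
dest = unpack_low([1, 2, 3, 4], [5, 6, 7, 8])
print(dest)
```

With four elements per source, the destination receives two elements from each register, so it is the same width as either source.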
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a register file including at least: a first register to hold a first data element, a second data element, a third data element, and a fourth data element; a second register to hold a fifth data element, a sixth data element, a seventh data element, and an eighth data element; and a third register; a decoder to decode a packed instruction, the packed instruction to identify the first and the second registers as source registers and the third register as a destination register, the decoder also to decode a pack instruction that is to identify a fourth register that is to have a plurality of 16-bit data elements and a fifth register that is to have a plurality of 16-bit data elements; at least one functional unit coupled to the register file and the decoder, the at least one functional unit, responsive to the packed instruction, to store a result in the third register that is to include only half of all data elements of the first register and only half of all data elements of the second register, the result to include only corresponding data elements that are to be from corresponding data element positions in the first and second registers, and the at least one functional unit, responsive to the pack instruction to store a second result that is to include an 8-bit data element for each 16-bit data element in the fourth and fifth registers, wherein each 8-bit data element in the second result is to include, depending on a value of the corresponding 16-bit data element, one of a least significant 8-bits of the corresponding 16-bit data element and a saturation value.
2. The processor of claim 1, wherein the first, second, and third registers are to hold 32-bits, and wherein each of the first through eighth data elements are to be 8 bits in size.
3. The processor of claim 1, wherein the first, second, and third registers are to hold 64-bits, and wherein each of the first through eighth data elements are to be 16 bits in size.
4. The processor of claim 1, wherein the first, second, and third registers are to hold 128-bits, and wherein each of the first through eighth data elements are to be 32 bits in size.
5. The processor of claim 1, wherein the processor is operable to perform a plurality of other packed data instructions, including at least a packed data addition instruction, a packed data subtraction instruction, and a packed data multiplication instruction.
6. The processor of claim 5, wherein the processor is further operable to perform a packed data shift instruction, and a packed data compare instruction.
7. A system comprising: communications hardware that is operable to couple; a display; a microphone; and a speaker; and a processor coupled with the communications hardware, the processor comprising: a register file including at least: a first register to hold a first data element, a second data element, a third data element, and a fourth data element; a second register to hold a fifth data element, a sixth data element, a seventh data element, and an eighth data element, wherein the second and sixth data elements are in corresponding positions and the fourth and eighth data elements are in corresponding positions; and a third register; a decoder to decode a packed instruction, the packed instruction to identify the first and the second registers as source registers and the third register as a destination register, the decoder also to decode a pack instruction that is to identify a fourth register that is to have a plurality of 16-bit data elements and a fifth register that is to have a plurality of 16-bit data elements; at least one functional unit coupled to the register file and the decoder, the at least one functional unit, responsive to the packed instruction, to store a result in the third register that is to include only half of all data elements of the first register and only half of all data elements of the second register, the result to include only corresponding data elements that are to be from corresponding data element positions in the first and second registers, and the at least one functional unit, responsive to the pack instruction to store a second result that is to include an 8-bit data element for each 16-bit data element in the fourth and fifth registers, wherein each 8-bit data element in the second result is to include, depending on a value of the corresponding 16-bit data element, one of a least significant 8-bits of the corresponding 16-bit data element and a saturation value.
8. The system of claim 7, wherein the first, second, and third registers are to hold 32-bits, and wherein each of the first through eighth data elements are to be 8 bits in size.
9. The system of claim 7, wherein the first, second, and third registers are to hold 64-bits, and wherein each of the first through eighth data elements are to be 16 bits in size.
10. The system of claim 7, wherein the first, second, and third registers are to hold 128-bits, and wherein each of the first through eighth data elements are to be 32 bits in size.
11. The system of claim 7, wherein the processor is operable to perform a plurality of other packed data instructions, including at least a packed data addition instruction, a packed data subtraction instruction, and a packed data multiplication instruction.
12. The system of claim 11, wherein the processor is further operable to perform a packed data shift instruction, and a packed data compare instruction.
13. The system of claim 7, wherein the display is to comprise a touch screen display.
14. The system of claim 7, wherein the communications hardware is further operable to couple a video digitizing device, the video digitizing device to capture video images.
15. A method comprising: holding a first data element, a second data element, a third data element, and a fourth data element in a first register; holding a fifth data element, a sixth data element, a seventh data element, and an eighth data element in a second register, wherein the second and sixth data elements are in corresponding positions and the fourth and eighth data elements are in corresponding positions; decoding a packed instruction, the packed instruction to identify the first and the second registers as source registers and a third register as a destination register; decoding a pack instruction, the pack instruction identifying a fourth register that is to have a plurality of 16-bit data elements, a fifth register that is to have a plurality of 16-bit data elements, and enabling saturation; responsive to decoding the packed instruction, storing a result in the third register that includes only half of all data elements of the first register and only half of all data elements of the second register, the result including only corresponding data elements from corresponding data element positions in the first and second registers; and responsive to decoding the pack instruction, storing a second result that includes an 8-bit data element for each 16-bit data element in the fourth and fifth registers, wherein each 8-bit data element in the second result is to include, depending on a value of the corresponding 16-bit data element, one of a least significant 8-bits of the corresponding 16-bit data element and a saturation value.
16. The method of claim 15, wherein the first, second, and third registers are to hold 32-bits, and wherein each of the first through eighth data elements are to be 8 bits in size.
17. The method of claim 15, wherein the first, second, and third registers are to hold 64-bits, and wherein
according to data content, e.g. floating-point registers, address registers · CPC title
Saturation, i.e. clipping the result to a minimum or maximum value · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
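The pack instruction recited in the claims, which ties to the saturation classification above, narrows each 16-bit data element to 8 bits and substitutes a saturation value when the element does not fit. The following Python sketch is one possible reading. Signed saturation to the range -128..127, the names `saturate_to_int8` and `pack_saturate`, and the ordering of the result (fourth-register elements first, then fifth-register elements) are assumptions for illustration and are not fixed by the claim text.

```python
INT8_MIN, INT8_MAX = -128, 127
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_to_int8(value):
    """Clamp a signed 16-bit value to the signed 8-bit range (assumed signed)."""
    if value > INT8_MAX:
        return INT8_MAX  # saturation value for positive overflow
    if value < INT8_MIN:
        return INT8_MIN  # saturation value for negative overflow
    # In range: the value equals its least significant 8 bits read as signed.
    return value


def pack_saturate(src1, src2):
    """Sketch of the pack instruction: one 8-bit element per 16-bit element.

    Assumption: elements of src1 are placed before elements of src2.
    """
    elements = list(src1) + list(src2)
    for v in elements:
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError("source elements must fit in signed 16 bits")
    return [saturate_to_int8(v) for v in elements]


# Two hypothetical source registers, each holding four 16-bit elements.
packed = pack_saturate([100, 300, -5, -1000], [0, 127, 128, -129])
print(packed)
```

An unsigned variant would clamp to 0..255 instead; the claims do not say which form is meant, so the signed range here is only one option.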