US9766888B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9766888-B2 |
| Application number | US-201414229811-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 28, 2014 |
| Priority date | Mar 28, 2014 |
| Publication date | Sep 19, 2017 |
| Grant date | Sep 19, 2017 |
Official abstract text for this publication.
A processor of an aspect includes packed data registers, and a decode unit to decode an instruction. The instruction may indicate a first source packed data to include at least four data elements, indicate a second source packed data to include at least four data elements, and indicate a destination storage location. An execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the instruction, is to store a result packed data in the destination storage location. The result packed data may include at least four indexes that may identify corresponding data element positions in the first and second source packed data. The indexes may be stored in positions in the result packed data that are to represent a sorted order of corresponding data elements in the first and second source packed data.
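As a rough illustration of the behavior the abstract describes, the following Python sketch models the result of such an instruction in software. It is not the patented hardware and not any real instruction-set intrinsic. The function name `sort_indexes` and the numbering scheme (positions 0 to n-1 refer to the first source, n to 2n-1 to the second) are assumptions made for this example only; the abstract says only that the indexes identify data element positions in the two sources.

```python
from typing import List, Sequence


def sort_indexes(src1: Sequence[int], src2: Sequence[int]) -> List[int]:
    """Software model (hypothetical) of the sort-index result.

    Assumption: positions 0..n-1 name elements of src1 and positions
    n..2n-1 name elements of src2. The abstract does not fix this scheme.
    """
    if len(src1) != len(src2) or len(src1) < 4:
        raise ValueError("each source needs the same count of >= 4 elements")
    combined = list(src1) + list(src2)
    # Store positions, not values, ordered by the value found at each position.
    # sorted() is stable, so equal values keep their original relative order.
    return sorted(range(len(combined)), key=combined.__getitem__)


if __name__ == "__main__":
    first = [7, 1, 9, 3]
    second = [4, 8, 2, 6]
    print(sort_indexes(first, second))  # [1, 6, 3, 4, 7, 0, 5, 2]
```

Reading the output left to right gives the positions of the elements in ascending order: position 1 holds the smallest value (1), position 6 (the third element of the second source) holds the next (2), and so on.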
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a plurality of packed data registers; a decode unit to decode an instruction, the instruction to indicate a first source packed data that is to include a first set of at least four data elements, to indicate a second source packed data that is to include a second set of at least four data elements, and to indicate a destination storage location; and an execution unit coupled with the packed data registers and the decode unit, the execution unit, in response to the instruction, to store a result packed data in the destination storage location, and to store a result mask that is to have at least four mask elements, the result packed data to include at least four indexes, each of the indexes to identify a corresponding data element position in each of the first and second source packed data, wherein each mask element is to correspond to a different one of the indexes, and wherein each mask element is to indicate whether the corresponding data element position for the corresponding index is in the first source packed data or the second source packed data, and the indexes to be stored in positions in the result packed data that are to represent a sorted order of corresponding data elements in the first and second source packed data.
2. The processor of claim 1, further comprising a mask register to store the result mask, and wherein the instruction is included in an instruction set that includes a second instruction that is able to indicate the result mask as a predicate operand to predicate a packed data operation.
3. The processor of claim 1, wherein the execution unit, in response to the instruction, is to store a second result packed data in a second destination storage location that is to be indicated by the instruction, the second result packed data to include the corresponding data elements from the first and second source packed data that correspond to the indexes and their corresponding mask elements, the corresponding data elements to be stored in positions of the second result packed data that reflect the sorted order.
4. The processor of claim 1, wherein the first set of at least four data elements are implicitly assumed to be in sorted order for the instruction in order for the instruction to operate correctly, and the second set of at least four data elements are implicitly assumed to be in sorted order for the instruction in order for the instruction to operate correctly.
5. The processor of claim 1, wherein the first set of at least four data elements are not assumed to be in sorted order for the instruction, and the second set of at least four data elements are not assumed to be in sorted order for the instruction.
6. The processor of claim 1, wherein the execution unit is to store the result packed data in which the indexes are to be stored in the positions that are to represent the sorted order of the corresponding data elements that are to include only a smallest half of all of the data elements of the first and second source packed data.
7. The processor of claim 1, wherein the execution unit is to store the result packed data in which the indexes are to be stored in the positions that are to represent the sorted order of the corresponding data elements that are to include only a largest half of all of the data elements of the first and second source packed data.
8. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source packed data that is to include at least eight data elements which are each to have one of 32-bits and 64-bits.
9. A method in a processor comprising: receiving an instruction, the instruction indicating a first source packed data including a first set of at least four data elements, indicating a second source packed data including a second set of at least four data elements, and indicating a destination storage location; storing a result mask in a storage location of the processor in response to the instruction, the result mask having at least four mask elements; and storing result packed data in the destination storage location in response to the instruction, the result packed data including at least four indexes, each of the indexes identifying a corresponding data element position in each of the first and second source packed data, wherein each mask element corresponds to a different one of the indexes, and wherein each mask element indicates whether the corresponding data element position for the corresponding index is in the first source packed data or the second source packed data, and the indexes stored in positions in the result packed data that represent a sorted order of corresponding data elements in the first and second source packed data.
10. The method of claim 9, wherein said receiving comprises receiving the instruction indicating the first source packed data having the first set of at least four data elements in sorted order, and wherein the first set of at least four data elements must be in the sorted order in order for the instruction to operate correctly.
11. A system to process instructions, the system comprising: an interconnect; a processor coupled with the interconnect, the processor to receive a first instruction, the first instruction to indicate a first source packed data that is to include a first set of at least four data elements, to indicate a second source packed data that is to include a second set of at least four data elements, and to indicate a destination register, the processor, in response to the first instruction, to store a result packed data in the destination register, and to store a result mask that is to have at least four mask elements, the result packed data to include at least four indexes, each of the indexes to identify a corresponding data element position in each of the first and second source packed data, wherein each mask element is to correspond to a different one of the indexes, and wherein each mask element is to indicate whether the corresponding data element position for the corresponding index is in the first source packed data or the second source packed data, and the indexes to be stored in positions in the result packed data that are to represent a sorted order of corresponding data elements in the first and second source packed data; and a dynamic random access memory (DRAM) coupled with the interconnect, the DRAM storing an algorithm to use the indexes of the result packed data to sort data.
12. A processor comprising: a plurality of packed data registers; a decode unit to decode an instruction, the instruction to indicate a first source packed data that is to include a first set of at least four data elements, to indicate a second source packed data that is to include a second set of at least four data elements, and to indicate a destination storage location; and an execution unit coupled with the packed data registers and the decode unit, the execution unit, in response to the instruction, to store a result packed data in the destination storage location, the result packed data to include at least four indexes, the indexes to identify corresponding data element positions in the first and second source packed data, and the indexes to be stored in positions in the result packed data that are to represent a sorted order of corresponding data elements in the first and second source packed data, and wherein it is to be implicit to the instruction that the first set of at least four data elements are to be sorted when in the first source packed data, and that the second set of at least four data elements are to be sorted when in the second source packed da
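To make the claim language more concrete, here is a minimal Python sketch of one possible reading of claims 1, 3, 6, and 7. It is a software model under stated assumptions, not the claimed hardware. The function name `sort_indexes_with_mask`, the `half` parameter, and the encoding of mask elements (0 for the first source, 1 for the second) are hypothetical choices made for this example. Each index is taken to be a position that exists in both sources, with the mask element selecting which source it refers to.

```python
from typing import List, Sequence, Tuple


def sort_indexes_with_mask(
    src1: Sequence[int], src2: Sequence[int], half: str = "smallest"
) -> Tuple[List[int], List[int], List[int]]:
    """Return (indexes, mask, values) for one half of the merged order.

    Assumptions (not from the claims): mask element 0 means "first source",
    1 means "second source"; `half` selects the smallest or largest half.
    """
    n = len(src1)
    if n != len(src2) or n < 4:
        raise ValueError("each source needs the same count of >= 4 elements")
    if half not in ("smallest", "largest"):
        raise ValueError("half must be 'smallest' or 'largest'")
    # Tag every element with (value, source id, position within its source).
    tagged = [(v, 0, i) for i, v in enumerate(src1)]
    tagged += [(v, 1, i) for i, v in enumerate(src2)]
    tagged.sort(key=lambda t: t[0])  # stable: ties keep first source first
    kept = tagged[:n] if half == "smallest" else tagged[n:]
    indexes = [pos for _, _, pos in kept]   # result packed data (claim 1)
    mask = [src for _, src, _ in kept]      # result mask (claim 1)
    values = [val for val, _, _ in kept]    # second result (claim 3)
    return indexes, mask, values


if __name__ == "__main__":
    a = [1, 3, 7, 9]
    b = [2, 4, 6, 8]
    print(sort_indexes_with_mask(a, b, "smallest"))
    print(sort_indexes_with_mask(a, b, "largest"))
```

In this model the smallest-half call on the sample inputs yields indexes `[0, 0, 1, 1]` with mask `[0, 1, 0, 1]`, meaning the sorted order alternates between position 0 of each source and then position 1 of each source. A follow-up operation could use the mask as a predicate to pick elements from one source or the other, which is the kind of use claim 2 mentions.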
Compare instructions, e.g. Greater-Than, Equal-To, MINMAX · CPC title
Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers {sorting methods in general}(G06F7/36 takes precedence) · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Register arrangements · CPC title
Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry (by merging two or more sets of carriers in ordered sequence G06F7/16) · CPC title