US9513907B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9513907-B2 |
| Application number | US-201313960769-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 6, 2013 |
| Priority date | Aug 6, 2013 |
| Publication date | Dec 6, 2016 |
| Grant date | Dec 6, 2016 |
Abstract
Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, a plurality of bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
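To make the abstract's operation concrete, here is a minimal scalar reference model in C. It is a sketch under stated assumptions, not the patent's instruction or encoding: the function name `vpopcnt_portion`, the restriction to 4-bit or 8-bit fields in a 32-bit source portion, the limit of four predetermined values, and the choice to place each count in its own byte of the destination portion are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference model (not the patent's encoding) of a vector
 * population count on one 32-bit portion of a source vector.
 *   src        - the 32-bit source portion holding packed data fields
 *   field_bits - packed data field size in bits (4 or 8 in this sketch)
 *   values     - the predetermined binary values to count (at most 4 here)
 * Returns the destination portion: the count for values[v] is placed in
 * byte v (an assumed packing; the claims only say each count corresponds
 * to a different predetermined value). */
static uint32_t vpopcnt_portion(uint32_t src, unsigned field_bits,
                                const uint8_t *values, unsigned num_values)
{
    assert(field_bits == 4 || field_bits == 8);
    assert(num_values <= 4); /* four 8-bit count slots fit in 32 bits */

    const unsigned num_fields = 32u / field_bits;      /* 8 nibbles or 4 bytes */
    const uint32_t field_mask = (1u << field_bits) - 1u;
    uint32_t dst = 0;

    for (unsigned v = 0; v < num_values; ++v) {
        uint32_t count = 0;
        for (unsigned f = 0; f < num_fields; ++f) {
            uint32_t field = (src >> (f * field_bits)) & field_mask;
            if (field == (values[v] & field_mask))
                ++count;                              /* one more occurrence of values[v] */
        }
        dst |= count << (8u * v);                     /* count <= 8, so it fits in a byte */
    }
    return dst;
}
```

For example, with `src = 0x3A3A1203`, 4-bit fields, and predetermined values `{0x3, 0xA}`, the nibble 3 occurs three times and the nibble A twice, so this model returns `0x0203`. With 8-bit fields and the single value `0x3A`, it returns 2.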
Claims (preview)
What is claimed is:

1. A processor comprising: a storage to store a first portion of a source vector that is to include a first plurality of packed data fields, wherein each of the first plurality of the packed data fields in the first portion of the source vector is to store a second plurality of bits comprising four or more bits; a destination register portion that is to correspond to the first portion of the source vector; a decode stage to decode a first instruction that is to specify a vector population count operation and a packed data field size; and one or more execution units, responsive to the decoded first instruction, to: read the second plurality of bits of each of the packed data fields in the first portion of the source vector; count occurrences of binary values equal to each of one or more predetermined binary values in the first plurality of the packed data fields in the first portion of the source vector; store one or more counts, that are each to correspond to a different one of the one or more predetermined binary values, of the counted occurrences, in the destination register portion that is to correspond to the first portion of the source vector.
2. The processor of claim 1, wherein the first portion of the source vector is 32-bits.
3. The processor of claim 1, wherein the first portion of the source vector is 64-bits.
4. The processor of claim 1, wherein said storage to store the first portion of the source vector is a 32-bit register.
5. The processor of claim 1, wherein said storage to store the first portion of the source vector is a cached memory location.
6. The processor of claim 1, wherein said storage to store the first portion of the source vector is a 32-bit element of a vector register.
7. The processor of claim 1, wherein said destination register portion is a 32-bit register.
8. The processor of claim 1, wherein said destination register portion is a 32-bit portion of a 64-bit register.
9. The processor of claim 1, wherein said destination register portion is a 32-bit element of a 128-bit vector register.
10. The processor of claim 1, wherein said destination register portion is a 64-bit register.
11. The processor of claim 1, wherein the second plurality of bits is 4-bits.
12. The processor of claim 1, wherein the second plurality of bits is 8-bits.
13. The processor of claim 1, wherein the packed data field size is 8-bits.
14. The processor of claim 1, wherein said one or more predetermined binary values are to be specified by the first instruction through an immediate operand.
15. The processor of claim 1, wherein said one or more predetermined binary values are to be specified by the first instruction through one or more elements in a register operand.
16. The processor of claim 1, wherein the one or more execution units, responsive to the decoded first instruction, are to: read the second plurality of bits of each of a plurality of packed data fields in a second portion of the source vector; count occurrences of binary values equal to a second one or more predetermined binary values in the plurality of packed data fields in the second portion of the source vector; store one or more counts, that are each to correspond to a different one of the second one or more predetermined binary values, of the counted occurrences, in a second destination register portion that is to correspond to the second source vector portion.
17. The processor of claim 16, wherein said storage to store the first portion of the source vector is also to store the second portion of the source vector as 32-bit elements of a vector register.
18. The processor of claim 16, wherein said second destination register portion that is to correspond to the second portion of the source vector is a 32-bit element of a vector register.
19. The processor of claim 16, wherein said second one or more predetermined binary values are to be specified by the first instruction through one or more elements in a portion of a register operand that is to correspond to the second portion of the source vector.
20. The processor of claim 16, wherein said second one or more predetermined binary values are to be specified by the first instruction through a 32-bit element of a vector register operand that is to correspond to the second portion of the source vector.
21. The processor of claim 16, wherein said second one or more predetermined binary values are to be specified by the first instruction through an immediate operand.
22. A method comprising: storing in each of a first portion of a plurality of n data fields of a first vector register, a second plurality of bits comprising four or more bits; executing, in a processor, a single-instruction multiple-data (SIMD) instruction for a vector population count; counting occurrences of binary values equal to each of a first one or more predetermined binary values in the first portion of the plurality of the n data fields in the first vector register; and storing one or more counts, each corresponding to a different one of the first one or more predetermined binary values, of the counted occurrences, in a portion of a destination register corresponding to the first portion of the plurality of the n data fields in the first vector register.
23. The method of claim 22, wherein said first one or more predetermined binary values are specified by the first instruction through an immediate operand.
24. The method of claim 22, further comprising: storing in each of a second portion of the plurality of the n data fields of the first vector register, the second plurality of bits; and for the second portion of the plurality of the n data fields in the first vector register, counting occurrences of binary values equal to each of a second one or more predetermined binary values, and storing one or more counts, each corresponding to a different one of the second one or more predetermined binary values, of the counted occurrences, in a portion of the destination register corresponding to the second portion of the plurality of the n data fields in the first vector register.
25. The method of claim 24, wherein said portion of the destination register corresponding to the second portion of the plurality of the n data fields of the first vector register is a 32-bit element of the destination register.
26. A processing system comprising: a memory; and a plurality of processors each processor comprising: a storage to store a first portion of a source vector that is to include a first plurality of packed data fields, wherein each of the first plurality of the packed data fields in the first portion of the source vector is to store a second plurality of bits comprising four or more bits; a destination register portion that is to correspond to the first portion of the source vector; a decode stage to decode a first instruction that is to specify a vector population count operation and a packed data field size; and one or more execution units, responsive to the decoded first instruction, to: read the second plurality of bits of each of the packed data fields in the first portion of the source vector; count occurrences of binary values equal to one or more predetermined binary values in the first plurality of the packed data fields in the first portion of the source vector; store one or more counts, that are each to correspond to a differe
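Claims 11, 15, and 16 through 20 cover 4-bit fields, predetermined values supplied through elements of a register operand, and a second source portion with its own predetermined values and its own 32-bit destination element. The C sketch below models that lane-wise behaviour under stated assumptions: the names `count_nibbles_equal` and `vpopcnt_nibble_lanes`, one predetermined value per 32-bit lane, and one count per destination element are hypothetical. It also swaps the per-field loop for a well-known SWAR (SIMD-within-a-register) zero-field test; this is one way software could compute the same counts, not the hardware design the patent describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of each of the eight 4-bit fields in a 32-bit word. */
#define NIBBLE_LSBS 0x11111111u

/* Count the 4-bit fields of x equal to the low 4 bits of value, using a
 * SWAR zero-field test instead of a per-field loop. */
static unsigned count_nibbles_equal(uint32_t x, uint32_t value)
{
    uint32_t y = x ^ ((value & 0xFu) * NIBBLE_LSBS); /* matching fields become 0 */
    uint32_t t = y | (y >> 1);  /* bit 4k = y[4k]|y[4k+1], bit 4k+2 = y[4k+2]|y[4k+3] */
    t |= t >> 2;                /* bit 4k now ORs all four bits of field k */
    uint32_t zero_fields = ~t & NIBBLE_LSBS;         /* one set bit per matching field */
    /* Sum the eight 0/1 flags: the multiply accumulates them into the top
     * field; every partial sum is at most 8, so no carries cross fields. */
    return (unsigned)((zero_fields * NIBBLE_LSBS) >> 28);
}

/* Hypothetical lane-wise model: element i of src is one 32-bit portion of the
 * source vector, element i of values supplies that portion's predetermined
 * value (as a register operand would), and the count lands in element i of dst. */
static void vpopcnt_nibble_lanes(const uint32_t *src, const uint32_t *values,
                                 uint32_t *dst, size_t lanes)
{
    assert(lanes == 0 || (src != NULL && values != NULL && dst != NULL));
    for (size_t i = 0; i < lanes; ++i)
        dst[i] = count_nibbles_equal(src[i], values[i]);
}
```

The XOR step turns "field equals the predetermined value" into "field is zero", which the two OR-folds detect without any per-field branch; the final multiply-and-shift is a portable substitute for a population count because at most eight flag bits are ever set. For example, lanes `{0x3A3A1203, 0x12345678}` with per-lane values `{0x3, 0x5}` yield counts `{3, 1}`.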
Compare instructions, e.g. Greater-Than, Equal-To, MINMAX · CPC title
comprising data of variable length · CPC title
having multiple operands in a single register · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
using a mask · CPC title