Efficient hardware instructions for single instruction multiple data processors: fast fixed-length value compression
US-2017060587-A1 · Mar 2, 2017 · US
US10229089B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10229089-B2 |
| Application number | US-201715639110-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2017 |
| Priority date | Dec 8, 2011 |
| Publication date | Mar 12, 2019 |
| Grant date | Mar 12, 2019 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack a fixed-width bit values in a bit stream to a fixed width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.
Opening claim text (preview).
What is claimed is: 1. A processor configured to fetch values of bits stored in a bit-vector that are indexed by an index-vector, comprising: a first SIMD register; and a second SIMD register; wherein the index-vector comprises a contiguous series of codes; wherein each code in the index-vector represents an index value of a bit in the bit-vector; the processor is configured to respond to a set of instructions to fetch values of bits stored in the bit-vector that are indexed by the index-vector by: establishing a first plurality of partitions within the first SIMD register; establishing a second plurality of partitions within the second SIMD register; wherein each partition in the first plurality of partitions has a corresponding partition in the second plurality of partitions; loading a copy of the bit-vector into each partition of the first plurality of partitions; loading the second plurality of partitions with contiguous codes from the index-vector; wherein loading the second plurality of partitions with contiguous codes comprises loading each partition of the second plurality of partitions with a single code from the index-vector; performing a variable shift on the copy of the bit-vector that is loaded in each partition of the first plurality of partitions; wherein the amount of the variable shift on each copy of the bit-vector is based on the code stored in the partition, of the second plurality of partitions, that corresponds to the partition, of the first plurality of partitions, in which the copy is stored; for each partition in the first plurality of partitions, causing a bit at a particular position within the partition to be moved to an output bit-vector. 2. The processor of claim 1 , wherein the processor is further configured such that performing the variable shift on the copy of the bit-vector that is loaded in each partition of the first plurality of partitions places a targeted bit of each copy of the bit-vector in the little endian position of each partition of the first plurality of partitions. 3. The processor of claim 1 , wherein the processor is further configured such that causing the bit at the particular position within the partition to be moved to the output bit-vector involves performing a final move mask operation to gather the targeted bits in each partition. 4. A processor configured to fetch values of bits stored in a bit-vector that are indexed by an index-vector, comprising: a first SIMD register; and a second SIMD register; wherein the index-vector comprises a contiguous series of codes; wherein each code in the index-vector represents an index value of a bit in the bit-vector; wherein the processor is configured to respond to a set of instructions to fetch values of bits stored in the bit-vector that are indexed by the index-vector by: establishing a first plurality of partitions within the first SIMD register; loading the first plurality of partitions with contiguous codes from the index-vector; wherein loading the first plurality of partitions with contiguous codes comprises loading each partition of the first plurality of partitions with a single code from the index-vector; determining a plurality of byte offsets by dividing the codes in each partition of the first plurality of partitions by 8; based on the plurality of byte offsets, loading a corresponding plurality of bytes from the bit-vector into a second plurality of partitions of the second SIMD register; wherein each partition in the first plurality of partitions has a corresponding partition in the second plurality of partitions; based on the contiguous codes loaded in the first plurality of partitions, determining a target bit position for each byte in the plurality of bytes; performing a variable shift on each byte, of the plurality of bytes, that is loaded in each partition of the second plurality of partitions; wherein the amount of the variable shift on each byte of the plurality of bytes is based on the target bit position determined for each byte in the plurality of bytes; for each partition in the second plurality of partitions, causing a bit at a particular position within the partition to be moved to an output bit-vector. 5. The processor of claim 4 , wherein the processor is further configured to respond to a set of instructions to load the corresponding plurality of bytes from the bit-vector into a second plurality of partitions of the second SIMD register by performing a gather operation on the bit vector. 6. The processor of claim 4 , wherein the processor is further configured to respond to a set of instructions to determine a target bit position for each byte in the plurality of bytes by performing a modulo by eight operation on the codes in each partition of the first plurality of partitions to obtain the target bit position for each byte in the plurality of bytes. 7. The processor of claim 4 , wherein the processor is further configured such that performing the variable shift on each byte, of the plurality of bytes, that is loaded in each partition of the second plurality of partitions, further comprises shifting byte to place a targeted bit in the low-end position of each partition of the second plurality of partitions. 8. The processor of claim 4 , wherein the processor is further configured such that causing a bit to at a particular position within the partition to be moved to an output bit-vector involves performing a final move mask operation to gather the targeted bits in each partition. 9. A processor configured to fetch values of bits stored in a bit-vector that are indexed by an index-vector, comprising: a first SIMD register; and a second SIMD register; wherein the index-vector comprises a contiguous series of codes; wherein each code in the index-vector represents an index value of a bit in the bit-vector; wherein the processor implements multiple techniques for fetching values of bits stored in a bit-vector that are indexed by an index-vector; wherein the multiple techniques include a first technique and a second technique; wherein width of the SIMD registers is N bits; wherein the number of bits in each code is K; wherein the processor is configured to respond to a set of instructions to fetch values of bits stored in the bit-vector that are indexed by the index-vector by: determining whether 2^K<=N; responsive to determining that 2^K<=N, performing the first technique to fetching values of bits stored in the bit-vector that are indexed by the index-vector; and responsive to determining that 2^K>N, performing the second technique to fetch values of bits stored in the bit-vector that are indexed by the index-vector. 10. The processor of claim 9 , wherein the processor is configured to respond to a set of instructions to perform the first technique to fetch values of bits stored in the bit-vector that are indexed by the index-vector by: loading a copy of the bit-vector into each partition of a first plurality of partitions of the first SIMD register; loading contiguous codes from the index-vector into each partition of a second plurality of partitions of the second SIMD register; performing a variable shift on the copy of the bit-vector that is loaded in each partition of the first plurality of partitions based on the code stored in the partition, of the second plurality of partitions, that corresponds to the partition, of the first plurality of partitions, in which the copy is stored; and for each partition in the first plurality of partitions, causing a bit at a particular position within the partition to be moved to an output bit-vector. 11. The processor of claim 9 , wherein the processor is configured to re
Physics · mapped topic
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
single instruction multiple data [SIMD] multiprocessors · CPC title
using a mask · CPC title
Column-oriented storage; Management thereof · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.