Efficient hardware instructions for single instruction multiple data processors

US10229089B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10229089-B2
Application numberUS-201715639110-A
CountryUS
Kind codeB2
Filing dateJun 30, 2017
Priority dateDec 8, 2011
Publication dateMar 12, 2019
Grant dateMar 12, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack a fixed-width bit values in a bit stream to a fixed width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.

First claim

Opening claim text (preview).

What is claimed is: 1. A processor configured to fetch values of bits stored in a bit-vector that are indexed by an index-vector, comprising: a first SIMD register; and a second SIMD register; wherein the index-vector comprises a contiguous series of codes; wherein each code in the index-vector represents an index value of a bit in the bit-vector; the processor is configured to respond to a set of instructions to fetch values of bits stored in the bit-vector that are indexed by the index-vector by: establishing a first plurality of partitions within the first SIMD register; establishing a second plurality of partitions within the second SIMD register; wherein each partition in the first plurality of partitions has a corresponding partition in the second plurality of partitions; loading a copy of the bit-vector into each partition of the first plurality of partitions; loading the second plurality of partitions with contiguous codes from the index-vector; wherein loading the second plurality of partitions with contiguous codes comprises loading each partition of the second plurality of partitions with a single code from the index-vector; performing a variable shift on the copy of the bit-vector that is loaded in each partition of the first plurality of partitions; wherein the amount of the variable shift on each copy of the bit-vector is based on the code stored in the partition, of the second plurality of partitions, that corresponds to the partition, of the first plurality of partitions, in which the copy is stored; for each partition in the first plurality of partitions, causing a bit at a particular position within the partition to be moved to an output bit-vector. 2. The processor of claim 1 , wherein the processor is further configured such that performing the variable shift on the copy of the bit-vector that is loaded in each partition of the first plurality of partitions places a targeted bit of each copy of the bit-vector in the little endian position of each partition of the first plurality of partitions. 3. The processor of claim 1 , wherein the processor is further configured such that causing the bit at the particular position within the partition to be moved to the output bit-vector involves performing a final move mask operation to gather the targeted bits in each partition. 4. A processor configured to fetch values of bits stored in a bit-vector that are indexed by an index-vector, comprising: a first SIMD register; and a second SIMD register; wherein the index-vector comprises a contiguous series of codes; wherein each code in the index-vector represents an index value of a bit in the bit-vector; wherein the processor is configured to respond to a set of instructions to fetch values of bits stored in the bit-vector that are indexed by the index-vector by: establishing a first plurality of partitions within the first SIMD register; loading the first plurality of partitions with contiguous codes from the index-vector; wherein loading the first plurality of partitions with contiguous codes comprises loading each partition of the first plurality of partitions with a single code from the index-vector; determining a plurality of byte offsets by dividing the codes in each partition of the first plurality of partitions by 8; based on the plurality of byte offsets, loading a corresponding plurality of bytes from the bit-vector into a second plurality of partitions of the second SIMD register; wherein each partition in the first plurality of partitions has a corresponding partition in the second plurality of partitions; based on the contiguous codes loaded in the first plurality of partitions, determining a target bit position for each byte in the plurality of bytes; performing a variable shift on each byte, of the plurality of bytes, that is loaded in each partition of the second plurality of partitions; wherein the amount of the variable shift on each byte of the plurality of bytes is based on the target bit position determined for each byte in the plurality of bytes; for each partition in the second plurality of partitions, causing a bit at a particular position within the partition to be moved to an output bit-vector. 5. The processor of claim 4 , wherein the processor is further configured to respond to a set of instructions to load the corresponding plurality of bytes from the bit-vector into a second plurality of partitions of the second SIMD register by performing a gather operation on the bit vector. 6. The processor of claim 4 , wherein the processor is further configured to respond to a set of instructions to determine a target bit position for each byte in the plurality of bytes by performing a modulo by eight operation on the codes in each partition of the first plurality of partitions to obtain the target bit position for each byte in the plurality of bytes. 7. The processor of claim 4 , wherein the processor is further configured such that performing the variable shift on each byte, of the plurality of bytes, that is loaded in each partition of the second plurality of partitions, further comprises shifting byte to place a targeted bit in the low-end position of each partition of the second plurality of partitions. 8. The processor of claim 4 , wherein the processor is further configured such that causing a bit to at a particular position within the partition to be moved to an output bit-vector involves performing a final move mask operation to gather the targeted bits in each partition. 9. A processor configured to fetch values of bits stored in a bit-vector that are indexed by an index-vector, comprising: a first SIMD register; and a second SIMD register; wherein the index-vector comprises a contiguous series of codes; wherein each code in the index-vector represents an index value of a bit in the bit-vector; wherein the processor implements multiple techniques for fetching values of bits stored in a bit-vector that are indexed by an index-vector; wherein the multiple techniques include a first technique and a second technique; wherein width of the SIMD registers is N bits; wherein the number of bits in each code is K; wherein the processor is configured to respond to a set of instructions to fetch values of bits stored in the bit-vector that are indexed by the index-vector by: determining whether 2^K<=N; responsive to determining that 2^K<=N, performing the first technique to fetching values of bits stored in the bit-vector that are indexed by the index-vector; and responsive to determining that 2^K>N, performing the second technique to fetch values of bits stored in the bit-vector that are indexed by the index-vector. 10. The processor of claim 9 , wherein the processor is configured to respond to a set of instructions to perform the first technique to fetch values of bits stored in the bit-vector that are indexed by the index-vector by: loading a copy of the bit-vector into each partition of a first plurality of partitions of the first SIMD register; loading contiguous codes from the index-vector into each partition of a second plurality of partitions of the second SIMD register; performing a variable shift on the copy of the bit-vector that is loaded in each partition of the first plurality of partitions based on the code stored in the partition, of the second plurality of partitions, that corresponds to the partition, of the first plurality of partitions, in which the copy is stored; and for each partition in the first plurality of partitions, causing a bit at a particular position within the partition to be moved to an output bit-vector. 11. The processor of claim 9 , wherein the processor is configured to re

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • single instruction multiple data [SIMD] multiprocessors · CPC title

  • using a mask · CPC title

  • G06F16/221Primary

    Column-oriented storage; Management thereof · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10229089B2 cover?
A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack a fixed-width bit values in a bit stream to a fixed width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8007. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 12 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).