Methods, apparatus, instructions and logic to provide vector packed histogram functionality

US2016378716A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016378716-A1
Application number  US-201514752054-A
Country             US
Kind code           A1
Filing date         Jun 26, 2015
Priority date       Jun 26, 2015
Publication date    Dec 29, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Instructions and logic provide SIMD vector packed histogram functionality. Some processor embodiments include first and second registers storing, in each of a plurality of data fields of a register lane portion, corresponding elements of a first and of a second data type, respectively. A decode stage decodes an instruction for SIMD vector packed histograms. One or more execution units compare each element of the first data type, in the first register lane portion, with a range specified by the instruction. For any elements of the first register portion in said range, corresponding elements of the second data type, from the second register portion, are added into one of a plurality of data fields of a destination register lane portion, selected according to the value of its corresponding element of the first data type, to generate packed weighted histograms for each destination register lane portion.
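The operation summarized in the abstract can be modeled in scalar code. The following is a minimal sketch, not the patented implementation: the function name `packed_weighted_histogram`, the inclusive range `[lo, hi]`, the bin index `value - lo`, and the `lane_width` parameter (elements per lane) are all assumptions made for illustration, and overflow behavior is not modeled.

```python
from typing import List, Sequence


def packed_weighted_histogram(
    bins: Sequence[int],
    weights: Sequence[int],
    lo: int,
    hi: int,
    lane_width: int = 8,
) -> List[int]:
    """Scalar model of a per-lane weighted histogram (illustrative only).

    bins       -- elements of the "first data type" (bin selectors)
    weights    -- elements of the "second data type" (values to accumulate)
    lo, hi     -- assumed inclusive range; elements outside it are ignored
    lane_width -- assumed number of elements per register lane
    """
    if len(bins) != len(weights):
        raise ValueError("bins and weights must have the same length")
    if len(bins) % lane_width != 0:
        raise ValueError("input length must be a multiple of lane_width")

    num_bins = hi - lo + 1
    dest: List[int] = []
    # Each lane produces its own independent histogram.
    for start in range(0, len(bins), lane_width):
        lane_hist = [0] * num_bins
        lane_bins = bins[start:start + lane_width]
        lane_weights = weights[start:start + lane_width]
        for b, w in zip(lane_bins, lane_weights):
            if lo <= b <= hi:
                # The bin element's value selects the destination field.
                lane_hist[b - lo] += w
        dest.extend(lane_hist)
    return dest


if __name__ == "__main__":
    bins = [0, 1, 1, 3, 7, 2, 0, 1, 2, 2, 2, 2, 9, 0, 3, 3]
    weights = [10, 20, 30, 40, 50, 60, 70, 80, 1, 2, 3, 4, 5, 6, 7, 8]
    print(packed_weighted_histogram(bins, weights, lo=0, hi=3))
```

In this model each lane is processed independently, so a two-lane input yields two separate four-bin histograms laid out back to back. Elements whose bin value falls outside the range (7 and 9 in the sample input) contribute nothing.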

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a first register comprising a first plurality of data fields, the first plurality divided into first portions, each first portion having a second plurality of data fields to store bin values; a second register or memory storage set representing a third plurality of data fields to store magnitude values corresponding to the first plurality of data fields, the third plurality divided into said first portions; a destination register comprising a fourth plurality of data fields, the fourth plurality divided into second portions, each second portion comprising a fifth plurality of data fields, wherein each of the fifth plurality of data fields in the destination register is to accumulate a sum of the third plurality of data fields from one or more of the respective first portion that correspond to said each of the fifth plurality of data fields; a decode stage to decode a first instruction specifying a histogram operation; and one or more execution units, responsive to the decoded first instruction, to: compare the values of each of said second plurality of data fields of a first portion of the first register, to determine which correspond to said each of the fifth plurality of data fields of a corresponding second portion of the destination register; and accumulate into each of the fifth plurality of data fields a sum of the third plurality of data fields from said one or more of the respective first portion that correspond to said each of the fifth plurality of data fields respectively, according to the corresponding comparison.
2. The processor of claim 1, wherein the corresponding comparisons of each of said second plurality of data fields to determine which correspond to said each of the fifth plurality of data fields includes a comparison to a range.
3. The processor of claim 2, wherein the range is specified by the first instruction in an immediate operand.
4. The processor of claim 2, wherein the range is specified in an operation code of the first instruction.
5. The processor of claim 2, wherein the corresponding comparisons of each of said second plurality of data fields include determining, according to a value of a corresponding bin element, which correspond to said each of the fifth plurality of data fields.
6. The processor of claim 2, wherein the corresponding comparisons of each of said second plurality of data fields include determining, according to a least significant 2 bits of a corresponding bin element, which correspond to said each of the fifth plurality of data fields.
7. The processor of claim 2, wherein the corresponding comparisons of each of said second plurality of data fields include determining, according to a least significant 3 bits of a corresponding bin element, which correspond to said each of the fifth plurality of data fields.
8. The processor of claim 1, wherein the first instruction is a vector packed histogram instruction to compute a weighted histogram on four bins per each 128-bit lane of the destination register.
9. The processor of claim 1, wherein the first instruction is a vector packed histogram instruction to compute a weighted histogram on eight bins per each 128-bit lane of the destination register.
10. The processor of claim 1, wherein the first instruction is a vector packed histogram instruction to compute a weighted histogram on at least nine bins of the destination register.

11. A method comprising: storing in each of a plurality of p data fields of a first vector register portion, an element of a first data type; storing in each of a plurality of p data fields of a second vector register or memory vector portion, a corresponding element of a second data type; executing, in a processor, a SIMD instruction for vector packed histogram; by comparing each element of the first data type, in said plurality of p data fields of the first vector register portion, with a range specified by the SIMD instruction; and for any elements of the first vector register portion in said range, adding a corresponding element of the second data type, from the second vector register or memory vector portion, into one of a plurality of q data fields, of a destination vector register portion, selected according to a value of the corresponding element of the first data type.
12. The method of claim 11, wherein the first data type is a 16-bit integer.
13. The method of claim 11, wherein the second data type is a 16-bit integer.
14. The method of claim 11, wherein the second data type is a 16-bit floating-point number.
15. The method of claim 11, wherein said plurality of p data fields has four data fields.
16. The method of claim 11, wherein said plurality of p data fields has eight data fields.
17. The method of claim 11, wherein said plurality of p data fields has sixteen data fields.
18. The method of claim 11, wherein said plurality of p data fields consists of the same number of data fields as in said plurality of q data fields.
19. The method of claim 11, wherein each of said plurality of p data fields consists of the same number of bits as each of said plurality of q data fields.
20. The method of claim 11, wherein each of said plurality of p data fields consists of at least half as many bits as each of said plurality of q data fields.
21. The method of claim 11, wherein said plurality of q data fields has four data fields.
22. The method of claim 11, wherein said plurality of q data fields has eight data fields.
23. The method of claim 11, wherein said plurality of q data fields comprises at least nine data fields.

24. A processing system comprising: a memory; and a plurality of processors each processor comprising: a first register storing in each of a plurality of p data fields of a first register portion, an element of a first data type; a second register or memory storage set storing in each of a plurality of p data fields of a second register or memory storage set portion, a corresponding element of a second data type; a decode stage to decode a SIMD instruction for vector packed histogram; and one or more execution units, responsive to the decoded SIMD instruction, to: compare each element of the first data type, in said plurality of p data fields of the first register portion, with a range specified by the SIMD instruction; and for any elements of the first register portion in said range, add a corresponding element of the second data type, from said plurality of p data fields of the second register or memory storage set portion, into one of a plurality of q data fields, of a destination register portion, selected according to a value of its corresponding element of the first data type.
25. The processing system of claim 24, wherein each of said plurality of p data fields consists of the same number of bits as each of said plurality of q data fields.
26. The processing system of claim 24, wherein each of said plurality of p data fields consists of at least half as many bits as each of said plurality of q data fields.
27. The processing system of claim 24, wherein said plurality of p data fields has four data fields.
28. The processing system of claim 24, wherein said plurality of p data fields has eight data fields.
29. T…
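Claims 6 to 9 describe selecting the destination field from the least significant 2 or 3 bits of each bin element, giving four or eight bins per 128-bit lane. The sketch below shows one possible reading for a single lane of 16-bit elements (claims 12 and 13). It assumes, purely for illustration, that the range is a window of `2**index_bits` consecutive values starting at an aligned `base`; the names `lane_histogram_lsb`, `base`, and `index_bits` are hypothetical, and destination field width and overflow are not modeled.

```python
from typing import List, Sequence

LANE_BITS = 128                            # lane size named in claims 8 and 9
ELEM_BITS = 16                             # element size named in claims 12 and 13
ELEMS_PER_LANE = LANE_BITS // ELEM_BITS    # 8 source elements per lane


def lane_histogram_lsb(
    bin_values: Sequence[int],
    magnitudes: Sequence[int],
    base: int,
    index_bits: int,
) -> List[int]:
    """Weighted histogram for one lane, indexed by low-order bits.

    base       -- assumed start of the accepted range (must be aligned)
    index_bits -- 2 for four bins, 3 for eight bins
    """
    num_bins = 1 << index_bits
    if base % num_bins != 0:
        raise ValueError("base must be a multiple of the bin count")
    if len(bin_values) != ELEMS_PER_LANE or len(magnitudes) != ELEMS_PER_LANE:
        raise ValueError(f"expected {ELEMS_PER_LANE} elements per lane")

    out = [0] * num_bins
    for value, magnitude in zip(bin_values, magnitudes):
        # Range comparison: only values inside the window are counted.
        if base <= value < base + num_bins:
            # With an aligned window, the low bits alone identify the bin.
            out[value & (num_bins - 1)] += magnitude
    return out


if __name__ == "__main__":
    values = [8, 9, 10, 11, 12, 7, 9, 8]
    mags = [1, 2, 3, 4, 5, 6, 7, 8]
    print(lane_histogram_lsb(values, mags, base=8, index_bits=2))
```

Aligning the window to the bin count is what lets the low bits stand in for `value - base`. Whether the claimed hardware relies on such alignment is not stated in the claim text shown here.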

Assignees

Intel Corp

Inventors

Classifications

  • G06F9/3001 · Primary

    Arithmetic instructions · CPC title

  • single instruction multiple data [SIMD] multiprocessors · CPC title

  • Details on data register access · CPC title

  • Special purpose registers · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016378716A1 cover?
Instructions and logic provide SIMD vector packed histogram functionality. Some processor embodiments include first and second registers storing, in each of a plurality of data fields of a register lane portion, corresponding elements of a first and of a second data type, respectively. A decode stage decodes an instruction for SIMD vector packed histograms. One or more execution units, compare …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 29, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).