Methods, apparatus, instructions and logic to provide vector population count functionality

US9513907B2 · US · B2

Patent metadata
Publication number: US-9513907-B2
Application number: US-201313960769-A
Country: US
Kind code: B2
Filing date: Aug 6, 2013
Priority date: Aug 6, 2013
Publication date: Dec 6, 2016
Grant date: Dec 6, 2016

Abstract

Official abstract text for this publication.

Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, a plurality of bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values, are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
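The counting described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware or any real instruction: the function name `vector_popcount`, its parameters, and the choice to return the counts as a Python list (rather than packing them into a destination register) are all hypothetical.

```python
def vector_popcount(portion, portion_bits, field_bits, values):
    """Software model (hypothetical) of a vector population count.

    `portion` is an integer holding `portion_bits` bits, split into packed
    data fields of `field_bits` bits each. For every entry in `values`
    (the predetermined binary values), count how many fields equal it.
    """
    if portion_bits % field_bits != 0:
        raise ValueError("portion size must be a multiple of the field size")
    field_mask = (1 << field_bits) - 1
    counts = [0] * len(values)
    for i in range(portion_bits // field_bits):
        # Extract field i, starting from the least significant bits.
        field = (portion >> (i * field_bits)) & field_mask
        for j, value in enumerate(values):
            if field == value:
                counts[j] += 1
    return counts


# A 32-bit portion holding eight 4-bit fields (low to high): A,3,F,0,A,3,A,3.
counts = vector_popcount(0x3A3A0F3A, portion_bits=32, field_bits=4,
                         values=[0x3, 0xA, 0xF])
print(counts)  # one count per predetermined value
```

Here the values 0x3, 0xA and 0xF occur three times, three times and once, so one count is produced per predetermined value, as the abstract describes.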

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a storage to store a first portion of a source vector that is to include a first plurality of packed data fields, wherein each of the first plurality of the packed data fields in the first portion of the source vector is to store a second plurality of bits comprising four or more bits; a destination register portion that is to correspond to the first portion of the source vector; a decode stage to decode a first instruction that is to specify a vector population count operation and a packed data field size; and one or more execution units, responsive to the decoded first instruction, to: read the second plurality of bits of each of the packed data fields in the first portion of the source vector; count occurrences of binary values equal to each of one or more predetermined binary values in the first plurality of the packed data fields in the first portion of the source vector; store one or more counts, that are each to correspond to a different one of the one or more predetermined binary values, of the counted occurrences, in the destination register portion that is to correspond to the first portion of the source vector.

2. The processor of claim 1, wherein the first portion of the source vector is 32-bits.

3. The processor of claim 1, wherein the first portion of the source vector is 64-bits.

4. The processor of claim 1, wherein said storage to store the first portion of the source vector is a 32-bit register.

5. The processor of claim 1, wherein said storage to store the first portion of the source vector is a cached memory location.

6. The processor of claim 1, wherein said storage to store the first portion of the source vector is a 32-bit element of a vector register.

7. The processor of claim 1, wherein said destination register portion is a 32-bit register.

8. The processor of claim 1, wherein said destination register portion is a 32-bit portion of a 64-bit register.

9. The processor of claim 1, wherein said destination register portion is a 32-bit element of a 128-bit vector register.

10. The processor of claim 1, wherein said destination register portion is a 64-bit register.

11. The processor of claim 1, wherein the second plurality of bits is 4-bits.

12. The processor of claim 1, wherein the second plurality of bits is 8-bits.

13. The processor of claim 1, wherein the packed data field size is 8-bits.

14. The processor of claim 1, wherein said one or more predetermined binary values are to be specified by the first instruction through an immediate operand.

15. The processor of claim 1, wherein said one or more predetermined binary values are to be specified by the first instruction through one or more elements in a register operand.

16. The processor of claim 1, wherein the one or more execution units, responsive to the decoded first instruction, are to: read the second plurality of bits of each of a plurality of packed data fields in a second portion of the source vector; count occurrences of binary values equal to a second one or more predetermined binary values in the plurality of packed data fields in the second portion of the source vector; store one or more counts, that are each to correspond to a different one of the second one or more predetermined binary values, of the counted occurrences, in a second destination register portion that is to correspond to the second source vector portion.

17. The processor of claim 16, wherein said storage to store the first portion of the source vector is also to store the second portion of the source vector as 32-bit elements of a vector register.

18. The processor of claim 16, wherein said second destination register portion that is to correspond to the second portion of the source vector is a 32-bit element of a vector register.

19. The processor of claim 16, wherein said second one or more predetermined binary values are to be specified by the first instruction through one or more elements in a portion of a register operand that is to correspond to the second portion of the source vector.

20. The processor of claim 16, wherein said second one or more predetermined binary values are to be specified by the first instruction through a 32-bit element of a vector register operand that is to correspond to the second portion of the source vector.

21. The processor of claim 16, wherein said second one or more predetermined binary values are to be specified by the first instruction through an immediate operand.

22. A method comprising: storing in each of a first portion of a plurality of n data fields of a first vector register, a second plurality of bits comprising four or more bits; executing, in a processor, a single-instruction multiple-data (SIMD) instruction for a vector population count; counting occurrences of binary values equal to each of a first one or more predetermined binary values in the first portion of the plurality of the n data fields in the first vector register; and storing one or more counts, each corresponding to a different one of the first one or more predetermined binary values, of the counted occurrences, in a portion of a destination register corresponding to the first portion of the plurality of the n data fields in the first vector register.

23. The method of claim 22, wherein said first one or more predetermined binary values are specified by the first instruction through an immediate operand.

24. The method of claim 22, further comprising: storing in each of a second portion of the plurality of the n data fields of the first vector register, the second plurality of bits; and for the second portion of the plurality of the n data fields in the first vector register, counting occurrences of binary values equal to each of a second one or more predetermined binary values, and storing one or more counts, each corresponding to a different one of the second one or more predetermined binary values, of the counted occurrences, in a portion of the destination register corresponding to the second portion of the plurality of the n data fields in the first vector register.

25. The method of claim 24, wherein said portion of the destination register corresponding to the second portion of the plurality of the n data fields of the first vector register is a 32-bit element of the destination register.

26. A processing system comprising: a memory; and a plurality of processors each processor comprising: a storage to store a first portion of a source vector that is to include a first plurality of packed data fields, wherein each of the first plurality of the packed data fields in the first portion of the source vector is to store a second plurality of bits comprising four or more bits; a destination register portion that is to correspond to the first portion of the source vector; a decode stage to decode a first instruction that is to specify a vector population count operation and a packed data field size; and one or more execution units, responsive to the decoded first instruction, to: read the second plurality of bits of each of the packed data fields in the first portion of the source vector; count occurrences of binary values equal to one or more predetermined binary values in the first plurality of the packed data fields in the first portion of the source vector; store one or more counts, that are each to correspond to a differe
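The per-portion variant in the claims (several 32-bit portions of one source vector, each with its own predetermined values taken from the matching element of a register operand, and each with its own 32-bit destination element) might be modeled as below. The layout is an assumption made for illustration only: 8-bit fields, up to four target values packed as bytes in each 32-bit operand element, and each count written to the byte slot of its target value. The claims do not fix this encoding, and the names `unpack_fields` and `popcount_per_element` are hypothetical.

```python
FIELD_BITS = 8                      # packed data field size (assumed)
ELEM_BITS = 32                      # size of each source/destination portion
FIELDS_PER_ELEM = ELEM_BITS // FIELD_BITS
FIELD_MASK = (1 << FIELD_BITS) - 1


def unpack_fields(elem):
    """Split a 32-bit element into its four 8-bit fields, low byte first."""
    return [(elem >> (i * FIELD_BITS)) & FIELD_MASK
            for i in range(FIELDS_PER_ELEM)]


def popcount_per_element(src, targets):
    """For each 32-bit source element, count fields equal to each target byte.

    `src` and `targets` are equal-length lists of 32-bit integers. The count
    for the target in byte slot k is stored in byte slot k of the
    corresponding destination element (hypothetical layout).
    """
    if len(src) != len(targets):
        raise ValueError("source and target operands must have equal length")
    dest = []
    for s, t in zip(src, targets):
        fields = unpack_fields(s)
        out = 0
        for slot, value in enumerate(unpack_fields(t)):
            # At most 4 fields can match, so each count fits in 8 bits.
            out |= fields.count(value) << (slot * FIELD_BITS)
        dest.append(out)
    return dest


# Two 32-bit portions of ASCII bytes; targets are 'A','C','G','T' (low byte first).
src = [0x41414343, 0x47475447]
targets = [0x54474341, 0x54474341]
dest = popcount_per_element(src, targets)
print([hex(d) for d in dest])
```

In this sketch the first portion holds two 'C' and two 'A' bytes and the second holds three 'G' bytes and one 'T', so each destination element carries its own set of counts independently of the other.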

Assignees

Intel Corp

Inventors

Classifications

  • Compare instructions, e.g. Greater-Than, Equal-To, MINMAX

  • comprising data of variable length

  • having multiple operands in a single register

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

  • using a mask

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9513907B2 cover?
Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, a plurality of bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 6, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).