Accelerated interlane vector reduction instructions

US9588766B2 · US · B2

Patent metadata
Publication number: US-9588766-B2
Application number: US-201213630154-A
Country: US
Kind code: B2
Filing date: Sep 28, 2012
Priority date: Sep 28, 2012
Publication date: Mar 7, 2017
Grant date: Mar 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.
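The per-lane behavior described in the abstract can be modeled in software. The following is a minimal sketch, not the patented circuitry: the function name `reduce_step`, the list-of-lists register layout, and the choice of index 0 as the lowest-order element are all assumptions made for illustration. The zero fill at the highest-order position and the use of the two lowest-ordered elements follow the dependent claims rather than the abstract itself.

```python
from operator import add

def reduce_step(lanes, op=add):
    """Model one execution of the reduction instruction on every lane.

    lanes: list of lanes; each lane is a list of equal length whose
    index 0 is the lowest-order element (an assumption of this model).
    """
    result = []
    for lane in lanes:
        combined = op(lane[0], lane[1])  # reduce two elements into one
        remaining = lane[2:]             # these shift one position toward index 0
        # Zero fills the vacated highest-order slot (hypothetical layout).
        result.append([combined] + remaining + [0])
    return result

# Two lanes of four elements each (illustrative values).
src = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
step1 = reduce_step(src)
print(step1)  # [[3.0, 3.0, 4.0, 0], [30.0, 30.0, 40.0, 0]]
```

Each lane is processed independently, so one modeled instruction shortens the live data in every lane by one element at the same time.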

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a plurality of vector registers, wherein each vector register is divided into a plurality of lanes, and each lane stores a same number of data elements; and execution circuitry coupled to the plurality of vector registers, the execution circuitry to: receive a vector reduction instruction to reduce an array of the data elements stored in a source operand into a result in a destination operand using a reduction operator, wherein each of the source operand and the destination operand is one of the vector registers, responsive to the vector reduction instruction, apply the reduction operator to two of the data elements in each lane, reduce the two data elements into one data element, and shift one or more remaining data elements when there is at least one of the data elements remaining in each lane; wherein the execution circuitry is to convert reduction code without the vector reduction instruction into translated reduction code with the vector reduction instruction, wherein the reduction code and the translated reduction code specify a same sequence of reduction operations applied to the array of data elements across the plurality of lanes and generate a same result.

2. The apparatus of claim 1, wherein the execution circuitry responsive to the vector reduction instruction is to insert a zero to a highest-order position in each lane.

3. The apparatus of claim 1, wherein the reduction operator includes add, subtract or multiply.

4. The apparatus of claim 1, wherein the execution circuitry is to apply the reduction operator to two of lowest-ordered data elements in each lane.

5. The apparatus of claim 1, wherein the execution circuitry responsive to the vector reduction instruction is to shift each of the remaining data elements one position to the right within each lane.

6. The apparatus of claim 1, wherein each of the data elements is a double-precision floating point number, a single-precision floating point number, or a half-precision floating point number.

7. A method comprising: receiving a vector reduction instruction to reduce an array of the data elements stored in a source operand into a result in a destination operand using a reduction operator, wherein each of the source operand and the destination operand is one of a plurality of vector registers, each vector register being divided into a plurality of lanes, and each lane storing a same number of data elements; responsive to the vector reduction instruction, applying the reduction operator to two of the data elements in each lane, reduce the two data elements into one data element; shifting one or more remaining data elements when there is at least one of the data elements remaining in each lane; and converting reduction code without the vector reduction instruction into translated reduction code with the vector reduction instruction, wherein the reduction code and the translated reduction code specify a same sequence of reduction operations applied to the array of data elements across the plurality of lanes and generate a same result.

8. The method of claim 7, further comprising: responsive to the vector reduction instruction, inserting a zero to a highest-order position in each lane.

9. The method of claim 7, wherein the reduction operator includes add, subtract or multiply.

10. The method of claim 7, wherein applying the reduction operator further comprises applying the reduction operator to two of lowest-ordered data elements in each lane.

11. The method of claim 7, wherein shifting positions further comprises shifting each of the remaining data elements one position to the right within each lane.

12. The method of claim 7, wherein each of the data elements is a double-precision floating point number, a single-precision floating point number, or a half-precision floating point number.

13. A system comprising: memory; and a processor coupled to the memory, the processor comprising: a plurality of vector registers, wherein each vector register is divided into a plurality of lanes, and each lane stores a same number of data elements; and execution circuitry coupled to the plurality of vector registers, the execution circuitry to: receive a vector reduction instruction to reduce an array of the data elements stored in a source operand into a result in a destination operand using a reduction operator, wherein each of the source operand and the destination operand is one of the vector registers, responsive to the vector reduction instruction, apply the reduction operator to two of the data elements in each lane, reduce the two data elements into one data element, and shift one or more remaining data elements when there is at least one of the data elements remaining in each lane; wherein the execution circuitry is to convert reduction code without the vector reduction instruction into translated reduction code with the vector reduction instruction, wherein the reduction code and the translated reduction code specify a same sequence of reduction operations applied to the array of data elements across the plurality of lanes and generate a same result.

14. The system of claim 13, wherein the execution circuitry responsive to the vector reduction instruction is to insert a zero to a highest-order position in each lane.

15. The system of claim 13, wherein the reduction operator includes add, subtract or multiply.

16. The system of claim 13, wherein the execution circuitry is to apply the reduction operator to two of lowest-ordered data elements in each lane.

17. The system of claim 13, wherein the execution circuitry responsive to the vector reduction instruction is to shift each of the remaining data elements one position to the right within each lane.
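The independent claims require that code using the reduction instruction performs the same sequence of reduction operations, and produces the same result, as the code it replaces. The sketch below illustrates that equivalence in a software model only; it does not model the code translation itself. The helper names `reduce_step`, `reduce_lanes`, and `scalar_reference` are hypothetical, as is the layout with index 0 as the lowest-order element. It assumes that a lane of N elements is fully reduced by N - 1 applications of the instruction.

```python
from functools import reduce
from operator import add, mul, sub

def reduce_step(lanes, op):
    # One modeled instruction: combine the two lowest-order elements of
    # each lane, shift the rest down one slot, zero-fill the top slot.
    return [[op(lane[0], lane[1])] + lane[2:] + [0] for lane in lanes]

def reduce_lanes(lanes, op):
    # Apply the modeled instruction (width - 1) times so that each lane
    # collapses to a single value held in its lowest-order slot.
    width = len(lanes[0])
    for _ in range(width - 1):
        lanes = reduce_step(lanes, op)
    return [lane[0] for lane in lanes]

def scalar_reference(lanes, op):
    # Plain left-to-right reduction of each lane, standing in for
    # "reduction code without the vector reduction instruction".
    return [reduce(op, lane) for lane in lanes]

src = [[0.1, 0.2, 0.3, 0.4], [1.5, 2.5, 3.5, 4.5]]
for op in (add, sub, mul):
    # Same operation order, so the floating-point results match exactly.
    assert reduce_lanes(src, op) == scalar_reference(src, op)

print(reduce_lanes([[1, 2, 3, 4], [5, 6, 7, 8]], add))  # [10, 26]
```

Because floating-point addition and multiplication are not associative, preserving the order of operations is what makes the results identical. In this model the inserted zeros never enter a computation, since the loop stops once each lane holds a single live element.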

Assignees

Inventors

Classifications

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • with variable precision · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • using a mask · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9588766B2 cover?
A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the arr…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 7, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).