Apparatus and method for loop flattening and reduction in a single instruction multiple data (SIMD) pipeline

US10409601B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10409601-B2
Application number: US-201715859046-A
Country: US
Kind code: B2
Filing date: Dec 29, 2017
Priority date: Dec 29, 2017
Publication date: Sep 10, 2019
Grant date: Sep 10, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method for loop flattening and reduction in a SIMD pipeline including broadcast, move, and reduction instructions. For example, one embodiment of a processor comprises: a decoder to decode a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode, first and second source operands, and at least one destination operand, the broadcast instruction having a split value associated therewith; a first source register associated with the first source operand to store a first plurality of packed data elements; a second source register associated with the second source operand to store a second plurality of packed data elements; execution circuitry to execute the operations of the decoded broadcast instruction, the execution circuitry to copy a first number of contiguous data elements from the first source register to a first set of contiguous data element locations in a destination register specified by the destination operand, the execution circuitry to further copy a second number of contiguous data elements from the second source register to a second set of contiguous data element locations in the destination register, wherein the execution circuitry is to determine the first number and the second number in accordance with the split value associated with the broadcast instruction.
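
The data movement described in the abstract can be sketched in software. The following is a minimal model, not the patented implementation: the function name `split_broadcast` is hypothetical, registers are modeled as Python lists, and the lane mapping is an assumption (destination lanes below the split take the same-numbered lanes of the first source, the remaining lanes take the same-numbered lanes of the second source). The abstract only says that the two counts are determined by the split value; the 0 to N−1 range check follows the range stated in the claims on this page.

```python
from typing import List, Sequence


def split_broadcast(src1: Sequence[int], src2: Sequence[int], split: int) -> List[int]:
    """Model a split broadcast over two equally sized packed registers.

    Assumption (not confirmed by the source text): destination lanes
    [0, split) receive the same-numbered lanes of src1, and lanes
    [split, N) receive the same-numbered lanes of src2.
    """
    n = len(src1)
    if len(src2) != n:
        raise ValueError("source registers must hold the same number of elements")
    if not 0 <= split <= n - 1:
        raise ValueError("split must be in the range 0..N-1")
    # First contiguous run comes from src1, second contiguous run from src2.
    return list(src1[:split]) + list(src2[split:])


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [10, 20, 30, 40, 50, 60, 70, 80]
    print(split_broadcast(a, b, 3))  # [1, 2, 3, 40, 50, 60, 70, 80]
```

Under this reading, a single split value fixes both the number of elements taken from each source and where each run lands in the destination, which is why the abstract ties both counts to one parameter.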

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a decoder to decode a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode, first and second source operands, and at least one destination operand, the broadcast instruction having a split value associated therewith; a first source register associated with the first source operand to store a first plurality of packed data elements; a second source register associated with the second source operand to store a second plurality of packed data elements; execution circuitry to execute the operations of the decoded broadcast instruction, the execution circuitry to copy a first one or more contiguous data elements of the first plurality of packed data elements from the first source register to a first set of contiguous data element locations in a destination register specified by the destination operand, the execution circuitry to further copy a second number of one or more contiguous data elements of the second plurality of packed data elements from the second source register to a second set of contiguous data element locations in the destination register, wherein the execution circuitry is configured to determine the first set of contiguous data element locations and the second set of contiguous data element locations in accordance with the split value associated with the broadcast instruction.

2. The processor of claim 1 wherein the split value is to be included in an immediate of the broadcast instruction.

3. The processor of claim 1 wherein the split value is to be included as an operand of the broadcast instruction and stored in a third source register.

4. The processor of claim 1 wherein the split value comprises an integer value having a range of 0 to N−1, where N comprises a number of data elements in the destination register.

5. The processor of claim 4 wherein the execution circuitry is to further use the split value to identify a location at which to stop reading from the first set of data elements and start reading from the second set of data elements.

6. The processor of claim 1 wherein the broadcast instruction is to identify a mask field, the execution circuitry configured to use the mask field to determine whether to write a mask value to a data element location in the first set of contiguous data element locations or second set of contiguous data element locations of the destination register instead of one of the first or second plurality of packed data elements from the first or second registers, respectively.

7. The processor of claim 6 wherein the mask field is to be included in an immediate of the broadcast instruction or stored in a third source register.

8. The processor of claim 6 wherein the mask value comprises a packed data element comprising all zeroes or a packed mask value stored in a mask source register.

9. A method comprising: decoding a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode, first and second source operands, and at least one destination operand, the broadcast instruction having a split value associated therewith; storing a first plurality of packed data elements in a first source register associated with the first source operand; storing a second plurality of packed data elements in a second source register associated with the second source operand; execute the plurality of operations of the decoded broadcast instruction including: copying a first one or more contiguous data elements of the first plurality of packed data elements from the first source register to a first set of contiguous data element locations in a destination register specified by the destination operand, and copying a second one or more contiguous data elements of the second plurality of packed data elements from the second source register to a second set of contiguous data element locations in the destination register, wherein the first set of contiguous data element locations and the second set of contiguous data element locations are determined in accordance with the split value associated with the broadcast instruction.

10. The method of claim 9 wherein the split value is to be included in an immediate of the broadcast instruction.

11. The method of claim 9 wherein the split value is to be included as an operand of the broadcast instruction and stored in a third source register.

12. The method of claim 9 wherein the split value comprises an integer value having a range of 0 to N−1, where N comprises a number of data elements in the destination register.

13. The method of claim 12 wherein the split value is to be used to identify a location at which to stop reading from the first set of data elements and start reading from the second set of data elements.

14. The method of claim 9 wherein the broadcast instruction is to identify a mask field, the mask field to be used to determine whether to write a mask value to a data element location in the first set of contiguous data element locations or second set of contiguous data element locations of the destination register instead of one of the first or second plurality of packed data elements from the first or second registers, respectively.

15. The method of claim 14 wherein the mask field is to be included in an immediate of the broadcast instruction or stored in a third source register.

16. The method of claim 14 wherein the mask value comprises a packed data element comprising all zeroes or a packed mask value stored in a mask source register.

17. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of: decoding a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode, first and second source operands, and at least one destination operand, the broadcast instruction having a split value associated therewith; storing a first plurality of packed data elements in a first source register associated with the first source operand; storing a second plurality of packed data elements in a second source register associated with the second source operand; execute the plurality of operations of the decoded broadcast instruction including: copying a first one or more contiguous data elements of the first plurality of packed data elements from the first source register to a first set of contiguous data element locations in a destination register specified by the destination operand, and copying a second one or more contiguous data elements of the second plurality of packed data elements from the second source register to a second set of contiguous data element locations in the destination register, wherein the first set of contiguous data element locations and the second set of contiguous data element locations are determined in accordance with the split value associated with the broadcast instruction.

18. The non-transitory machine-readable medium of claim 17 wherein the split value is to be included in an immediate of the broadcast instruction.

19. The non-transitory machine-readable medium of claim 17 wherein the split value is to be included as an operand of the broadcast instruction and stored in a third source register.

20. The non-transitory machine-readable medium of claim 17 wherein the split valu
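
The mask behavior in claims 6 to 8 can be sketched the same way. This is a minimal software model under stated assumptions, not the claimed circuitry: the name `masked_split_broadcast` and its parameters are hypothetical, a set bit in `mask_bits` is assumed to mean "write the mask value to this lane" (the claims do not fix the polarity), and the split is assumed to select same-numbered lanes from each source. When no packed mask values are supplied, masked lanes receive zero, matching the "all zeroes" option in claim 8.

```python
from typing import List, Optional, Sequence


def masked_split_broadcast(
    src1: Sequence[int],
    src2: Sequence[int],
    split: int,
    mask_bits: int,
    mask_values: Optional[Sequence[int]] = None,
) -> List[int]:
    """Split broadcast with a per-lane mask (illustrative model only).

    Assumptions: lanes [0, split) come from src1 and lanes [split, N)
    from src2; bit i of mask_bits set means lane i receives the mask
    value instead of the source element.
    """
    n = len(src1)
    if len(src2) != n:
        raise ValueError("source registers must hold the same number of elements")
    if not 0 <= split <= n - 1:
        raise ValueError("split must be in the range 0..N-1")
    if mask_values is None:
        fill = [0] * n  # zeroing option: masked lanes become all zeroes
    else:
        if len(mask_values) != n:
            raise ValueError("mask register must hold N elements")
        fill = list(mask_values)  # packed mask values from a mask source register
    merged = list(src1[:split]) + list(src2[split:])
    return [fill[i] if (mask_bits >> i) & 1 else merged[i] for i in range(n)]


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [10, 20, 30, 40]
    print(masked_split_broadcast(a, b, 2, 0b0101))                # [0, 2, 0, 40]
    print(masked_split_broadcast(a, b, 2, 0b0101, [9, 9, 9, 9]))  # [9, 2, 9, 40]
```

Passing the mask as an integer bit field mirrors the immediate option in claim 7; a register-held mask would carry the same bits in a third source register.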

Assignees

Intel Corp

Inventors

Classifications

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Bit or string instructions · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • Register arrangements · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10409601B2 cover?
An apparatus and method for loop flattening and reduction in a SIMD pipeline including broadcast, move, and reduction instructions. For example, one embodiment of a processor comprises: a decoder to decode a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode, first and second source operands, and…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30032. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 10, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).