US10055225B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10055225-B2 |
| Application number | US-201113995433-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 23, 2011 |
| Priority date | Dec 23, 2011 |
| Publication date | Aug 21, 2018 |
| Grant date | Aug 21, 2018 |
Official abstract text for this publication.
A processor fetches a multi-register scatter instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes multiple source data elements. The destination operand identifies multiple destination data elements that each specify a destination vector register and an index into that destination vector register. The instruction is decoded and executed, causing, for each of those identified destination data elements, the one of the source data elements that is in a position in the source vector register that corresponds with a position of that destination data element to be stored in the destination vector register at the index specified by that destination data element.
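
The behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `multi_register_scatter`, the list-of-lists register file, and the use of `None` for an element that requests no store (a case mentioned in the claims) are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical model: a destination data element is either
# (destination register number, index into that register) or None,
# where None stands for "do not store the corresponding source element".
DestElement = Optional[Tuple[int, int]]


def multi_register_scatter(
    source: List[int],
    dest_elements: List[DestElement],
    register_file: List[List[int]],
) -> List[List[int]]:
    """Scatter source[i] to register_file[reg][idx] for each dest_elements[i]."""
    for position, dest in enumerate(dest_elements):
        if dest is None:
            continue  # this position is marked as "no store"
        reg, idx = dest
        # The source element at the same position as the destination
        # element is written to the named register at the named index.
        register_file[reg][idx] = source[position]
    return register_file


# Illustrative sizes only: three 4-element registers, all initially zero.
register_file = [[0] * 4 for _ in range(3)]
source = [10, 20, 30, 40]
dest_elements = [(2, 0), (0, 3), None, (1, 1)]

multi_register_scatter(source, dest_elements, register_file)
print(register_file)
```

In this model, position 0 of the source lands in register 2 at index 0, position 1 in register 0 at index 3, position 2 is skipped, and position 3 lands in register 1 at index 1.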
Opening claim text (preview).
What is claimed is:

1. A method of performing an instruction in a computer processor, comprising: fetching the instruction that includes a source operand and a destination operand, wherein the source operand specifies a source vector register included in the computer processor that includes a plurality of source data elements that are to be scattered to a plurality of destination vector registers included in the computer processor, wherein the destination operand identifies a first plurality of destination data elements, wherein each of the destination data elements specifies a destination vector register included in the computer processor out of the plurality of destination vector registers included in the computer processor and an index position into that destination vector register included in the computer processor; decoding the fetched instruction; and executing the decoded instruction causing, for each of the first plurality of destination data elements, storage of the one of the source data elements that is in a position in the source vector register that corresponds with a position of that destination data element in the destination vector register included in the computer processor at the index position specified by that destination data element.

2. The method of claim 1, wherein the destination operand specifies a vector register that identifies the first plurality of destination data elements.

3. The method of claim 2, wherein the vector register specified by the destination operand includes a second plurality of destination data elements that includes the first plurality of destination data elements and at least one destination data element that indicates that the one of the source data elements that is in a position in the source vector register that corresponds with that destination data element is not to be stored in a destination vector register as a result of executing the decoded multi-register scatter instruction.

4. The method of claim 2, wherein the source vector register and the vector register specified by the destination operand are each 512-bits.

5. The method of claim 4, wherein each of the first plurality of destination data elements is 32-bits of which 8 bits indicate a destination vector register and 8 bits indicate an index into that destination vector register.

6. The method of claim 1, wherein the destination operand specifies a memory location that identifies the first plurality of destination data elements.

7. The method of claim 1, wherein the source vector register is 512-bits.

8. A processor core, comprising: a hardware decode unit to decode an instruction, wherein the instruction includes a source operand and a destination operand, wherein the source operand specifies a source vector register that includes a plurality of source data elements that are to be scattered to a plurality of destination vector registers, wherein the destination operand identifies a first plurality of destination data elements, wherein each of the destination data elements specifies a destination vector register out of the plurality of destination vector registers and an index into that destination vector register; and an execution engine unit to execute the decoded instruction which causes, for each of the first plurality of destination data elements, to store the one of the source data elements that is in a position in the source vector register that corresponds with a position of that destination data element to be stored in the destination vector register at the index specified by that destination data element.

9. The processor core of claim 8, wherein the destination operand specifies a vector register that identifies the first plurality of destination data elements.

10. The processor core of claim 9, wherein the vector register specified by the destination operand includes a second plurality of destination data elements that includes the first plurality of destination data elements and at least one destination data element that indicates that the one of the source data elements that is in a position in the source vector register that corresponds with that destination data element is not to be stored in a destination vector register as a result of executing the decoded multi-register scatter instruction.

11. The processor core of claim 9, wherein the source vector register and the vector register specified by the destination operand are each 512-bits.

12. The processor core of claim 11, wherein each of the first plurality of destination data elements is 32-bits of which 8 bits indicate a destination vector register and 8 bits indicate an index into that destination vector register.

13. The processor core of claim 8, wherein the destination operand specifies a memory location that identifies the first plurality of destination data elements.

14. The processor core of claim 8, wherein the source vector register is 512-bits.

15. An article of manufacture, comprising: a non-transitory tangible machine-readable storage medium having stored thereon an instruction, wherein the instruction includes a source operand and a destination operand, wherein the source operand specifies a source vector register that includes a plurality of source data elements that are to be scattered to a plurality of destination vector registers, wherein the destination operand identifies a first plurality of destination data elements, wherein each of the destination data elements specifies a destination vector register out of the plurality of destination vector registers and an index into that destination vector register; and wherein the instruction includes an opcode, which instructs a machine to execute the instruction that causes, for each of the first plurality of destination data elements, storage of the one of the source data elements that is in a position in the source vector register that corresponds with a position of that destination data element to be stored in the destination vector register at the index specified by that destination data element.

16. The article of manufacture of claim 15, wherein the destination operand specifies a vector register that identifies the first plurality of destination data elements.

17. The article of manufacture of claim 16, wherein the vector register specified by the destination operand includes a second plurality of destination data elements that includes the first plurality of destination data elements and at least one destination data element that indicates that the one of the source data elements that is in a position in the source vector register that corresponds with that destination data element is not to be stored in a destination vector register as a result of executing the decoded multi-register scatter instruction.

18. The article of manufacture of claim 16, wherein the source vector register and the vector register specified by the destination operand are each 512-bits, and the size of the data elements is defined by the instruction.

19. The article of manufacture of claim 18, wherein each of the first plurality of destination data elements is 32-bits of which 8 bits indicate a destination vector register and 8 bits indicate an index into that destination vector register.

20. The article of manufacture of claim 15, wherein the destination operand specifies a memory location that identifies the first plurality of destination data elements.
Register arrangements · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
using a mask · CPC title