Information processing apparatus
US-2024385843-A1 · Nov 21, 2024 · US
US9733935B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9733935-B2 |
| Application number | US-201113976404-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 23, 2011 |
| Priority date | Dec 23, 2011 |
| Publication date | Aug 15, 2017 |
| Grant date | Aug 15, 2017 |
Official abstract text for this publication.
A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a second mask pattern in the mask register space. The method further includes fetching the first mask pattern from the mask register space; fetching the second mask pattern from the mask register space; merging the first and second mask patterns into a merged mask pattern; and, storing the merged mask pattern at a storage location identified by the destination address.
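The fetch, merge, and store steps in the abstract can be sketched as a small software model. This is an illustration only: the abstract does not say how the two mask patterns are merged, so the bitwise OR below is an assumption, as are the 16-bit mask width, the dictionary used as mask register space, and the names `execute_mask_merge`, `k1`, `k2`, and `k3`.

```python
# Hypothetical software model of the three-operand mask instruction in the abstract.
# ASSUMPTIONS (not from the source): masks are 16 bits wide, the mask register
# space is a dict keyed by register name, and "merging" means bitwise OR.

MASK_BITS = 16


def execute_mask_merge(mask_regs, dst, src1, src2):
    """Fetch two mask patterns, merge them, and store the result at dst."""
    first = mask_regs[src1]    # fetch the first mask pattern
    second = mask_regs[src2]   # fetch the second mask pattern
    # Merge step: OR is a placeholder for whatever merge the patent defines.
    merged = (first | second) & ((1 << MASK_BITS) - 1)
    mask_regs[dst] = merged    # store at the location named by the destination
    return merged


if __name__ == "__main__":
    regs = {"k1": 0b0011, "k2": 0b0100, "k3": 0}
    execute_mask_merge(regs, dst="k3", src1="k1", src2="k2")
    print(bin(regs["k3"]))
```

The point of the sketch is the operand structure: the destination and the two sources are named separately, so neither source mask is overwritten.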
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a first register to store a first vector input operand; a second register to store a second vector input operand; a third register to store a third vector input operand; a fourth register to store a packed data structure containing a first scalar input operand and a second scalar input operand; a decoder to decode a single instruction, having a first field specifying the first register, a second field specifying the second register, a third field specifying the third register, and a fourth field specifying the fourth register, into a decoded single instruction; and an execution unit comprising a multiplier coupled to the first register, the second register, the third register, and the fourth register, the execution unit to execute the decoded single instruction to, for each element position, multiply the first scalar input operand with an element of the first vector input operand to produce a first value, multiply the second scalar input operand with a corresponding element of the second vector input operand to produce a second value, and add the first value, the second value, and a corresponding element of the third vector input operand to produce a result, and store in parallel a result for each element position of the first vector input operand, the second vector input operand, and the third vector input operand into a corresponding element position of a resultant register.
2. The processor of claim 1, wherein said multiplier has a first input to receive the first vector input operand, a second input to receive the first scalar input operand, a third input to receive the second vector input operand, and a fourth input to receive the second scalar input operand such that the first values and the second values are calculated substantially simultaneously.
3. The processor of claim 1, wherein said execution unit includes microcode to loop through said multiplier twice, a first loop to calculate the first values and a second loop to calculate the second values.
4. The processor of claim 1, wherein said single instruction separately identifies a sign for each one of said first values, second values, and elements of the third vector input operand.
5. The processor of claim 4, wherein said signs are provided in an immediate operand of the single instruction.
6. The processor of claim 1, wherein individual locations of the first scalar input operand and the second scalar input operand within said packed data structure are determined from information placed in an immediate operand of the single instruction.
7. The processor of claim 1, wherein the execution unit is to execute the decoded single instruction to further apply a write mask to the resultant register, and an instruction format of the single instruction includes a field to indicate the write mask.
8. A method, comprising: storing a first vector input operand in a first register; storing a second vector input operand in a second register; storing a third vector input operand in a third register; storing a packed data structure containing a first scalar input operand and a second scalar input operand in a fourth register; decoding a single instruction, having a first field specifying the first register, a second field specifying the second register, a third field specifying the third register, and a fourth field specifying the fourth register, into a decoded single instruction with a decoder of a processor; and executing the decoded single instruction with an execution unit of the processor to, for each element position, multiply the first scalar input operand with an element of the first vector input operand to produce a first value, multiply the second scalar input operand with a corresponding element of the second vector input operand to produce a second value, and add the first value, the second value, and a corresponding element of the third vector input operand to produce a result, and store in parallel a result for each element position of the first vector input operand, the second vector input operand, and the third vector input operand into a corresponding element position of a resultant register.
9. The method of claim 8, wherein the executing comprises calculating the first values and the second values substantially simultaneously.
10. The method of claim 8, wherein the executing comprises calculating the first values in a first microcode loop and then calculating the second values in a second microcode loop.
11. The method of claim 8, wherein the executing comprises applying a write mask to the resultant register, and an instruction format of the single instruction includes a field to indicate the write mask.
12. The method of claim 8, wherein said single instruction provides in an immediate value information sufficient to individually extract each of the first scalar input operand and the second scalar input operand from said packed data structure.
13. The method of claim 8, wherein said single instruction comprises an instruction format with a fifth field that specifies the resultant register.
14. The method of claim 8, wherein the execution unit is to not loop through a multiplier a plurality of times when executing the single instruction.
15. A non-transitory machine readable medium that stores code that when executed by a machine causes the machine to perform a method comprising: storing a first vector input operand in a first register; storing a second vector input operand in a second register; storing a third vector input operand in a third register; storing a packed data structure containing a first scalar input operand and a second scalar input operand in a fourth register; decoding a single instruction, having a first field specifying the first register, a second field specifying the second register, a third field specifying the third register, and a fourth field specifying the fourth register, into a decoded single instruction with a decoder of a processor; and executing the decoded single instruction with an execution unit of the processor to, for each element position, multiply the first scalar input operand with an element of the first vector input operand to produce a first value, multiply the second scalar input operand with a corresponding element of the second vector input operand to produce a second value, and add the first value, the second value, and a corresponding element of the third vector input operand to produce a result, and store in parallel a result for each element position of the first vector input operand, the second vector input operand, and the third vector input operand into a corresponding element position of a resultant register.
16. The non-transitory machine readable medium of claim 15, wherein the executing comprises calculating the first values and the second values substantially simultaneously.
17. The non-transitory machine readable medium of claim 15, wherein the executing comprises calculating the first values in a first microcode loop and then calculating the second values in a second microcode loop.
18. The non-transitory machine readable medium of claim 15, wherein the executing comprises applying a write mask to the resultant register, and an instruction format of the single instruction includes a field to indicate the write mask.
19. The non-transitory machine readable medium of claim 15, wherein the single instruction provides in an immediate value information sufficient to individually extract each of the first scalar input operand and th
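The per-element arithmetic recited in the independent claims (claims 1, 8, and 15) amounts to result[i] = a·x[i] + b·y[i] + z[i], where a and b are the two scalars taken from the packed data structure. The following is a minimal scalar model of that computation, not the claimed hardware. The function name, the tuple used for the packed scalars, the integer bit-field write mask, and the choice to leave masked-off positions at their prior destination value are all assumptions made for illustration; the claims do not specify the masking behaviour.

```python
# Hypothetical reference model of the claimed dual-scalar multiply-add.
# ASSUMPTIONS (not from the claims): the packed data structure is a 2-tuple,
# the write mask is an int bit field (bit i enables element i), and elements
# whose mask bit is 0 keep the previous destination value.

def dual_scalar_multiply_add(x, y, z, packed_scalars, write_mask=None, dest=None):
    """Return [a*x[i] + b*y[i] + z[i]] for each enabled element position."""
    a, b = packed_scalars  # first and second scalar input operands
    if not (len(x) == len(y) == len(z)):
        raise ValueError("vector operands must have the same number of elements")
    n = len(x)
    result = list(dest) if dest is not None else [0] * n
    if len(result) != n:
        raise ValueError("destination must match the operand length")
    for i in range(n):
        # Hardware would handle all positions in parallel; this loop is a model.
        if write_mask is None or (write_mask >> i) & 1:
            first_value = a * x[i]    # first scalar times first vector element
            second_value = b * y[i]   # second scalar times second vector element
            result[i] = first_value + second_value + z[i]
    return result


if __name__ == "__main__":
    print(dual_scalar_multiply_add([1, 2, 3, 4], [5, 6, 7, 8], [1, 1, 1, 1], (2, 3)))
```

Claims 2 and 3 differ only in how the two products are scheduled (simultaneously, or in two passes through the multiplier); the arithmetic result modelled here is the same in either case.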
Bit or string instructions · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Arithmetic instructions · CPC title
Special purpose registers · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title