Systems, apparatuses, and methods for performing conflict detection and broadcasting contents of a register to data element positions of another register
US-9665368-B2 · May 30, 2017 · US
US9934032B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9934032-B2 |
| Application number | US-201615331940-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 24, 2016 |
| Priority date | Mar 30, 2013 |
| Publication date | Apr 3, 2018 |
| Grant date | Apr 3, 2018 |
Official abstract text for this publication.
A method includes receiving a packed data instruction indicating a first narrower source packed data operand and a narrower destination operand. The instruction is mapped to a masked packed data operation indicating a first wider source packed data operand that is wider than and includes the first narrower source operand, and indicating a wider destination operand that is wider than and includes the narrower destination operand. A packed data operation mask is generated that includes a mask element for each corresponding result data element of a packed data result to be stored by the masked packed data operation. All mask elements that correspond to result data elements to be stored by the masked operation that would not be stored by the packed data instruction are masked out. The masked operation is performed using the packed data operation mask. The packed data result is stored in the wider destination operand.
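The flow in the abstract can be sketched in software as follows. This is a minimal illustration, not the patented hardware: the lane counts, the 32-bit lane width, the choice of addition as the operation, and all function names (`full_width_mask_for`, `masked_packed_add`, `execute_narrow_add`) are assumptions made for the example. Leaving masked-out lanes unchanged follows the behavior recited in claims 6 and 14; the abstract itself does not say what masked-out lanes hold.

```python
from typing import List

LANE_BITS = 32  # assumed lane width, for illustration only


def full_width_mask_for(narrow_lanes: int) -> int:
    """Build a mask with one bit per wide-register lane.

    Only the low `narrow_lanes` bits are set; every lane the narrower
    instruction would not have written is masked out (bit = 0).
    """
    return (1 << narrow_lanes) - 1


def masked_packed_add(src1: List[int], src2: List[int],
                      dest: List[int], mask: int) -> List[int]:
    """Full-width lane-wise add under a mask.

    Unmasked lanes receive src1 + src2 (wrapped to LANE_BITS);
    masked-out lanes keep the destination's previous value.
    """
    assert len(src1) == len(src2) == len(dest)
    wrap = (1 << LANE_BITS) - 1
    return [
        (a + b) & wrap if (mask >> i) & 1 else d
        for i, (a, b, d) in enumerate(zip(src1, src2, dest))
    ]


def execute_narrow_add(src1: List[int], src2: List[int],
                       dest: List[int], narrow_lanes: int) -> List[int]:
    """Map a narrow packed add onto the masked full-width operation."""
    mask = full_width_mask_for(narrow_lanes)
    return masked_packed_add(src1, src2, dest, mask)


# Hypothetical 8-lane register; the narrow instruction covers the low 4 lanes.
wide_src1 = [1, 2, 3, 4, 5, 6, 7, 8]
wide_src2 = [10] * 8
wide_dest = [0xAA] * 8
result = execute_narrow_add(wide_src1, wide_src2, wide_dest, narrow_lanes=4)
print(result)
```

In this sketch the whole register is written in a single pass, but only the low four lanes change, which is the effect the narrower instruction would have had on its own.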
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a plurality of packed data operation mask registers, wherein at least one instruction of an instruction set of the processor has a field to specify one of the plurality of the packed data operation mask registers; a decode unit to decode a packed data instruction that is to indicate at least a first narrower source packed data operand and a narrower destination operand; and an execution unit coupled with the plurality of the packed data operation mask registers and coupled with the decode unit, the execution unit in response to the decode of the packed data instruction to perform a masked packed data operation that is to involve at least a first wider source packed data operand that is to be wider than the first narrower source packed data operand, that is to involve a wider destination operand that is to be wider than the narrower destination operand, and that is to involve a packed data operation mask, the packed data operation mask to be stored in a packed data operation mask register of the plurality of packed data operation mask registers, the packed data operation mask to include a mask element for each corresponding result data element of a packed data result that is to be stored by the masked packed data operation, wherein all mask elements that correspond to result data elements to be stored by the masked packed data operation that do not correspond to the packed data instruction are to be masked out, the execution unit to store the packed data result in the wider destination operand.
2. The processor of claim 1, wherein the execution unit is to write an entire width of a register that is to correspond to the wider destination operand, and wherein the narrower destination operand is to represent only a portion of the width of the register.
3. The processor of claim 1, wherein the decode unit is to decode the packed data instruction that is also to indicate a second narrower source packed data operand, and wherein the masked packed data operation is also to involve a second wider source packed data operand that is to be wider than the second narrower source packed data operand.
4. The processor of claim 1, wherein the decode unit is to decode the packed data instruction that is not to indicate a packed data operation mask.
5. The processor of claim 1, wherein the decode unit is to decode the packed data instruction that is to indicate a packed data operation mask having fewer mask elements than the packed data operation mask that is to be used by the execution unit to perform the masked packed data operation.
6. The processor of claim 1, wherein the execution unit is to store the packed data result in which a value of each result data element that corresponds to a masked out mask element is to be unchanged, and in which a value of each result data element that corresponds to an unmasked mask element is to be updated by an operation associated with the packed data instruction.
7. The processor of claim 1, wherein said all mask elements that correspond to the result data elements to be stored by the masked packed data operation that do not correspond to the packed data instruction are to have a value of a corresponding data element of the at least one wider source packed data operand.
8. A method in a processor comprising: receiving a packed data instruction indicating at least a first narrower source packed data operand and a narrower destination operand; mapping the packed data instruction to a masked packed data operation involving at least a first wider source packed data operand that is wider than the first narrower source packed data operand, and involving a wider destination operand that is wider than the narrower destination operand; accessing a packed data operation mask from a set of packed data operation mask registers of the processor that are capable of being specified by instructions of an instruction set of the processor, the packed data operation mask including a mask element for each corresponding result data element of a packed data result to be stored by the masked packed data operation, wherein all mask elements that correspond to result data elements to be stored by the masked packed data operation that do not correspond to the packed data instruction are masked out; performing the masked packed data operation using the packed data operation mask; and storing the packed data result in the wider destination operand.
9. The method of claim 8, wherein storing the packed data result comprises writing an entire width of a register that corresponds to the wider destination operand, and wherein the narrower destination operand represents only a portion of the width of the register.
10. The method of claim 8, wherein receiving comprises receiving the packed data instruction that also indicates a second narrower source packed data operand, and wherein mapping includes mapping the packed data instruction to the masked packed data operation that also involves a second wider source packed data operand that is wider than the second narrower source packed data operand.
11. The method of claim 8, wherein receiving comprises receiving the packed data instruction that does not indicate a packed data operation mask.
12. The method of claim 8, wherein receiving comprises receiving the packed data instruction that indicates a second packed data operation mask that has a lesser number of mask elements than the accessed packed data operation mask.
13. The method of claim 8, wherein the first narrower source packed data operand is aliased on the first wider source packed data operand in a register.
14. The method of claim 8, wherein storing comprises storing the packed data result in which a value of each result data element that corresponds to a masked out mask element is unchanged, and in which a value of each result data element that corresponds to an unmasked mask element is updated by an operation associated with the packed data instruction.
15. A processor comprising: a plurality of packed data operation mask registers; a decode unit to decode a packed data instruction that is to indicate at least a first narrower source packed data, and that has a field to specify one of the packed data operation mask registers as a storage location for a narrower packed data operation mask; and an execution unit coupled with the plurality of the packed data operation mask registers and coupled with the decode unit, the execution unit in response to the decode of the packed data instruction to perform a masked packed data operation that is to involve at least a first wider source packed data operand that is to be wider than the first narrower source packed data operand, and that is to involve a wider packed data operation mask that is to be wider than the narrower packed data operation mask, the wider packed data operation mask to include a mask element for each corresponding result data element of a packed data result that is to be stored by the masked packed data operation, wherein all mask elements that correspond to result data elements beyond a width of a packed data result that would be generated by the packed data instruction are to be masked out.
16. The processor of claim 15, wherein the masked out result data elements are to have a value of a corresponding data element of the at least one wider source packed data operand.
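Claims 5, 12, and 15 cover the case where the instruction itself names a narrower mask, and claims 7 and 16 describe masked-out result elements taking the value of the corresponding source element. The sketch below illustrates one way to read those two ideas together. It is an assumption-laden model, not the claimed circuit: masks are represented as Python lists of booleans, the helper names `widen_mask` and `masked_op_source_passthrough` are hypothetical, and taking the pass-through value from the first source operand is a choice made only for this example.

```python
from typing import Callable, List


def widen_mask(narrow_mask: List[bool], wide_lanes: int) -> List[bool]:
    """Extend a narrower mask to one element per wide-register lane.

    The programmer-supplied bits are kept for the low lanes; every lane
    beyond the narrower result's width is masked out (False).
    """
    assert len(narrow_mask) <= wide_lanes
    return narrow_mask + [False] * (wide_lanes - len(narrow_mask))


def masked_op_source_passthrough(op: Callable[[int, int], int],
                                 src1: List[int], src2: List[int],
                                 mask: List[bool]) -> List[int]:
    """Apply `op` lane-wise where the mask is set.

    Masked-out lanes take the value of the corresponding src1 element
    (an assumed choice of source for this illustration).
    """
    assert len(src1) == len(src2) == len(mask)
    return [op(a, b) if m else a for a, b, m in zip(src1, src2, mask)]


# Hypothetical 4-element mask supplied with the narrower instruction.
narrow_mask = [True, False, True, True]
wide_mask = widen_mask(narrow_mask, wide_lanes=8)

wide_src1 = [1, 2, 3, 4, 5, 6, 7, 8]
wide_src2 = [100] * 8
out = masked_op_source_passthrough(lambda a, b: a + b,
                                   wide_src1, wide_src2, wide_mask)
print(wide_mask)
print(out)
```

Lane 1 is masked out by the instruction's own mask, while lanes 4 through 7 are masked out by the widening step; in both cases the source value passes through untouched.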
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
Decoding the operand specifier, e.g. specifier format · CPC title
using a mask · CPC title
Bit or string instructions · CPC title