Hazard check instructions for enhanced predicate vector operations
US-9600280-B2 · Mar 21, 2017 · US
US9928069B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9928069-B2 |
| Application number | US-201314137232-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 20, 2013 |
| Priority date | Dec 20, 2013 |
| Publication date | Mar 27, 2018 |
| Grant date | Mar 27, 2018 |
Abstract
A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector (if any) the element of the first vector depends on. In an embodiment, the addresses of the vector memory operations are specified using a base address for each vector memory operation and a vector that is shared by both vector memory operations. In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements.
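The abstract and claims 2 through 12 describe what the instruction computes, not how it is built. The Python sketch below models that computation in software under assumptions that the excerpt does not state: dependency-vector entries are 1-based element positions so that 0 can mean "no dependency" (claim 12); an element of the first operation can depend only on earlier element positions of the second operation (earlier loop iterations); and when several earlier elements overlap, the highest-numbered one is reported. The names `element_addresses`, `overlaps`, and `hazard_check`, and the example loop, are hypothetical.

```python
from typing import List, Optional, Sequence


def element_addresses(base: int, offsets: Sequence[int], size: int) -> List[int]:
    # Claims 4-5: scale each shared offset by the operation's data size,
    # then add that operation's own base address.
    return [base + off * size for off in offsets]


def overlaps(addr_a: int, size_a: int, addr_b: int, size_b: int) -> bool:
    # Accesses [addr, addr + size) overlap if each starts before the other ends.
    return addr_a < addr_b + size_b and addr_b < addr_a + size_a


def hazard_check(
    base_first: int,
    base_second: int,
    offsets: Sequence[int],
    size_first: int,
    size_second: int,
    active_first: Optional[Sequence[bool]] = None,
    active_second: Optional[Sequence[bool]] = None,
) -> List[int]:
    """Software model of the dependency vector for the earlier (first) operation.

    Entry i is the 1-based position of the element of the second operation that
    element i of the first operation depends on, or 0 for "no dependency".
    """
    n = len(offsets)
    # A missing predicate is treated as all-active.
    if active_first is None:
        active_first = [True] * n
    if active_second is None:
        active_second = [True] * n
    addr_first = element_addresses(base_first, offsets, size_first)
    addr_second = element_addresses(base_second, offsets, size_second)
    dep = [0] * n
    for i in range(n):
        if not active_first[i]:
            continue  # an inactive element never reports a dependency (claim 8)
        # Assumption: only earlier element positions can be depended on, and
        # the latest overlapping one wins because later j overwrite dep[i].
        for j in range(i):
            if active_second[j] and overlaps(addr_first[i], size_first,
                                             addr_second[j], size_second):
                dep[i] = j + 1
    return dep


if __name__ == "__main__":
    # Hypothetical loop body: t = A[idx[i]]; A[idx[i]] = t + 1 (4-byte elements)
    idx = [0, 1, 0, 2, 1]
    print(hazard_check(0x1000, 0x1000, idx, 4, 4))  # [0, 0, 1, 0, 2]
    print(hazard_check(0x1000, 0x1000, idx, 4, 4,
                       active_second=[False, True, True, True, True]))  # [0, 0, 0, 0, 2]
```

In the first call, the third load (offset 0 again) is flagged as depending on the first store, and the fifth load on the second store. Marking the first store inactive in the second call removes the first of those dependencies, matching the predicate qualification in claims 6 through 9.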
Claims (preview)
What is claimed is:

1. A processor comprising: an execution core configured to execute an instruction having a plurality of operands including a first base address, a second base address, and a vector, wherein the plurality of operands specify addresses corresponding to a first vector memory operation and a second vector memory operation, wherein the first vector memory operation is prior to the second vector memory operation in program order in a loop, and wherein the execution core is configured to detect whether or not a dependency exists between addresses of the first vector memory operation and addresses of the second vector memory operation responsive to the plurality of operands, and wherein the execution core is configured to generate a dependency vector in response to the instruction that indicates the detected dependencies in response to executing the instruction, wherein the execution core is configured to write the dependency vector to a result register of the instruction, and wherein the execution core is configured to use the dependency vector to predicate vector instructions in the loop to ensure that the dependencies are respected while permitting available parallelism in each iteration of the loop.

2. The processor as recited in claim 1 wherein a first plurality of addresses corresponding to the first vector memory operation are determined responsive to the first base address and elements of the vector, and wherein a second plurality of addresses corresponding to the second vector memory operation are determined responsive to the second base address and elements of the vector.

3. The processor as recited in claim 2 wherein the vector comprises offsets to the first base address and the second base address.

4. The processor as recited in claim 3 wherein the execution core is configured to scale the offsets by a first data size corresponding to the first vector memory operation to determine the first plurality of addresses, and wherein the execution core is configured to scale the offsets by a second data size corresponding to the second vector memory operation to determine the second plurality of addresses.

5. The processor as recited in claim 4 wherein the execution core is configured to add the offsets scaled by the first data size to the first base address to generate the first plurality of addresses, and wherein the execution core is configured to add the offsets scaled by the second data size to the second base address to generate the second plurality of addresses.

6. The processor as recited in claim 1 wherein the instruction further includes a first predicate corresponding to the second vector memory operation, wherein the execution core is configured to detect dependencies between elements of the first vector memory operation and elements of the second vector memory operation for which corresponding elements of the first predicate are active.

7. The processor as recited in claim 6 wherein the instruction further includes a second predicate corresponding to the first vector memory operation, wherein the execution core is configured to detect dependencies between elements of the first memory vector operation for which corresponding elements of the second predicate are active and elements of the second vector memory operation for which corresponding elements of the first predicate are active.

8. The processor as recited in claim 7 wherein the execution core is configured to detect no dependency for a first element of the first memory vector operation and a second element of the second vector memory operation responsive to either the first element or the second element being inactive as indicated by the respective second predicate or first predicate.

9. The processor as recited in claim 8 wherein the execution core is configured to detect a dependency responsive to both the first element and the second element being active as indicated by the respective second predicate and first predicate and further responsive to an overlap in the addresses of the first element and the second element.

10. The processor as recited in claim 1 wherein the dependency vector indicates, for each first element of the first vector memory operation that depends on a second element of the second vector memory operation, the element number of the second vector memory operation on which the first element depends.

11. The processor as recited in claim 10 wherein the dependency vector includes an indicator of no dependency for first elements that do not depend on second elements.

12. The processor as recited in claim 11 wherein the indicator is a value of zero.

13. A method comprising: executing, in a processor, an instruction having a plurality of operands including a first base address, a second base address, and a vector, wherein the plurality of operands specify addresses corresponding to a first vector memory operation and a second vector memory operation, and wherein the first vector memory operation is prior to the second vector memory operation in program order in a loop; during the executing, the processor detecting whether or not a dependency exists between addresses of the first vector memory operation and addresses of the second vector memory operation responsive to the plurality of operands; responsive to the executing, the processor generating a dependency vector that indicates the detected dependencies in response to executing the instruction; and the processor writing the dependency vector to a result register of the instruction, wherein the processor is configured to use the dependency vector to predicate vector instructions in the loop to ensure that the dependencies are respected while permitting available parallelism in each iteration of the loop.

14. The method as recited in claim 13 wherein a first plurality of addresses corresponding to the first memory vector operation are determined responsive to the first base address and elements of the vector, and wherein a second plurality of addresses corresponding to the second vector memory operation are determined responsive to the second base address and elements of the vector.

15. The method as recited in claim 14 wherein the vector comprises offsets to the first base address and the second base address, and the executing further comprising: scaling the offsets by a first data size corresponding to the first vector memory operation to determine the first plurality of addresses; scaling the offsets by a second data size corresponding to the second vector memory operation to determine the second plurality of addresses; adding the offsets scaled by the first data size to the first base address to generate the first plurality of addresses; and adding the offsets scaled by the second data size to the second base address to generate the second plurality of addresses.

16. The method as recited in claim 13 wherein the instruction further includes a first predicate corresponding to the second vector memory operation, wherein the detecting comprises detecting dependencies between elements of the first vector memory operation and elements of the second vector memory operation for which corresponding elements of the first predicate are active.

17. The method as recited in claim 16 wherein the instruction further includes a second predicate corresponding to the first vector memory operation, wherein the detecting comprises detecting dependencies between elements of the first vector memory operation for which corresponding elements of the second predicate are active and elements of the second vector memory operation for which
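Claims 1 and 13 state that the dependency vector is used to predicate the loop's vector instructions so that dependencies are respected while parallelism is kept, but the excerpt does not say how. One plausible scheme, assumed here rather than taken from the patent, is to split the vector into contiguous groups in which no element depends on another element of the same group, and to run the loop body once per group under that group's predicate. The sketch assumes the 1-based, zero-means-none encoding suggested by claims 10 to 12; `partition_predicates` is a hypothetical name.

```python
from typing import List, Sequence


def partition_predicates(dep: Sequence[int]) -> List[List[bool]]:
    """Split one vector's worth of loop iterations into dependency-free groups.

    dep[i] is assumed to be 0 for "no dependency" or the 1-based position of an
    earlier element that element i depends on. Each returned predicate enables
    a contiguous run of elements with no dependency inside the run, so those
    elements can execute in parallel; runs execute one after another.
    """
    n = len(dep)
    groups: List[List[bool]] = []
    start = 0
    while start < n:
        end = start + 1
        # Grow the group until an element depends on a position inside it.
        while end < n and not (dep[end] != 0 and dep[end] - 1 >= start):
            end += 1
        groups.append([start <= k < end for k in range(n)])
        start = end
    return groups


if __name__ == "__main__":
    for pred in partition_predicates([0, 0, 1, 0, 2]):
        print(pred)
    # [True, True, False, False, False]
    # [False, False, True, True, True]
```

Each group waits only on dependencies satisfied by earlier groups, so a dependency vector of all zeros yields a single all-active predicate, while a chain in which every element depends on its predecessor degrades to one element per pass.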
to perform miscellaneous control operations, e.g. NOP · CPC title
Dependency mechanisms, e.g. register scoreboarding · CPC title
Bit or string instructions · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
using a mask · CPC title