Predicated vector hazard check instruction

US9928069B2 · US · B2

Patent metadata
Publication number: US-9928069-B2
Application number: US-201314137232-A
Country: US
Kind code: B2
Filing date: Dec 20, 2013
Priority date: Dec 20, 2013
Publication date: Mar 27, 2018
Grant date: Mar 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector that the element of the first vector depends on (if any). In an embodiment, the addresses of the vector memory operations are specified using a base address for each vector memory operation and a vector that is shared by both vector memory operations. In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements.
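The abstract's behaviour can be modelled in software to make the dependency vector concrete. The following is a minimal Python sketch, not Apple's implementation, and it rests on several assumptions: element positions correspond to successive loop iterations (so only an earlier element j < i of the second operation can precede element i of the first operation in sequential order), each element access covers `data_size` bytes from its address, the nearest earlier conflicting element is the one reported, and a dependency is encoded as the 1-based element number of the second operation, with 0 meaning "no dependency" (claim 12 names zero as the no-dependency indicator). The function `hazard_check` and its parameters are hypothetical names.

```python
from typing import List, Optional, Sequence

NO_DEPENDENCY = 0  # assumption: zero marks "no dependency" (cf. claim 12)


def hazard_check(
    first_addrs: Sequence[int],
    second_addrs: Sequence[int],
    first_size: int,
    second_size: int,
    first_pred: Optional[Sequence[bool]] = None,
    second_pred: Optional[Sequence[bool]] = None,
) -> List[int]:
    """Hypothetical model of a predicated hazard check.

    first_pred / second_pred are named after the operation they qualify.
    Entry i of the result is 0 if element i of the first operation has no
    dependency, otherwise j + 1 for the 0-based element j of the second
    operation that it depends on.
    """
    n = len(first_addrs)
    if len(second_addrs) != n:
        raise ValueError("both operations must have the same vector length")
    p1 = first_pred if first_pred is not None else [True] * n
    p2 = second_pred if second_pred is not None else [True] * n

    deps = [NO_DEPENDENCY] * n
    for i in range(n):
        if not p1[i]:
            continue  # inactive elements never report a dependency
        lo1, hi1 = first_addrs[i], first_addrs[i] + first_size
        # Scan earlier iterations from nearest to farthest so the most
        # recent conflicting access of the second operation is reported.
        for j in range(i - 1, -1, -1):
            if not p2[j]:
                continue
            lo2, hi2 = second_addrs[j], second_addrs[j] + second_size
            if lo1 < hi2 and lo2 < hi1:  # byte ranges overlap
                deps[i] = j + 1
                break
    return deps
```

For a loop such as `a[idx[i]] += 1` vectorized with `idx = [0, 1, 0, 2]` and 4-byte elements at base 1000, the load (first operation) and the store (second operation) both touch addresses `[1000, 1004, 1000, 1008]`. Element 2 of the load re-reads the address written by element 0 of the store, so `hazard_check(addrs, addrs, 4, 4)` returns `[0, 0, 1, 0]`; marking store element 0 inactive removes that dependency.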

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: an execution core configured to execute an instruction having a plurality of operands including a first base address, a second base address, and a vector, wherein the plurality of operands specify addresses corresponding to a first vector memory operation and a second vector memory operation, wherein the first vector memory operation is prior to the second vector memory operation in program order in a loop, and wherein the execution core is configured to detect whether or not a dependency exists between addresses of the first vector memory operation and addresses of the second vector memory operation responsive to the plurality of operands, and wherein the execution core is configured to generate a dependency vector in response to the instruction that indicates the detected dependencies in response to executing the instruction, wherein the execution core is configured to write the dependency vector to a result register of the instruction, and wherein the execution core is configured to use the dependency vector to predicate vector instructions in the loop to ensure that the dependencies are respected while permitting available parallelism in each iteration of the loop.

2. The processor as recited in claim 1 wherein a first plurality of addresses corresponding to the first vector memory operation are determined responsive to the first base address and elements of the vector, and wherein a second plurality of addresses corresponding to the second vector memory operation are determined responsive to the second base address and elements of the vector.

3. The processor as recited in claim 2 wherein the vector comprises offsets to the first base address and the second base address.

4. The processor as recited in claim 3 wherein the execution core is configured to scale the offsets by a first data size corresponding to the first vector memory operation to determine the first plurality of addresses, and wherein the execution core is configured to scale the offsets by a second data size corresponding to the second vector memory operation to determine the second plurality of addresses.

5. The processor as recited in claim 4 wherein the execution core is configured to add the offsets scaled by the first data size to the first base address to generate the first plurality of addresses, and wherein the execution core is configured to add the offsets scaled by the second data size to the second base address to generate the second plurality of addresses.

6. The processor as recited in claim 1 wherein the instruction further includes a first predicate corresponding to the second vector memory operation, wherein the execution core is configured to detect dependencies between elements of the first vector memory operation and elements of the second vector memory operation for which corresponding elements of the first predicate are active.

7. The processor as recited in claim 6 wherein the instruction further includes a second predicate corresponding to the first vector memory operation, wherein the execution core is configured to detect dependencies between elements of the first memory vector operation for which corresponding elements of the second predicate are active and elements of the second vector memory operation for which corresponding elements of the first predicate are active.

8. The processor as recited in claim 7 wherein the execution core is configured to detect no dependency for a first element of the first memory vector operation and a second element of the second vector memory operation responsive to either the first element or the second element being inactive as indicated by the respective second predicate or first predicate.

9. The processor as recited in claim 8 wherein the execution core is configured to detect a dependency responsive to both the first element and the second element being active as indicated by the respective second predicate and first predicate and further responsive to an overlap in the addresses of the first element and the second element.

10. The processor as recited in claim 1 wherein the dependency vector indicates, for each first element of the first vector memory operation that depends on a second element of the second vector memory operation, the element number of the second vector memory operation on which the first element depends.

11. The processor as recited in claim 10 wherein the dependency vector includes an indicator of no dependency for first elements that do not depend on second elements.

12. The processor as recited in claim 11 wherein the indicator is a value of zero.

13. A method comprising: executing, in a processor, an instruction having a plurality of operands including a first base address, a second base address, and a vector, wherein the plurality of operands specify addresses corresponding to a first vector memory operation and a second vector memory operation, and wherein the first vector memory operation is prior to the second vector memory operation in program order in a loop; during the executing, the processor detecting whether or not a dependency exists between addresses of the first vector memory operation and addresses of the second vector memory operation responsive to the plurality of operands; responsive to the executing, the processor generating a dependency vector that indicates the detected dependencies in response to executing the instruction; and the processor writing the dependency vector to a result register of the instruction, wherein the processor is configured to use the dependency vector to predicate vector instructions in the loop to ensure that the dependencies are respected while permitting available parallelism in each iteration of the loop.

14. The method as recited in claim 13 wherein a first plurality of addresses corresponding to the first memory vector operation are determined responsive to the first base address and elements of the vector, and wherein a second plurality of addresses corresponding to the second vector memory operation are determined responsive to the second base address and elements of the vector.

15. The method as recited in claim 14 wherein the vector comprises offsets to the first base address and the second base address, and the executing further comprising: scaling the offsets by a first data size corresponding to the first vector memory operation to determine the first plurality of addresses; scaling the offsets by a second data size corresponding to the second vector memory operation to determine the second plurality of addresses; adding the offsets scaled by the first data size to the first base address to generate the first plurality of addresses; and adding the offsets scaled by the second data size to the second base address to generate the second plurality of addresses.

16. The method as recited in claim 13 wherein the instruction further includes a first predicate corresponding to the second vector memory operation, wherein the detecting comprises detecting dependencies between elements of the first vector memory operation and elements of the second vector memory operation for which corresponding elements of the first predicate are active.

17. The method as recited in claim 16 wherein the instruction further includes a second predicate corresponding to the first vector memory operation, wherein the detecting comprises detecting dependencies between elements of the first vector memory operation for which corresponding elements of the second predicate are active and elements of the second vector memory operation for which
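Claims 2 to 5 (and 15) describe how each operation's addresses are formed from its own base address and a shared offset vector scaled by its data size, and claim 1 says the dependency vector is used to predicate the loop's vector instructions. The following minimal Python sketch illustrates both steps under stated assumptions: the helper names `vector_addresses`, `next_partition` and `split_into_partitions` are hypothetical; the dependency vector uses 1-based element numbers with 0 meaning "no dependency"; and the partitioning rule (add consecutive lanes to a group until one depends on a lane already in that group) is one plausible way of "permitting available parallelism", not the claimed hardware mechanism.

```python
from typing import List, Sequence, Tuple


def vector_addresses(base: int, offsets: Sequence[int], data_size: int) -> List[int]:
    """Per-element addresses: base + offset * data_size (cf. claims 2-5, 15)."""
    return [base + off * data_size for off in offsets]


def next_partition(
    deps: Sequence[int], active: Sequence[bool], start: int
) -> Tuple[List[bool], int]:
    """Hypothetical helper: predicate for the next dependency-free group.

    Assumed encoding: deps[k] == 0 means no dependency, otherwise
    deps[k] - 1 is the 0-based lane that lane k depends on.
    Returns the group's predicate and the lane where the next group starts.
    """
    n = len(deps)
    pred = [False] * n
    for k in range(start, n):
        # Lane k must wait if it depends on a lane inside this group, i.e. a
        # lane at or after `start`. The first lane always joins, so every call
        # makes progress even if `deps` is malformed.
        if k > start and active[k] and deps[k] != 0 and deps[k] - 1 >= start:
            return pred, k
        pred[k] = active[k]
    return pred, n


def split_into_partitions(deps: Sequence[int], active: Sequence[bool]) -> List[List[bool]]:
    """Cover the active lanes with successive dependency-free groups."""
    groups: List[List[bool]] = []
    start = 0
    while start < len(deps):
        pred, start = next_partition(deps, active, start)
        groups.append(pred)
    return groups
```

For offsets `[0, 1, 0, 2]` with base 1000 and 4-byte data, `vector_addresses` gives `[1000, 1004, 1000, 1008]`. If lane 2 depends on lane 0 (dependency vector `[0, 0, 1, 0]` under the encoding above) and all lanes are active, `split_into_partitions` returns two groups, lanes 0-1 and then lanes 2-3, so lane 2 executes only after the group containing lane 0 has finished.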

Assignees

Apple Inc

Inventors

Classifications

  • to perform miscellaneous control operations, e.g. NOP · CPC title

  • G06F9/3838 (primary)

    Dependency mechanisms, e.g. register scoreboarding · CPC title

  • Bit or string instructions · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • using a mask · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9928069B2 cover?
A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector that the element of the first vector depends o…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/3838. Mapped technology areas include Physics.
When was this patent published?
Published on March 27, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).