Padded vectorization with compile time known masks
US-2020073662-A1 · Mar 5, 2020 · US
US11847463B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11847463-B2 |
| Application number | US-201916585973-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 27, 2019 |
| Priority date | Sep 27, 2019 |
| Publication date | Dec 19, 2023 |
| Grant date | Dec 19, 2023 |
Official abstract text for this publication.
A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of a plurality of lanes are enabled and which are disabled for the operation. The execution pipeline executes an instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
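The two execution modes in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: `ToyMemory`, `MemoryFault`, `execute_masked_add`, `PAGE_SIZE`, and `LANE_BYTES` are hypothetical names, the SIMD operation is assumed to be an integer add, and disabled lanes are assumed to keep their original register value.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 16   # bytes per page in this toy model (assumption)
LANE_BYTES = 4   # bytes per SIMD lane (assumption)


class MemoryFault(Exception):
    """Raised when an access touches a page that is not resident."""


@dataclass
class ToyMemory:
    data: bytearray
    resident_pages: set = field(default_factory=set)

    def load(self, addr, size):
        # Fault if any page covered by [addr, addr + size) is not resident.
        first, last = addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page not in self.resident_pages:
                raise MemoryFault(f"page {page} not resident")
        return bytes(self.data[addr:addr + size])


def to_lanes(raw):
    """Split a raw byte block into little-endian lane values."""
    return [int.from_bytes(raw[i:i + LANE_BYTES], "little")
            for i in range(0, len(raw), LANE_BYTES)]


def execute_masked_add(mem, addr, mask, src):
    """Return (result_lanes, mode_used) for a masked add with a memory operand."""
    n = len(mask)
    try:
        # First execution mode: one load for the whole block, mask ignored.
        operands = to_lanes(mem.load(addr, n * LANE_BYTES))
        mode = "single-load"
    except MemoryFault:
        # Second execution mode: one load per enabled lane into temporary
        # storage. A fault on an enabled lane is genuine and propagates.
        operands = [0] * n
        for lane, enabled in enumerate(mask):
            if enabled:
                raw = mem.load(addr + lane * LANE_BYTES, LANE_BYTES)
                operands[lane] = int.from_bytes(raw, "little")
        mode = "per-lane"
    # The SIMD operation only affects enabled lanes (assumed: others keep src).
    result = [s + o if m else s for s, o, m in zip(src, operands, mask)]
    return result, mode
```

In this model, a block that straddles a non-resident page only falls back to per-lane loads when the single load faults. If every enabled lane lies on a resident page, the fallback completes without raising a fault.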
Opening claim text (preview).
What is claimed is:
1. A method, comprising: fetching, at a processor, an instruction that represents a single-instruction-multiple data (SIMD) operation and references a memory block storing operand data for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation; attempting execution of the instruction at the processor in a first execution mode in which a single load operation is attempted to access the memory block, wherein attempting to execute the instruction at the processor in the first execution mode includes decoding the instruction into the single load operation and the SIMD operation, and attempting to generate a source memory address for the single load operation, wherein the single load operation ignores the mask vector; and responsive to a memory fault resulting from the attempt to generate the source memory address, re-executing the instruction at the processor in a second execution mode in which a separate load operation is performed, based on the mask vector, to load the operand data from the memory block for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
2. The method of claim 1, further comprising: fetching, at the processor, a second instruction that represents a second SIMD operation and references a second memory block storing operand data for each lane of the plurality of lanes and further references a second mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the second SIMD operation; attempting execution of the second instruction at the processor in the first execution mode in which a second single load operation is attempted to access the second memory block, wherein attempting to execute the second instruction at the processor in the first execution mode includes decoding the second instruction into the second single load operation and the second SIMD operation, and attempting to generate a second source memory address for the second single load operation, wherein the second single load operation ignores the mask vector; and responsive to an absence of a memory fault from the attempt to generate the second source memory address: completing the second single load operation to load the operand data of the second memory block; and performing the second SIMD operation for each enabled lane of the plurality of lanes in parallel using the loaded operand data of the second memory block.
3. The method of claim 2, wherein the memory block comprises the second memory block.
4. The method of claim 1, wherein re-executing the instruction at the processor in the second execution mode comprises: implementing a resynchronization for the instruction; decoding the instruction into a microcode preamble and the SIMD operation, wherein: the microcode preamble includes a load operation for each enabled lane of the plurality of lanes, the load operation configured to load the operand data for a corresponding lane from the memory block to a corresponding position in a temporary storage location; and the SIMD operation is configured to reference the temporary storage location in place of a memory location originally identified in the instruction as a source address of the memory block; performing each load operation to load the operand data for each enabled lane into the temporary storage location; and performing the SIMD operation using the operand data from the temporary storage location.
5. The method of claim 4, wherein the memory fault comprises a page fault responsive to the memory block including a page that is not resident in memory.
6. A processor, comprising: a load/store unit; and an execution pipeline configured to execute an instruction representing a single-instruction-multiple-data (SIMD) operation in a first execution mode unless a memory fault is generated, wherein the execution pipeline is configured to attempt to execute the instruction in the first execution mode by decoding the instruction into a single load operation and the SIMD operation and attempting to generate a source memory address for the single load operation, and in response to the memory fault to re-execute the instruction in a second execution mode, wherein: the instruction references a memory block storing operand data for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation; in the first execution mode the single load operation is attempted to access the memory block via the load/store unit while ignoring the mask vector; and in the second execution mode a separate load operation is attempted, based on the mask vector, to access the memory block and is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
7. The processor of claim 6, wherein: responsive to an absence of a memory fault during execution of the instruction in the first execution mode: the load/store unit is configured to complete the single load operation to load the operand data; and an execution unit of the execution pipeline is configured to perform the SIMD operation for each enabled lane of the plurality of lanes in parallel and using the loaded operand data.
8. The processor of claim 6, wherein the execution pipeline is configured to re-execute the instruction at the processor in the second execution mode by: implementing a resynchronization for the instruction; decoding the instruction into a microcode preamble and the SIMD operation, wherein: the microcode preamble includes a load operation for each enabled lane of the plurality of lanes, the load operation configured to load the operand data for a corresponding lane from the memory block to a corresponding position in a temporary storage location; and the SIMD operation is configured to reference the temporary storage location in place of a memory location originally identified in the instruction as a source address of the memory block; directing the load/store unit to perform each load operation to load the operand data for each enabled lane into the temporary storage location; and performing the SIMD operation using the operand data from the temporary storage location.
9. The processor of claim 8, wherein the temporary storage location is in a scratchpad memory of the processor.
10. The processor of claim 8, wherein the SIMD operation is an arithmetic operation.
11. A method, comprising: fetching, at a processor, an instruction that represents a single-instruction-multiple-data (SIMD) operation and references a memory block that is to serve as a destination for result data generated by execution of the SIMD operation for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation; attempting to execute the instruction at the processor in a first execution mode in which a single store operation is attempted to store result data to the memory block, wherein attempting to execute the instruction in the first execution mode includes decoding the instruction into the SIMD operation and the single store operation, and attempting to generate a destination address for the single store operation while ignoring the mask vector; and responsive to a memory fault resulting from the attempt to generate the destination address, re-executing the instruction at the processor in a second execution mode in which a separate store operation is performed, based on the mask ve
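The store-side variant in the last claim above can be modeled the same way as the load side. The sketch below is a toy illustration under stated assumptions, not the claimed implementation: `ToyMemory`, `MemoryFault`, and `execute_masked_store` are hypothetical names, and because the claim text shown here is truncated, how the single store treats disabled lanes is assumed (here they leave memory unchanged).

```python
PAGE_SIZE = 16   # bytes per page in this toy model (assumption)
LANE_BYTES = 4   # bytes per SIMD lane (assumption)


class MemoryFault(Exception):
    """Raised when an access touches a page that is not resident."""


class ToyMemory:
    def __init__(self, size, resident_pages):
        self.data = bytearray(size)
        self.resident_pages = set(resident_pages)
        self.store_count = 0  # number of store operations actually performed

    def check(self, addr, size):
        # Models address generation: fault on any non-resident page touched.
        first, last = addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page not in self.resident_pages:
                raise MemoryFault(f"page {page} not resident")

    def store(self, addr, payload):
        self.check(addr, len(payload))
        self.data[addr:addr + len(payload)] = payload
        self.store_count += 1


def lane_bytes(value):
    return value.to_bytes(LANE_BYTES, "little")


def execute_masked_store(mem, addr, mask, results):
    """Store per-lane results to memory; return the mode that was used."""
    n = len(mask)
    try:
        # First execution mode: the address check covers the whole block,
        # regardless of the mask.
        mem.check(addr, n * LANE_BYTES)
        # Assumption: disabled lanes keep the bytes already in memory. The
        # read-merge here stands in for hardware byte enables.
        block = bytearray(mem.data[addr:addr + n * LANE_BYTES])
        for lane, enabled in enumerate(mask):
            if enabled:
                start = lane * LANE_BYTES
                block[start:start + LANE_BYTES] = lane_bytes(results[lane])
        mem.store(addr, bytes(block))
        return "single-store"
    except MemoryFault:
        # Second execution mode: one store per enabled lane. A fault on an
        # enabled lane is genuine and propagates to the caller.
        for lane, enabled in enumerate(mask):
            if enabled:
                mem.store(addr + lane * LANE_BYTES, lane_bytes(results[lane]))
        return "per-lane"
```

With a block that straddles a non-resident page but whose enabled lanes all sit on the resident page, the model issues one store per enabled lane and never touches the missing page.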
using a mask · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title
LOAD or STORE instructions; Clear instruction · CPC title