What technology area does this patent fall under?

Primary CPC classification G06F9/3887. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 19, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Masked multi-lane instruction memory fault handling using fast and slow execution paths

US11847463B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11847463-B2
Application number	US-201916585973-A
Country	US
Kind code	B2
Filing date	Sep 27, 2019
Priority date	Sep 27, 2019
Publication date	Dec 19, 2023
Grant date	Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of a plurality of lanes are enabled and which are disabled for the operation. The execution pipeline executes an instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
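The two execution modes in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the patented hardware: the names `ToyLoadStoreUnit`, `execute_masked_simd`, `PAGE_SIZE`, and `LANE_BYTES` are hypothetical, memory faults are modeled as a Python exception raised when a load touches a non-resident page, and disabled lanes are simply reported as `None`.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096   # assumed page size of the toy memory model
LANE_BYTES = 4     # assumed element width per lane


class MemoryFault(Exception):
    """Raised when a load touches a page that is not resident."""


@dataclass
class ToyLoadStoreUnit:
    resident_pages: set                       # page numbers that are mapped
    data: dict = field(default_factory=dict)  # element address -> value
    load_attempts: int = 0                    # counts every attempted load

    def load(self, addr, lanes):
        """Attempt ONE load operation covering `lanes` consecutive elements."""
        self.load_attempts += 1
        addrs = [addr + i * LANE_BYTES for i in range(lanes)]
        for a in addrs:
            if a // PAGE_SIZE not in self.resident_pages:
                raise MemoryFault(f"page {a // PAGE_SIZE} not resident")
        return [self.data.get(a, 0) for a in addrs]


def execute_masked_simd(lsu, base, mask, op):
    """Apply `op` to each enabled lane, trying the single-load mode first."""
    lanes = len(mask)
    try:
        # First execution mode: one load for the whole block, mask ignored.
        operands = lsu.load(base, lanes)
        mode = "first"
    except MemoryFault:
        # Second execution mode: a separate load per ENABLED lane.
        # A fault on an enabled lane is treated as genuine and propagates
        # (an assumption of this sketch).
        operands = [0] * lanes
        for lane, enabled in enumerate(mask):
            if enabled:
                operands[lane] = lsu.load(base + lane * LANE_BYTES, 1)[0]
        mode = "second"
    # The SIMD operation itself only affects enabled lanes.
    result = [op(x) if enabled else None for x, enabled in zip(operands, mask)]
    return mode, result


# Eight lanes straddling a page boundary; only page 0 is resident and
# only the four lanes that live on page 0 are enabled.
lsu = ToyLoadStoreUnit(resident_pages={0})
base = PAGE_SIZE - 4 * LANE_BYTES
for lane in range(4):
    lsu.data[base + lane * LANE_BYTES] = lane + 1
mask = [True] * 4 + [False] * 4
mode, result = execute_masked_simd(lsu, base, mask, lambda x: x * 2)
print(mode, result, lsu.load_attempts)
```

In this example the single full-width load faults because the disabled lanes fall on a non-resident page, so the model falls back to four per-lane loads. When the whole block is resident, the first mode completes with a single load.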

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: fetching, at a processor, an instruction that represents a single-instruction-multiple data (SIMD) operation and references a memory block storing operand data for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation; attempting execution of the instruction at the processor in a first execution mode in which a single load operation is attempted to access the memory block, wherein attempting to execute the instruction at the processor in the first execution mode includes decoding the instruction into the single load operation and the SIMD operation, and attempting to generate a source memory address for the single load operation, wherein the single load operation ignores the mask vector; and responsive to a memory fault resulting from the attempt to generate the source memory address, re-executing the instruction at the processor in a second execution mode in which a separate load operation is performed, based on the mask vector, to load the operand data from the memory block for each enabled lane of the plurality of lanes prior to executing the SIMD operation.

2. The method of claim 1, further comprising: fetching, at the processor, a second instruction that represents a second SIMD operation and references a second memory block storing operand data for each lane of the plurality of lanes and further references a second mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the second SIMD operation; attempting execution of the second instruction at the processor in the first execution mode in which a second single load operation is attempted to access the second memory block, wherein attempting to execute the second instruction at the processor in the first execution mode includes decoding the second instruction into the second single load operation and the second SIMD operation, and attempting to generate a second source memory address for the second single load operation, wherein the second single load operation ignores the mask vector; and responsive to an absence of a memory fault from the attempt to generate the second source memory address: completing the second single load operation to load the operand data of the second memory block; and performing the second SIMD operation for each enabled lane of the plurality of lanes in parallel using the loaded operand data of the second memory block.

3. The method of claim 2, wherein the memory block comprises the second memory block.

4. The method of claim 1, wherein re-executing the instruction at the processor in the second execution mode comprises: implementing a resynchronization for the instruction; decoding the instruction into a microcode preamble and the SIMD operation, wherein: the microcode preamble includes a load operation for each enabled lane of the plurality of lanes, the load operation configured to load the operand data for a corresponding lane from the memory block to a corresponding position in a temporary storage location; and the SIMD operation is configured to reference the temporary storage location in place of a memory location originally identified in the instruction as a source address of the memory block; performing each load operation to load the operand data for each enabled lane into the temporary storage location; and performing the SIMD operation using the operand data from the temporary storage location.

5. The method of claim 4, wherein the memory fault comprises a page fault responsive to the memory block including a page that is not resident in memory.

6. A processor, comprising: a load/store unit; and an execution pipeline configured to execute an instruction representing a single-instruction-multiple-data (SIMD) operation in a first execution mode unless a memory fault is generated, wherein the execution pipeline is configured to attempt to execute the instruction in the first execution mode by decoding the instruction into a single load operation and the SIMD operation and attempting to generate a source memory address for the single load operation, and in response to the memory fault to re-execute the instruction in a second execution mode, wherein: the instruction references a memory block storing operand data for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation; in the first execution mode the single load operation is attempted to access the memory block via the load/store unit while ignoring the mask vector; and in the second execution mode a separate load operation is attempted, based on the mask vector, to access the memory block and is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.

7. The processor of claim 6, wherein: responsive to an absence of a memory fault during execution of the instruction in the first execution mode: the load/store unit is configured to complete the single load operation to load the operand data; and an execution unit of the execution pipeline is configured to perform the SIMD operation for each enabled lane of the plurality of lanes in parallel and using the loaded operand data.

8. The processor of claim 6, wherein the execution pipeline is configured to re-execute the instruction at the processor in the second execution mode by: implementing a resynchronization for the instruction; decoding the instruction into a microcode preamble and the SIMD operation, wherein: the microcode preamble includes a load operation for each enabled lane of the plurality of lanes, the load operation configured to load the operand data for a corresponding lane from the memory block to a corresponding position in a temporary storage location; and the SIMD operation is configured to reference the temporary storage location in place of a memory location originally identified in the instruction as a source address of the memory block; directing the load/store unit to perform each load operation to load the operand data for each enabled lane into the temporary storage location; and performing the SIMD operation using the operand data from the temporary storage location.

9. The processor of claim 8, wherein the temporary storage location is in a scratchpad memory of the processor.

10. The processor of claim 8, wherein the SIMD operation is an arithmetic operation.

11. A method, comprising: fetching, at a processor, an instruction that represents a single-instruction-multiple-data (SIMD) operation and references a memory block that is to serve as a destination for result data generated by execution of the SIMD operation for each lane of a plurality of lanes and further references a mask vector indicating whether each lane of the plurality of lanes is enabled or disabled for the SIMD operation; attempting to execute the instruction at the processor in a first execution mode in which a single store operation is attempted to store result data to the memory block, wherein attempting to execute the instruction in the first execution mode includes decoding the instruction into the SIMD operation and the single store operation, and attempting to generate a destination address for the single store operation while ignoring the mask vector; and responsive to a memory fault resulting from the attempt to generate the destination address, re-executing the instruction at the processor in a second execution mode in which a separate store operation is performed, based on the mask ve…
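The second-mode decode described in claims 4 and 8 (a microcode preamble of per-lane loads into a temporary storage location, followed by a SIMD operation that reads that location instead of memory) can be sketched as a small data-structure transformation. This is an illustrative model only: `MicroOp`, `decode_first_mode`, `decode_second_mode`, and the symbolic location names `tmp`, `ld_result`, and `vdst` are hypothetical and do not come from the patent.

```python
from typing import NamedTuple


class MicroOp(NamedTuple):
    kind: str  # "load" or "simd"
    src: str   # symbolic source location
    dst: str   # symbolic destination location


def decode_first_mode(src_block, dest="vdst"):
    """First mode: one full-width load followed by the SIMD operation."""
    return [
        MicroOp("load", src_block, "ld_result"),
        MicroOp("simd", "ld_result", dest),
    ]


def decode_second_mode(src_block, mask, temp="tmp", dest="vdst"):
    """Second mode: microcode preamble plus a retargeted SIMD operation."""
    # Preamble: one load per enabled lane, each writing to the matching
    # position in the temporary storage location.
    preamble = [
        MicroOp("load", f"{src_block}[{lane}]", f"{temp}[{lane}]")
        for lane, enabled in enumerate(mask)
        if enabled
    ]
    # The SIMD operation now reads the temporary storage location in place
    # of the memory block named in the original instruction.
    return preamble + [MicroOp("simd", temp, dest)]


ops = decode_second_mode("mem", [True, False, True, False])
for op in ops:
    print(op)
```

Disabled lanes produce no load micro-op at all, which is why a non-resident page that only disabled lanes touch does not fault in the second mode.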

Assignees

Advanced Micro Devices Inc

Inventors

Classifications

G06F9/30038
using a mask · CPC title
G06F9/3887 (Primary)
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
G06F9/30036
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
G06F9/3861 (Primary)
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title
G06F9/30043
LOAD or STORE instructions; Clear instruction · CPC title

Patent family

Related publications grouped by family.

View patent family 75163376

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11847463B2 cover?: A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of a plurality of lanes are enabled and which are disabled for the operation. The execution p…
Who is the assignee on this patent?: Advanced Micro Devices Inc

Related patents

Padded vectorization with compile time known masks

Processors, methods, systems, and instructions to load multiple data elements to destination storage locations other than packed data registers

Instructions and Logic for Lane-Based Strided Store Operations

Microprocessor with ALU integrated into load unit
