Techniques for memory error isolation

US12517798B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12517798-B2
Application number  US-202016825276-A
Country             US
Kind code           B2
Filing date         Mar 20, 2020
Priority date       Mar 20, 2020
Publication date    Jan 6, 2026
Grant date          Jan 6, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatuses, systems, and techniques to detect memory errors and isolate or migrate partitions on a parallel processing unit using an application programming interface to facilitate parallel computing, such as CUDA. In at least one embodiment, interrupts are intercepted and processed on a graphics processing unit indicating a memory error for one or more partitions, and a policy is applied to isolate that memory error from other partitions.
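The flow the abstract describes (intercept a memory-error interrupt, attribute it to a partition, apply a policy so other partitions are unaffected) can be sketched roughly as follows. This is an illustrative model only, not the patented implementation or any real driver API: the names `Partition`, `MemoryErrorInterrupt`, `Policy`, and `handle_interrupt` are hypothetical, and the two policy values are assumptions based on the abstract's mention of isolating or migrating partitions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    """Hypothetical policies; the abstract does not enumerate them."""
    ISOLATE = "isolate"
    MIGRATE = "migrate"


@dataclass
class Partition:
    name: str
    healthy: bool = True
    error_addresses: list = field(default_factory=list)


@dataclass
class MemoryErrorInterrupt:
    partition: str  # partition the error is attributed to (assumed field)
    address: int    # faulting storage location (assumed field)


def handle_interrupt(partitions, irq, policy):
    """Record the error on the one partition the interrupt names.

    Other partitions are never touched, which is the isolation property
    this sketch is meant to illustrate.
    """
    target = partitions[irq.partition]
    target.error_addresses.append(irq.address)
    target.healthy = False
    # Return the action a management layer would take next.
    return f"{policy.value} {target.name}"
```

For example, calling `handle_interrupt` with an interrupt that names partition `p0` marks only `p0` as unhealthy and returns the action string for the chosen policy; a partition `p1` in the same dictionary is left unchanged.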

First claim

Opening claim text (preview).

What is claimed is:

1. One or more processors, comprising: circuitry to: cause one or more graphics processing unit (GPU) slices to be associated with a second one or more storage locations based, at least in part, on whether an error is detected within a first one or more storage locations; and facilitate migration of the one or more GPU slices based, at least in part, on whether the error is detected within the first one or more storage locations.

2. The one or more processors of claim 1, wherein the one or more GPU slices are to be associated with a third one or more storage locations if an error is detected within the second one or more storage locations.

3. The one or more processors of claim 1, wherein the first one or more storage locations are indicated to contain the error if the error is detected within the first one or more storage locations.

4. The one or more processors of claim 1, wherein the one or more GPU slices are physically isolated computing slices of a GPU.

5. The one or more processors of claim 1, wherein if the error is detected within the first one or more storage locations, data values indicating the error are to be set in the first one or more storage locations.

6. The one or more processors of claim 1, wherein if the error is detected within the first one or more storage locations, the one or more GPU slices are prevented from accessing the first one or more storage locations containing the error.

7. The one or more processors of claim 1, wherein if the error is detected, the one or more GPU slices are to be attributed with the error.

8. The one or more processors of claim 7, wherein one or more software programs being performed by the one or more GPU slices are to be indicated as being performed by the one or more GPU slices attributed with the error.

9. The one or more processors of claim 1, wherein the circuitry is to cause the one or more GPU slices to start communicating with the second one or more storage locations.

10. The one or more processors of claim 1, wherein the one or more GPU slices are to collect error information and report said error information to one or more software programs being performed by the one or more GPU slices, the one or more software programs to be used to enforce error isolation policies for the one or more GPU slices.

11. The one or more processors of claim 1, wherein each of the one or more GPU slices comprise an individual interrupt table and interrupts generated by each of the one or more GPU slices are to be ignored by others of the one or more GPU slices.

12. A system comprising: one or more processors to cause one or more graphics processing unit (GPU) slices to be associated with a second one or more storage locations based, at least in part, on whether an error is detected within a first one or more storage locations; and facilitate migration of the one or more GPU slices based, at least in part, on whether the error is detected within the first one or more storage locations.

13. The system of claim 12, wherein the one or more GPU slices are to be associated with a third one or more storage locations if an error is detected within the second one or more storage locations.

14. The system of claim 12, wherein the first one or more storage locations are indicated to contain the error if the error is detected within the first one or more storage locations.

15. The system of claim 12, wherein if the error is detected within the first one or more storage locations, the one or more GPU slices are to be prevented from accessing the first one or more storage locations containing the error.

16. The system of claim 12, wherein if the error is detected within the first one or more storage locations, data values indicating the error are set in the first one or more storage locations.

17. The system of claim 12, wherein if the error is detected, one or more software programs being performed by the one or more GPU slices are to be attributed with the error.

18. The system of claim 12, wherein the one or more processors are graphics processing units.

19. The system of claim 12, wherein the migration of the one or more GPU slices is facilitated, by an administrator of the one or more processors, from a first processor of the one or more processors to a second processor of the one or more processors based, at least in part, on whether the error is detected on the first processor.

20. The system of claim 12, wherein each of the one or more GPU slices on each of the one or more processors comprise an individual interrupt table, and interrupts generated by each of the one or more GPU slices are to be ignored by other GPU slices of the one or more GPU slices.

21. A non-transitory machine-readable medium having stored thereon instructions which, if performed by one or more processors, cause the one or more processors to at least: cause one or more graphics processing unit (GPU) slices to be associated with a second one or more storage locations based, at least in part, on whether an error is detected within a first one or more storage locations; and facilitate migration of the one or more GPU slices based, at least in part, on whether the error is detected within the first one or more storage locations.

22. The non-transitory machine-readable medium of claim 21, wherein the one or more GPU slices are to be associated with a third one or more storage locations if an error is detected within the second one or more storage locations.

23. The non-transitory machine-readable medium of claim 21, wherein the first one or more storage locations are indicated to contain the error if the error is detected within the first one or more storage locations.

24. The non-transitory machine-readable medium of claim 21, wherein if the error is detected within the first one or more storage locations, the one or more GPU slices are to be prevented from accessing the first one or more storage locations containing the error by, at least in part, notifying a memory management unit that the first one or more storage locations are blacklisted.

25. The non-transitory machine-readable medium of claim 21, wherein if the error is detected within the first one or more storage locations, the instructions, if performed, are to cause the one or more processors to modify data values in the first one or more storage locations containing the error to indicate that the first one or more storage locations contain an error.

26. The non-transitory machine-readable medium of claim 21, wherein if the error is detected, one or more software programs being performed by the one or more GPU slices are to be indicated as being performed by the one or more GPU slices containing the error.

27. The non-transitory machine-readable medium of claim 21, wherein the instructions, if performed, further cause the one or more processors to cause the one or more GPU slices to collect information about the error and to provide the information to one or more software programs being performed by the one or more GPU slices, the one or more software programs providing the information to an administrator.

28. The non-transitory machine-readable medium of claim 21, wherein the instructions, if performed, further cause the one or more processors to cause the one or …
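As a reading aid, the behaviour recited in claims 1 through 7 (and the blacklisting of claim 24) can be modelled with a small sketch. Everything here is an assumption made for illustration: `StorageRegion`, `GpuSlice`, `isolate_and_reassociate`, and the `POISON` value are hypothetical names, the per-slice `blacklist` set stands in for notifying a memory management unit, and the claims do not prescribe any of these data structures.

```python
from dataclasses import dataclass, field

# Hypothetical marker; the claims only say "data values indicating the error".
POISON = 0xDEADBEEF


@dataclass
class StorageRegion:
    name: str
    data: list
    has_error: bool = False


@dataclass
class GpuSlice:
    name: str
    region: StorageRegion
    attributed_errors: int = 0
    blacklist: set = field(default_factory=set)  # stand-in for MMU state

    def read(self, index):
        # Access to a blacklisted region is refused (claims 6 and 24).
        if self.region.name in self.blacklist:
            raise PermissionError(f"{self.region.name} is blacklisted")
        return self.region.data[index]


def isolate_and_reassociate(gpu_slice, spares):
    """Handle an error detected in the slice's current storage region."""
    bad = gpu_slice.region
    bad.has_error = True                 # region indicated as faulty (claim 3)
    bad.data = [POISON] * len(bad.data)  # error-indicating values (claim 5)
    gpu_slice.blacklist.add(bad.name)    # block further access (claim 6)
    gpu_slice.attributed_errors += 1     # attribute error to slice (claim 7)
    for candidate in spares:
        # Skip a second region that is itself faulty and fall through
        # to a third one (claims 1 and 2).
        if not candidate.has_error:
            gpu_slice.region = candidate
            return candidate
    raise RuntimeError("no healthy storage region available")
```

In this model, a slice whose first region fails and whose second region is already marked faulty ends up associated with the third region, while the first region is poisoned and blacklisted. Migration of the slice to another processor (claim 19) is not modelled.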

Assignees

Nvidia Corp

Inventors

Classifications

  • Processor architectures; Processor configuration, e.g. pipelining

  • Solving problems relating to consistency

  • Isolation or security of virtual machine instances

  • I/O management, e.g. providing access to device drivers or storage

  • Data logging (G06F11/14, G06F11/2205 take precedence)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12517798B2 cover?
Apparatuses, systems, and techniques to detect memory errors and isolate or migrate partitions on a parallel processing unit using an application programming interface to facilitate parallel computing, such as CUDA. In at least one embodiment, interrupts are intercepted and processed on a graphics processing unit indicating a memory error for one or more partitions, and a policy is applied to i…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/2094. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 6, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).