Mapping workloads to cloud infrastructure
US-2021124614-A1 · Apr 29, 2021 · US
US12517798B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12517798-B2 |
| Application number | US-202016825276-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 20, 2020 |
| Priority date | Mar 20, 2020 |
| Publication date | Jan 6, 2026 |
| Grant date | Jan 6, 2026 |
Official abstract text for this publication.
Apparatuses, systems, and techniques to detect memory errors and isolate or migrate partitions on a parallel processing unit using an application programming interface to facilitate parallel computing, such as CUDA. In at least one embodiment, interrupts are intercepted and processed on a graphics processing unit indicating a memory error for one or more partitions, and a policy is applied to isolate that memory error from other partitions.
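The abstract's flow (a memory-error interrupt arrives for one partition, and a policy isolates it from the others) can be illustrated with a small host-side simulation. This is a minimal sketch under stated assumptions, not the patented implementation: `Policy`, `Partition`, `MemoryErrorInterrupt`, and `handle_interrupt` are hypothetical names, and no real GPU or CUDA API is involved.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    # Hypothetical policy names; the source only says "a policy is applied".
    ISOLATE = "isolate"
    MIGRATE = "migrate"


@dataclass
class Partition:
    name: str
    healthy: bool = True
    errors: list = field(default_factory=list)  # faulting addresses seen so far


@dataclass
class MemoryErrorInterrupt:
    partition: str  # partition the error is attributed to
    address: int    # faulting storage location (illustrative value)


def handle_interrupt(partitions, irq, policy=Policy.ISOLATE):
    """Record the error on the affected partition only and return the action taken."""
    target = partitions[irq.partition]
    target.errors.append(irq.address)
    target.healthy = False  # other partitions are deliberately left untouched
    return (target.name, policy.value)


partitions = {n: Partition(n) for n in ("p0", "p1", "p2")}
action = handle_interrupt(partitions, MemoryErrorInterrupt("p1", 0x1000))
```

In this toy model only `p1` is marked unhealthy; `p0` and `p2` keep running, which is the isolation property the abstract describes.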
Opening claim text (preview).
What is claimed is:

1. One or more processors, comprising: circuitry to: cause one or more graphics processing unit (GPU) slices to be associated with a second one or more storage locations based, at least in part, on whether an error is detected within a first one or more storage locations; and facilitate migration of the one or more GPU slices based, at least in part, on whether the error is detected within the first one or more storage locations.
2. The one or more processors of claim 1, wherein the one or more GPU slices are to be associated with a third one or more storage locations if an error is detected within the second one or more storage locations.
3. The one or more processors of claim 1, wherein the first one or more storage locations are indicated to contain the error if the error is detected within the first one or more storage locations.
4. The one or more processors of claim 1, wherein the one or more GPU slices are physically isolated computing slices of a GPU.
5. The one or more processors of claim 1, wherein if the error is detected within the first one or more storage locations, data values indicating the error are to be set in the first one or more storage locations.
6. The one or more processors of claim 1, wherein if the error is detected within the first one or more storage locations, the one or more GPU slices are prevented from accessing the first one or more storage locations containing the error.
7. The one or more processors of claim 1, wherein if the error is detected, the one or more GPU slices are to be attributed with the error.
8. The one or more processors of claim 7, wherein one or more software programs being performed by the one or more GPU slices are to be indicated as being performed by the one or more GPU slices attributed with the error.
9. The one or more processors of claim 1, wherein the circuitry is to cause the one or more GPU slices to start communicating with the second one or more storage locations.
10. The one or more processors of claim 1, wherein the one or more GPU slices are to collect error information and report said error information to one or more software programs being performed by the one or more GPU slices, the one or more software programs to be used to enforce error isolation policies for the one or more GPU slices.
11. The one or more processors of claim 1, wherein each of the one or more GPU slices comprise an individual interrupt table and interrupts generated by each of the one or more GPU slices are to be ignored by others of the one or more GPU slices.
12. A system comprising: one or more processors to cause one or more graphics processing unit (GPU) slices to be associated with a second one or more storage locations based, at least in part, on whether an error is detected within a first one or more storage locations; and facilitate migration of the one or more GPU slices based, at least in part, on whether the error is detected within the first one or more storage locations.
13. The system of claim 12, wherein the one or more GPU slices are to be associated with a third one or more storage locations if an error is detected within the second one or more storage locations.
14. The system of claim 12, wherein the first one or more storage locations are indicated to contain the error if the error is detected within the first one or more storage locations.
15. The system of claim 12, wherein if the error is detected within the first one or more storage locations, the one or more GPU slices are to be prevented from accessing the first one or more storage locations containing the error.
16. The system of claim 12, wherein if the error is detected within the first one or more storage locations, data values indicating the error are set in the first one or more storage locations.
17. The system of claim 12, wherein if the error is detected, one or more software programs being performed by the one or more GPU slices are to be attributed with the error.
18. The system of claim 12, wherein the one or more processors are graphics processing units.
19. The system of claim 12, wherein the migration of the one or more GPU slices is facilitated, by an administrator of the one or more processors, from a first processor of the one or more processors to a second processor of the one or more processors based, at least in part, on whether the error is detected on the first processor.
20. The system of claim 12, wherein each of the one or more GPU slices on each of the one or more processors comprise an individual interrupt table, and interrupts generated by each of the one or more GPU slices are to be ignored by other GPU slices of the one or more GPU slices.
21. A non-transitory machine-readable medium having stored thereon instructions which, if performed by one or more processors, cause the one or more processors to at least: cause one or more graphics processing unit (GPU) slices to be associated with a second one or more storage locations based, at least in part, on whether an error is detected within a first one or more storage locations; and facilitate migration of the one or more GPU slices based, at least in part, on whether the error is detected within the first one or more storage locations.
22. The non-transitory machine-readable medium of claim 21, wherein the one or more GPU slices are to be associated with a third one or more storage locations if an error is detected within the second one or more storage locations.
23. The non-transitory machine-readable medium of claim 21, wherein the first one or more storage locations are indicated to contain the error if the error is detected within the first one or more storage locations.
24. The non-transitory machine-readable medium of claim 21, wherein if the error is detected within the first one or more storage locations, the one or more GPU slices are to be prevented from accessing the first one or more storage locations containing the error by, at least in part, notifying a memory management unit that the first one or more storage locations are blacklisted.
25. The non-transitory machine-readable medium of claim 21, wherein if the error is detected within the first one or more storage locations, the instructions, if performed, are to cause the one or more processors to modify data values in the first one or more storage locations containing the error to indicate that the first one or more storage locations contain an error.
26. The non-transitory machine-readable medium of claim 21, wherein if the error is detected, one or more software programs being performed by the one or more GPU slices are to be indicated as being performed by the one or more GPU slices containing the error.
27. The non-transitory machine-readable medium of claim 21, wherein the instructions, if performed, further cause the one or more processors to cause the one or more GPU slices to collect information about the error and to provide the information to one or more software programs being performed by the one or more GPU slices, the one or more software programs providing the information to an administrator.
28. The non-transitory machine-readable medium of claim 21, wherein the instructions, if performed, further cause the one or more processors to cause the one or
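
The re-association behavior recited in claims 1, 2, 6, and 24 (move a slice to a second set of storage locations when the first contains an error, then to a third if the second also fails, while blocking access to the faulty locations) can be sketched as a toy mapping table. This is an illustrative model only, assuming a hypothetical `SliceMapper` class and string region names; it does not represent the claimed circuitry or any real memory management unit interface.

```python
class SliceMapper:
    """Toy model: maps GPU slices to storage regions and avoids faulty ones."""

    def __init__(self, regions):
        self.regions = list(regions)  # candidate regions, in preference order
        self.blacklist = set()        # regions known to contain an error
        self.mapping = {}             # slice id -> currently associated region

    def assign(self, slice_id):
        """Associate the slice with the first region that is not blacklisted."""
        for region in self.regions:
            if region not in self.blacklist:
                self.mapping[slice_id] = region
                return region
        raise RuntimeError("no error-free storage region left")

    def report_error(self, region):
        """Blacklist a faulty region and re-associate every slice that used it."""
        self.blacklist.add(region)
        moved = [s for s, r in self.mapping.items() if r == region]
        for slice_id in moved:
            self.assign(slice_id)
        return moved


mapper = SliceMapper(["first", "second", "third"])
mapper.assign("slice0")        # slice0 starts on "first"
mapper.report_error("first")   # slice0 is re-associated with "second"
mapper.report_error("second")  # slice0 is re-associated with "third"
```

The blacklist set stands in for the notification to a memory management unit mentioned in claim 24; a blacklisted region is never handed out again by `assign`.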
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Solving problems relating to consistency · CPC title
Isolation or security of virtual machine instances · CPC title
I/O management, e.g. providing access to device drivers or storage · CPC title
Data logging (G06F11/14, G06F11/2205 take precedence) · CPC title