System, method, and recording medium for mirroring matrices for batched Cholesky decomposition on a graphic processing unit
US-11790035-B2 · Oct 17, 2023 · US
US12086207B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12086207-B2 |
| Application number | US-202318216926-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2023 |
| Priority date | Jun 30, 2016 |
| Publication date | Sep 10, 2024 |
| Grant date | Sep 10, 2024 |
Official abstract text for this publication.
A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU), include mirroring matrices to form paired matrices and solving the paired matrices simultaneously.
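The mirroring idea in the abstract can be illustrated with a minimal sketch. A symmetric N x N matrix is fully described by its lower triangle, which has N(N+1)/2 entries, so two such triangles fit exactly into one N x (N+1) rectangle. The layout below is an assumption made for illustration only: the second problem's triangle is transposed and shifted one column to the right. The function names `pack_pair` and `unpack_pair` are hypothetical and do not come from the patent text.

```python
def pack_pair(a, b):
    """Pack the lower triangles of two n x n matrices into one n x (n + 1) rectangle.

    Assumed layout (illustrative only):
      a[i][j] with j <= i stays at row i, column j;
      b[i][j] with j <= i is mirrored and shifted to row j, column i + 1.
    """
    n = len(a)
    packed = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            packed[i][j] = a[i][j]        # first problem: lower triangle as-is
            packed[j][i + 1] = b[i][j]    # second problem: mirrored, one column right
    return packed


def unpack_pair(packed):
    """Recover the two lower triangles from the packed rectangle."""
    n = len(packed)
    a = [[0.0] * n for _ in range(n)]
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            a[i][j] = packed[i][j]
            b[i][j] = packed[j][i + 1]
    return a, b


a = [[4, 0, 0], [2, 5, 0], [1, 3, 6]]
b = [[9, 0, 0], [8, 7, 0], [6, 5, 4]]
print(pack_pair(a, b))  # [[4, 9, 8, 6], [2, 5, 7, 5], [1, 3, 6, 4]]
```

In this assumed layout every cell of the rectangle is used, so no storage is spent on the redundant upper triangles of either problem.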
Opening claim text (preview).
What is claimed is:
1. A non-transitory computer-readable recording medium recording a program for a Graphics Processing Unit (GPU), the program causing a computer to perform a method comprising: generating a combined matrix as a rectangular matrix via merging a first problem matrix for a first problem and a second problem matrix for a second problem, the second problem matrix being folded with respect to the first problem matrix; storing a diagonal intersection portion of the combined matrix in global memory, the diagonal intersection portion occurring at an intersection of the first problem matrix and the folded second problem matrix in the combined matrix; storing a first problem portion of the combined matrix in a shared first memory; storing a second problem portion of the combined matrix in a shared second memory; and utilizing the combined matrix to accelerate batched dense Cholesky decomposition on the GPU, the utilizing comprising allocating, to a thread, to the first problem, and to the second problem, data read from the diagonal intersection portion from the global memory.
2. A batched Cholesky decomposition method for a Graphics Processing Unit (GPU), the method comprising: generating a combined matrix as a rectangular matrix via merging a first problem matrix for a first problem and a second problem matrix for a second problem, the second problem matrix being folded with respect to the first problem matrix; storing a diagonal intersection portion of the combined matrix in global memory, the diagonal intersection portion occurring at an intersection of the first problem matrix and the folded second problem matrix in the combined matrix; storing a first problem portion of the combined matrix in a shared first memory; storing a second problem portion of the combined matrix in a shared second memory; and utilizing the combined matrix to accelerate batched dense Cholesky decomposition on the GPU, the utilizing comprising allocating, to a thread, to the first problem, and to the second problem, data read from the diagonal intersection portion from the global memory.
3. A batched Cholesky decomposition system on a Graphics Processing Unit (GPU), said system comprising: a processor; and a memory, the memory storing instructions to cause the processor to perform computer operations comprising: generating a combined matrix as a rectangular matrix via merging a first problem matrix for a first problem and a second problem matrix for a second problem, the second problem matrix being folded with respect to the first problem matrix; storing a diagonal intersection portion of the combined matrix in global memory, the diagonal intersection portion occurring at an intersection of the first problem matrix and the folded second problem matrix in the combined matrix; storing a first problem portion of the combined matrix in a shared first memory; storing a second problem portion of the combined matrix in a shared second memory; and utilizing the combined matrix to accelerate batched dense Cholesky decomposition on the GPU, the utilizing comprising allocating, to a thread, to the first problem, and to the second problem, data read from the diagonal intersection portion from the global memory.
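As a rough CPU-side illustration of using a combined rectangular matrix for Cholesky decomposition, the sketch below factors two problems in lockstep directly inside one N x (N+1) array. It rests on assumptions that are not taken from the claims: the first problem's lower-triangle entry (i, j) is assumed to sit at row i, column j, and the second problem's entry (i, j) at row j, column i + 1. The function name `cholesky_pair_in_place` is hypothetical, and the global and shared memory placement recited in the claims is not modeled here.

```python
import math


def cholesky_pair_in_place(packed):
    """Overwrite both packed lower triangles with their Cholesky factors L.

    Assumed layout (illustrative only): problem 0 entry (i, j) is at
    packed[i][j]; problem 1 entry (i, j) is at packed[j][i + 1].
    """
    n = len(packed)

    def get(p, i, j):
        return packed[i][j] if p == 0 else packed[j][i + 1]

    def put(p, i, j, value):
        if p == 0:
            packed[i][j] = value
        else:
            packed[j][i + 1] = value

    for j in range(n):
        # Diagonal entry of column j for both problems.
        for p in (0, 1):
            d = get(p, j, j) - sum(get(p, j, k) ** 2 for k in range(j))
            put(p, j, j, math.sqrt(d))
        # Entries below the diagonal in column j for both problems.
        for i in range(j + 1, n):
            for p in (0, 1):
                s = get(p, i, j) - sum(get(p, i, k) * get(p, j, k) for k in range(j))
                put(p, i, j, s / get(p, j, j))
    return packed


# Two 3 x 3 symmetric positive definite problems, packed by hand in the assumed layout.
packed = [
    [4.0, 25.0, 15.0, -5.0],
    [12.0, 37.0, 18.0, 0.0],
    [-16.0, -43.0, 98.0, 11.0],
]
cholesky_pair_in_place(packed)
```

The inner loops over `p` run sequentially in this sketch; on a GPU the two problems of a pair would be natural candidates for being processed together, which is the situation the claims address.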
Indexing scheme relating to group G06F5/00; Methods or arrangements for data conversion without changing the order or content of the data handled · CPC title
Memory management · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Simultaneous equations {, e.g. systems of linear equations} · CPC title
having at least two separately controlled shifting levels, e.g. using shifting matrices (G06F5/012 takes precedence) · CPC title