Shifter implemented circulant permutation matrix operations
US-2024386072-A1 · Nov 21, 2024 · US
US9984041B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9984041-B2 |
| Application number | US-201615199772-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2016 |
| Priority date | Jun 30, 2016 |
| Publication date | May 29, 2018 |
| Grant date | May 29, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, include mirroring a second problem matrix of the second problem to a first problem matrix of the first problem, combining the first problem matrix and the mirrored second problem matrix into a single problem matrix, and allocating data read to a thread and to the first problem and the second problem, respectively.
Opening claim text (preview).
What is claimed is: 1. A batched Cholesky decomposition method for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, the method comprising: mirroring a second problem matrix of the second problem to a first problem matrix of the first problem as paired matrices to accelerate the batched dense Cholesky decomposition on the GPU, the first problem matrix and the second problem matrix being symmetrical and positive definite matrices; combining the first problem matrix and the mirrored second problem matrix into a single problem matrix; and allocating data read to a thread and to the first problem and the second problem, respectively. 2. The method of claim 1 , wherein each step of the Cholesky decomposition includes elements of a global memory from the first problem and the second problem, and wherein each step is a same size. 3. The method of claim 1 , wherein the single problem matrix comprises a global memory for the first problem and the second problem. 4. The method of claim 3 , wherein the single problem matrix comprises a first problem shared memory and a second problem shared memory. 5. The method of claim 4 , wherein the first problem shared memory comprises regular intervals, and wherein the second problem shared memory is continuous. 6. The method of claim 1 , wherein the mirroring shifts the mirrored second problem matrix by N+1. 7. The method of claim 1 , wherein the mirroring shifts the mirror second problem matrix by N+1 such that the single problem matrix comprises a square matrix or a rectangular matrix. 8. The method of claim 1 , wherein the data read by Cholesky decomposition is read for the first problem and the second problem at a same time. 9. The method of claim 1 , wherein the allocating allocates the data read to a plurality of threads. 10. The method of claim 9 , wherein each thread of the plurality of threads performs a same amount of work during the Cholesky decomposition. 11. The method of claim 9 , wherein the plurality of threads are not idle. 12. The method of claim 1 , wherein each step of the Cholesky decomposition of the single problem matrix has a same number of elements. 13. The method of claim 1 , wherein, after the allocating allocates the read data to the thread for each of the first problem and the second problem, the thread updates values in the data. 14. The method of claim 1 , wherein the allocating assesses the read data such that the read data is not discarded. 15. The method of claim 1 , wherein the Cholesky decomposition only synchronizes once for the first problem and the second problem. 16. The method of claim 1 , wherein the single problem matrix comprises fixed size data length with a fixed data interval such that the allocating reads the data simultaneously for the first problem and the second problem. 17. A non-transitory computer-readable recording medium recording a batched Cholesky decomposition program for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, the program causing a computer to perform: mirroring a second problem matrix of the second problem to a first problem matrix of the first problem as paired matrices to accelerate the batched dense Cholesky decomposition on the GPU, the first problem matrix and the second problem matrix being symmetrical and positive definite matrices; combining the first problem matrix and the mirrored second problem matrix into a single problem matrix; and allocating data read to a thread and to the first problem and the second problem, respectively. 18. The non-transitory computer-readable recording medium of claim 17 , wherein each step of the Cholesky decomposition includes elements of a global memory from the first problem and the second problem, and wherein each step is a same size. 19. A batched Cholesky decomposition system for at least a first problem and a second problem on a Graphics Processing Unit (GPU), said system comprising: a processor; and a memory, the memory storing instructions to cause the processor to: mirror a second problem matrix of the second problem to a first problem matrix of the first problem as paired matrices to accelerate the batched dense Cholesky decomposition on the GPU, the first problem matrix and the second problem matrix being symmetrical and positive definite matrices; combine the first problem matrix and the mirrored second problem matrix into a single problem matrix; and allocate data read to a thread and to the first problem and the second problem, respectively.
Simultaneous equations {, e.g. systems of linear equations} · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Memory management · CPC title
having at least two separately controlled shifting levels, e.g. using shifting matrices (G06F5/012 takes precedence) · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.