System, method, and recording medium for mirroring matrices for batched cholesky decomposition on a graphic processing unit

US9984041B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9984041-B2
Application numberUS-201615199772-A
CountryUS
Kind codeB2
Filing dateJun 30, 2016
Priority dateJun 30, 2016
Publication dateMay 29, 2018
Grant dateMay 29, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, include mirroring a second problem matrix of the second problem to a first problem matrix of the first problem, combining the first problem matrix and the mirrored second problem matrix into a single problem matrix, and allocating data read to a thread and to the first problem and the second problem, respectively.

First claim

Opening claim text (preview).

What is claimed is: 1. A batched Cholesky decomposition method for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, the method comprising: mirroring a second problem matrix of the second problem to a first problem matrix of the first problem as paired matrices to accelerate the batched dense Cholesky decomposition on the GPU, the first problem matrix and the second problem matrix being symmetrical and positive definite matrices; combining the first problem matrix and the mirrored second problem matrix into a single problem matrix; and allocating data read to a thread and to the first problem and the second problem, respectively. 2. The method of claim 1 , wherein each step of the Cholesky decomposition includes elements of a global memory from the first problem and the second problem, and wherein each step is a same size. 3. The method of claim 1 , wherein the single problem matrix comprises a global memory for the first problem and the second problem. 4. The method of claim 3 , wherein the single problem matrix comprises a first problem shared memory and a second problem shared memory. 5. The method of claim 4 , wherein the first problem shared memory comprises regular intervals, and wherein the second problem shared memory is continuous. 6. The method of claim 1 , wherein the mirroring shifts the mirrored second problem matrix by N+1. 7. The method of claim 1 , wherein the mirroring shifts the mirror second problem matrix by N+1 such that the single problem matrix comprises a square matrix or a rectangular matrix. 8. The method of claim 1 , wherein the data read by Cholesky decomposition is read for the first problem and the second problem at a same time. 9. The method of claim 1 , wherein the allocating allocates the data read to a plurality of threads. 10. The method of claim 9 , wherein each thread of the plurality of threads performs a same amount of work during the Cholesky decomposition. 11. The method of claim 9 , wherein the plurality of threads are not idle. 12. The method of claim 1 , wherein each step of the Cholesky decomposition of the single problem matrix has a same number of elements. 13. The method of claim 1 , wherein, after the allocating allocates the read data to the thread for each of the first problem and the second problem, the thread updates values in the data. 14. The method of claim 1 , wherein the allocating assesses the read data such that the read data is not discarded. 15. The method of claim 1 , wherein the Cholesky decomposition only synchronizes once for the first problem and the second problem. 16. The method of claim 1 , wherein the single problem matrix comprises fixed size data length with a fixed data interval such that the allocating reads the data simultaneously for the first problem and the second problem. 17. A non-transitory computer-readable recording medium recording a batched Cholesky decomposition program for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, the program causing a computer to perform: mirroring a second problem matrix of the second problem to a first problem matrix of the first problem as paired matrices to accelerate the batched dense Cholesky decomposition on the GPU, the first problem matrix and the second problem matrix being symmetrical and positive definite matrices; combining the first problem matrix and the mirrored second problem matrix into a single problem matrix; and allocating data read to a thread and to the first problem and the second problem, respectively. 18. The non-transitory computer-readable recording medium of claim 17 , wherein each step of the Cholesky decomposition includes elements of a global memory from the first problem and the second problem, and wherein each step is a same size. 19. A batched Cholesky decomposition system for at least a first problem and a second problem on a Graphics Processing Unit (GPU), said system comprising: a processor; and a memory, the memory storing instructions to cause the processor to: mirror a second problem matrix of the second problem to a first problem matrix of the first problem as paired matrices to accelerate the batched dense Cholesky decomposition on the GPU, the first problem matrix and the second problem matrix being symmetrical and positive definite matrices; combine the first problem matrix and the mirrored second problem matrix into a single problem matrix; and allocate data read to a thread and to the first problem and the second problem, respectively.

Assignees

Inventors

Classifications

  • Simultaneous equations {, e.g. systems of linear equations} · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • G06F17/16Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • Memory management · CPC title

  • having at least two separately controlled shifting levels, e.g. using shifting matrices (G06F5/012 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9984041B2 cover?
A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, include mirroring a second problem matrix of the second problem to a first problem matrix of the first problem, combining the first problem matrix and the mirrored second problem matrix into a single problem ma…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue May 29 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).