Systems and methods for quantum monte carlo processing
US-2024428112-A1 · Dec 26, 2024 · US
US2016224902A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016224902-A1 |
| Application number | US-201514713205-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 15, 2015 |
| Priority date | Jun 26, 2014 |
| Publication date | Aug 4, 2016 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
An efficient parallel Gibbs sampler using butterfly-patterned partial sums is provided. Instead of building and searching a complete prefix sums table, an alternative “butterfly patterned partial sums table” is described that integrates a lightweight transposition and partial sums operation. Accordingly, the usual full matrix transposition and full prefix sums table building operations can be omitted in favor of building the butterfly-patterned partial sums table, which requires less computational and communication effort. This butterfly-patterned partial sums table is used by a modified binary search phase that calculates the needed prefix-sum table values on-the-fly using the butterfly-patterned partial sums table. Transposed memory access is also provided while avoiding the full matrix transform, providing significant performance benefits for highly parallel architectures, such as graphics processing units (GPUs) where 1-stride or sequential memory accesses are important for optimization.
Opening claim text (preview).
What is claimed is: 1 . A method for parallel Gibbs sampling from a plurality of discrete probability distributions, wherein the method comprises: receiving parameters for the plurality of discrete probability distributions; initializing a partial sums table based on a matrix product of the parameters, wherein the partial sums table comprises a plurality of blocks having a maximum size W that is 2 to a power of X; applying X iterations of butterfly patterned replacements to each of the plurality of blocks such that a last row of each of the plurality of blocks is un-transposed; drawing, in parallel from the plurality of discrete probability distributions, a respective plurality of z values, wherein the drawing performs a first binary search on the last row of the plurality of blocks to determine a candidate block and a second binary search within the candidate block to draw each of the plurality of z values; and wherein the method is performed by one or more computing devices. 2 . The method of claim 1 , wherein the parameters are received from a memory in a transposed form. 3 . The method of claim 1 , wherein the one or more computing devices execute a plurality of threads concurrently, and wherein the drawing of each of the plurality of z values executes on a respective thread of the plurality of threads. 4 . The method of claim 1 , wherein the one or more computing devices comprises a graphics processing unit (GPU). 5 . The method of claim 1 , wherein W corresponds to a number of threads in a warp of a graphics processing unit (GPU). 6 . The method of claim 1 , wherein the receiving of the parameters is in parallel, wherein the initializing of the partial sums table is in parallel, and wherein the applying of each of the X iterations of butterfly patterned replacements is in parallel. 7 . The method of claim 1 , wherein the parameters comprise a matrix “w” corresponding to words of a plurality of documents in a document corpus, a theta matrix (A) for “w”, and a phi matrix (φ) for “w”. 8 . A non-transitory computer-readable medium storing one or more sequences of instructions which, when executed by one or more processors, cause performing of: receiving parameters for a plurality of discrete probability distributions; initializing a partial sums table based on a matrix product of the parameters, wherein the partial sums table comprises a plurality of blocks having a maximum size W that is 2 to a power of X; applying X iterations of butterfly patterned replacements to each of the plurality of blocks such that a last row of each of the plurality of blocks is un-transposed; and drawing, in parallel from the plurality of discrete probability distributions, a respective plurality of z values, wherein the drawing performs a first binary search on the last row of the plurality of blocks to determine a candidate block and a second binary search within the candidate block to draw each of the plurality of z values. 9 . The non-transitory computer-readable medium of claim 8 , wherein the parameters are received from a memory in a transposed form. 10 . The non-transitory computer-readable medium of claim 8 , wherein the one or more processors execute a plurality of threads concurrently, and wherein the drawing of each of the plurality of z values executes on a respective thread of the plurality of threads. 11 . The non-transitory computer-readable medium of claim 8 , wherein the one or more processors comprise a graphics processing unit (GPU). 12 . The non-transitory computer-readable medium of claim 8 , wherein W corresponds to a number of threads in a warp of a graphics processing unit (GPU). 13 . The non-transitory computer-readable medium of claim 8 , wherein the receiving of the parameters is in parallel, wherein the initializing of the partial sums table is in parallel, and wherein the applying of each of the X iterations of butterfly patterned replacements is in parallel. 14 . The non-transitory computer-readable medium of claim 8 , wherein the parameters comprise a matrix “w” corresponding to words of a plurality of documents in a document corpus, a theta matrix (θ) for “w”, and a phi matrix (φ) for “w”. 15 . A system comprising a graphics processing unit (GPU) configured to: receive parameters for a plurality of discrete probability distributions; initialize a partial sums table based on a matrix product of the parameters, wherein the partial sums table comprises a plurality of blocks having a maximum size W that is 2 to a power of X; apply X iterations of butterfly patterned replacements to each of the plurality of blocks such that a last row of each of the plurality of blocks is un-transposed; and draw, in parallel from the plurality of discrete probability distributions, a respective plurality of z values, wherein the draw performs a first binary search on the last row of the plurality of blocks to determine a candidate block and a second binary search within the candidate block to draw each of the plurality of z values. 16 . The system of claim 15 , wherein the GPU is further configured to receive the parameters from a memory in a transposed form. 17 . The system of claim 15 , wherein the GPU is configured to execute a plurality of threads concurrently, and wherein the GPU is configured to draw each of the plurality of z values executes on a respective thread of the plurality of threads. 18 . The system of claim 15 , wherein W corresponds to a number of threads in a warp of the GPU. 19 . The system of claim 15 , wherein the, wherein the GPU is further configured to receive the parameters in parallel, initialize the partial sums table in parallel, and apply each of the X iterations of butterfly patterned replacements in parallel. 20 . The system of claim 15 , wherein the parameters comprise a matrix “w” corresponding to words of a plurality of documents in a document corpus, a theta matrix (θ) for “w”, and a phi matrix (φ) for “w”.
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Physics · mapped topic
Inference or reasoning models · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.