Parallel gibbs sampler using butterfly-patterned partial sums

US2016224902A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016224902-A1
Application numberUS-201514713205-A
CountryUS
Kind codeA1
Filing dateMay 15, 2015
Priority dateJun 26, 2014
Publication dateAug 4, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An efficient parallel Gibbs sampler using butterfly-patterned partial sums is provided. Instead of building and searching a complete prefix sums table, an alternative “butterfly patterned partial sums table” is described that integrates a lightweight transposition and partial sums operation. Accordingly, the usual full matrix transposition and full prefix sums table building operations can be omitted in favor of building the butterfly-patterned partial sums table, which requires less computational and communication effort. This butterfly-patterned partial sums table is used by a modified binary search phase that calculates the needed prefix-sum table values on-the-fly using the butterfly-patterned partial sums table. Transposed memory access is also provided while avoiding the full matrix transform, providing significant performance benefits for highly parallel architectures, such as graphics processing units (GPUs) where 1-stride or sequential memory accesses are important for optimization.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method for parallel Gibbs sampling from a plurality of discrete probability distributions, wherein the method comprises: receiving parameters for the plurality of discrete probability distributions; initializing a partial sums table based on a matrix product of the parameters, wherein the partial sums table comprises a plurality of blocks having a maximum size W that is 2 to a power of X; applying X iterations of butterfly patterned replacements to each of the plurality of blocks such that a last row of each of the plurality of blocks is un-transposed; drawing, in parallel from the plurality of discrete probability distributions, a respective plurality of z values, wherein the drawing performs a first binary search on the last row of the plurality of blocks to determine a candidate block and a second binary search within the candidate block to draw each of the plurality of z values; and wherein the method is performed by one or more computing devices. 2 . The method of claim 1 , wherein the parameters are received from a memory in a transposed form. 3 . The method of claim 1 , wherein the one or more computing devices execute a plurality of threads concurrently, and wherein the drawing of each of the plurality of z values executes on a respective thread of the plurality of threads. 4 . The method of claim 1 , wherein the one or more computing devices comprises a graphics processing unit (GPU). 5 . The method of claim 1 , wherein W corresponds to a number of threads in a warp of a graphics processing unit (GPU). 6 . The method of claim 1 , wherein the receiving of the parameters is in parallel, wherein the initializing of the partial sums table is in parallel, and wherein the applying of each of the X iterations of butterfly patterned replacements is in parallel. 7 . The method of claim 1 , wherein the parameters comprise a matrix “w” corresponding to words of a plurality of documents in a document corpus, a theta matrix (A) for “w”, and a phi matrix (φ) for “w”. 8 . A non-transitory computer-readable medium storing one or more sequences of instructions which, when executed by one or more processors, cause performing of: receiving parameters for a plurality of discrete probability distributions; initializing a partial sums table based on a matrix product of the parameters, wherein the partial sums table comprises a plurality of blocks having a maximum size W that is 2 to a power of X; applying X iterations of butterfly patterned replacements to each of the plurality of blocks such that a last row of each of the plurality of blocks is un-transposed; and drawing, in parallel from the plurality of discrete probability distributions, a respective plurality of z values, wherein the drawing performs a first binary search on the last row of the plurality of blocks to determine a candidate block and a second binary search within the candidate block to draw each of the plurality of z values. 9 . The non-transitory computer-readable medium of claim 8 , wherein the parameters are received from a memory in a transposed form. 10 . The non-transitory computer-readable medium of claim 8 , wherein the one or more processors execute a plurality of threads concurrently, and wherein the drawing of each of the plurality of z values executes on a respective thread of the plurality of threads. 11 . The non-transitory computer-readable medium of claim 8 , wherein the one or more processors comprise a graphics processing unit (GPU). 12 . The non-transitory computer-readable medium of claim 8 , wherein W corresponds to a number of threads in a warp of a graphics processing unit (GPU). 13 . The non-transitory computer-readable medium of claim 8 , wherein the receiving of the parameters is in parallel, wherein the initializing of the partial sums table is in parallel, and wherein the applying of each of the X iterations of butterfly patterned replacements is in parallel. 14 . The non-transitory computer-readable medium of claim 8 , wherein the parameters comprise a matrix “w” corresponding to words of a plurality of documents in a document corpus, a theta matrix (θ) for “w”, and a phi matrix (φ) for “w”. 15 . A system comprising a graphics processing unit (GPU) configured to: receive parameters for a plurality of discrete probability distributions; initialize a partial sums table based on a matrix product of the parameters, wherein the partial sums table comprises a plurality of blocks having a maximum size W that is 2 to a power of X; apply X iterations of butterfly patterned replacements to each of the plurality of blocks such that a last row of each of the plurality of blocks is un-transposed; and draw, in parallel from the plurality of discrete probability distributions, a respective plurality of z values, wherein the draw performs a first binary search on the last row of the plurality of blocks to determine a candidate block and a second binary search within the candidate block to draw each of the plurality of z values. 16 . The system of claim 15 , wherein the GPU is further configured to receive the parameters from a memory in a transposed form. 17 . The system of claim 15 , wherein the GPU is configured to execute a plurality of threads concurrently, and wherein the GPU is configured to draw each of the plurality of z values executes on a respective thread of the plurality of threads. 18 . The system of claim 15 , wherein W corresponds to a number of threads in a warp of the GPU. 19 . The system of claim 15 , wherein the, wherein the GPU is further configured to receive the parameters in parallel, initialize the partial sums table in parallel, and apply each of the X iterations of butterfly patterned replacements in parallel. 20 . The system of claim 15 , wherein the parameters comprise a matrix “w” corresponding to words of a plurality of documents in a document corpus, a theta matrix (θ) for “w”, and a phi matrix (φ) for “w”.

Assignees

Inventors

Classifications

  • G06N7/01Primary

    Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06N99/005Primary

    Physics · mapped topic

  • Inference or reasoning models · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016224902A1 cover?
An efficient parallel Gibbs sampler using butterfly-patterned partial sums is provided. Instead of building and searching a complete prefix sums table, an alternative “butterfly patterned partial sums table” is described that integrates a lightweight transposition and partial sums operation. Accordingly, the usual full matrix transposition and full prefix sums table building operations can be o…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06N7/01. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Aug 04 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).