Sparse and data-parallel inference method and system for the latent Dirichlet allocation model
US-9767416-B2 · Sep 19, 2017 · US
US10140281B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10140281-B2 |
| Application number | US-201514821511-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 7, 2015 |
| Priority date | Aug 7, 2015 |
| Publication date | Nov 27, 2018 |
| Grant date | Nov 27, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Herein is described a data-parallel algorithm for topic modeling on a distributed system in which memory and communication bandwidth requirements are streamlined for distributed implementation. According to embodiments, a distributed LDA Gibbs sampling algorithm shares approximate counter values amongst the nodes of a distributed system. These approximate counter values are repeatedly aggregated and then shared again to perform the distributed LDA Gibbs sampling. In order to maintain the shared counter values as approximate counter values of sixteen bits or less, approximate counter values are summed to produce aggregate approximate counter values. These small aggregate approximate counter values are shared between the nodes of the distributed system. As such, the addition of various types of approximate counters is described herein. Specifically, addition of binary Morris approximate counters, general Morris approximate counters, and Csűrös approximate counters are described in the context of distributed implementations of an LDA Gibbs sampling algorithm.
Opening claim text (preview).
What is claimed is: 1. A method for a distributed system, including a first computing device and a second computing device communicatively connected via a network, running a distributed uncollapsed Gibbs sampler comprising: the first computing device running the uncollapsed Gibbs sampler over a Dirichlet distribution of a plurality of words in a set of documents to produce sampler result data, further comprising: receiving, from the second computing device, a first approximate counter value that corresponds to a particular counter of the distributed uncollapsed Gibbs sampler, using one or more probabilistic techniques to increment a second approximate counter value that also corresponds to the particular counter; adding the first approximate counter value to the second approximate counter value to produce an aggregate approximate counter value, and converting the aggregate approximate counter value to an expected value represented by the aggregate approximate counter value; and using the expected value generated from the aggregate approximate counter value as the value of the particular counter; and determining, from the sampler result data, one or more sets of correlated words. 2. The method of claim 1 , wherein: the first and second approximate counter values are binary Morris approximate counter values; and adding the first approximate counter value to the second approximate counter value comprises: identifying which of the first and second approximate counter values is the largest, among the first and second approximate counter values, and which of the first and second approximate counter values is the smallest, among the first and second approximate counter values, determining a difference value that represents a difference between the largest of the first and second approximate counter values and the smallest of the first and second approximate counter values, generating a first set of uniform random bits, wherein the number of bits, in the first set of uniform random bits, is the difference value, generating a second set of uniform random bits, wherein the number of bits, in the second set of uniform random bits, is the smallest of the values, responsive to determining that both of (a) all of the first set of uniform random bits have zero values, and (b) at least one of the bits in the second set of uniform random bits has a value of one: setting the aggregate approximate counter value to be one more than the largest of the values, and responsive to determining that either (a) at least one of the bits of the first set of uniform random bits has a value of one, or (b) all of the second set of uniform random bits have zero values: setting the aggregate approximate counter value to be the largest of the values. 3. The method of claim 1 , wherein: the first and second approximate counter values are general Morris approximate counter values; and adding the first approximate counter value to the second approximate counter value comprises: calculating a sum value that represents a sum of expected values of the first and second approximate counter values, identifying a particular general Morris approximate counter value, wherein: an expected value of the particular general Morris approximate counter value is not above the sum value, and an expected value of one more than the particular general Morris approximate counter value is above the sum value; generating a random real value chosen uniformly pseudorandomly; determining whether the random real value is less than a difference value that is based, at least in part, on a difference between the sum value and the expected value of the particular general Morris approximate counter value; responsive to determining that the random real value is less than the difference value: setting the aggregate approximate counter value to be one more than the particular general Morris approximate counter value, and responsive to determining that the random real value is not less than the difference value: setting the aggregate approximate counter value to be the particular general Morris approximate counter value. 4. The method of claim 1 , wherein: the first and second approximate counter values are Csűrös approximate counter values; and adding the first approximate counter value to the second approximate counter value comprises: calculating a sum value that represents a sum of expected values of the first and second approximate counter values, identifying a particular Csűrös approximate counter value based, at least in part, on a first subset of bits representing the sum value, wherein: an expected value of the particular Csűrös approximate counter value is not above the sum value, and an expected value of one more than the particular Csűrös approximate counter value is above the sum value; generating a random integer value chosen uniformly pseudorandomly; determining whether the random integer value is less than a value represented by a second subset of the bits representing the sum value; wherein the bits in the first subset are located at distinct positions, within the representation of the sum value, from positions, within the representation of the sum value, of the bits in the second subset; responsive to determining that the random integer value is less than the value represented by the second subset of the bits representing the sum value: setting the aggregate approximate counter value to be one more than the particular Csűrös approximate counter value, and responsive to determining that the random integer value is not less than the value represented by the second subset of the bits representing the sum value: setting the aggregate approximate counter value to be the particular Csűrös approximate counter value. 5. The method of claim 1 , wherein the first approximate counter value is represented by at most 8 bits. 6. The method of claim 1 , further comprising, previous to adding the first approximate counter value to the second approximate counter value, the first computing device producing the second approximate counter value based on running the uncollapsed Gibbs sampler. 7. The method of claim 1 , wherein the uncollapsed Gibbs sampler is a Greedy Gibbs sampler. 8. The method of claim 1 , wherein the first computing device running the uncollapsed Gibbs sampler comprises performing one or more computations on a Graphics Processing Unit (GPU). 9. The method of claim 1 , further comprising: receiving, from the second computing device, a third counter value; adding the third counter value to a fourth counter value to produce a second aggregate counter value; and wherein the first and second approximate counter values are of a first type of approximate counter, and the third and fourth counter values are of a second type of approximate counter; wherein the first and second types of approximate counters are distinct types of approximate counters; wherein the first type of approximate counter is one of: binary Morris approximate counters, general Morris approximate counters, or Csűrös approximate counters; and wherein the second type of approximate counter is one of: binary Morris approximate counters, general Morris approximate counters, Csűrös approximate counters, or conventional counters. 10. One or more non-transitory computer-readable media storing one or more sequences of instructions which, when executed by one or more processors, cause performance of a distributed system, including a first computing device and a second computing device communicatively connected via a network, running a distributed uncollapsed Gibbs sampler comprising: the first computing device running the uncollapsed Gibbs sampler over a Diric
Parsing · CPC title
Clustering; Classification · CPC title
Physics · mapped topic
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.