What technology area does this patent fall under?

Primary CPC classification G06F40/205. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Nov 27 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and system for distributed latent dirichlet allocation computation using addition of approximate counters

US10140281B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10140281-B2
Application number	US-201514821511-A
Country	US
Kind code	B2
Filing date	Aug 7, 2015
Priority date	Aug 7, 2015
Publication date	Nov 27, 2018
Grant date	Nov 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Herein is described a data-parallel algorithm for topic modeling on a distributed system in which memory and communication bandwidth requirements are streamlined for distributed implementation. According to embodiments, a distributed LDA Gibbs sampling algorithm shares approximate counter values amongst the nodes of a distributed system. These approximate counter values are repeatedly aggregated and then shared again to perform the distributed LDA Gibbs sampling. In order to maintain the shared counter values as approximate counter values of sixteen bits or less, approximate counter values are summed to produce aggregate approximate counter values. These small aggregate approximate counter values are shared between the nodes of the distributed system. As such, the addition of various types of approximate counters is described herein. Specifically, addition of binary Morris approximate counters, general Morris approximate counters, and Csűrös approximate counters are described in the context of distributed implementations of an LDA Gibbs sampling algorithm.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for a distributed system, including a first computing device and a second computing device communicatively connected via a network, running a distributed uncollapsed Gibbs sampler comprising: the first computing device running the uncollapsed Gibbs sampler over a Dirichlet distribution of a plurality of words in a set of documents to produce sampler result data, further comprising: receiving, from the second computing device, a first approximate counter value that corresponds to a particular counter of the distributed uncollapsed Gibbs sampler, using one or more probabilistic techniques to increment a second approximate counter value that also corresponds to the particular counter; adding the first approximate counter value to the second approximate counter value to produce an aggregate approximate counter value, and converting the aggregate approximate counter value to an expected value represented by the aggregate approximate counter value; and using the expected value generated from the aggregate approximate counter value as the value of the particular counter; and determining, from the sampler result data, one or more sets of correlated words. 2. The method of claim 1 , wherein: the first and second approximate counter values are binary Morris approximate counter values; and adding the first approximate counter value to the second approximate counter value comprises: identifying which of the first and second approximate counter values is the largest, among the first and second approximate counter values, and which of the first and second approximate counter values is the smallest, among the first and second approximate counter values, determining a difference value that represents a difference between the largest of the first and second approximate counter values and the smallest of the first and second approximate counter values, generating a first set of uniform random bits, wherein the number of bits, in the first set of uniform random bits, is the difference value, generating a second set of uniform random bits, wherein the number of bits, in the second set of uniform random bits, is the smallest of the values, responsive to determining that both of (a) all of the first set of uniform random bits have zero values, and (b) at least one of the bits in the second set of uniform random bits has a value of one: setting the aggregate approximate counter value to be one more than the largest of the values, and responsive to determining that either (a) at least one of the bits of the first set of uniform random bits has a value of one, or (b) all of the second set of uniform random bits have zero values: setting the aggregate approximate counter value to be the largest of the values. 3. The method of claim 1 , wherein: the first and second approximate counter values are general Morris approximate counter values; and adding the first approximate counter value to the second approximate counter value comprises: calculating a sum value that represents a sum of expected values of the first and second approximate counter values, identifying a particular general Morris approximate counter value, wherein: an expected value of the particular general Morris approximate counter value is not above the sum value, and an expected value of one more than the particular general Morris approximate counter value is above the sum value; generating a random real value chosen uniformly pseudorandomly; determining whether the random real value is less than a difference value that is based, at least in part, on a difference between the sum value and the expected value of the particular general Morris approximate counter value; responsive to determining that the random real value is less than the difference value: setting the aggregate approximate counter value to be one more than the particular general Morris approximate counter value, and responsive to determining that the random real value is not less than the difference value: setting the aggregate approximate counter value to be the particular general Morris approximate counter value. 4. The method of claim 1 , wherein: the first and second approximate counter values are Csűrös approximate counter values; and adding the first approximate counter value to the second approximate counter value comprises: calculating a sum value that represents a sum of expected values of the first and second approximate counter values, identifying a particular Csűrös approximate counter value based, at least in part, on a first subset of bits representing the sum value, wherein: an expected value of the particular Csűrös approximate counter value is not above the sum value, and an expected value of one more than the particular Csűrös approximate counter value is above the sum value; generating a random integer value chosen uniformly pseudorandomly; determining whether the random integer value is less than a value represented by a second subset of the bits representing the sum value; wherein the bits in the first subset are located at distinct positions, within the representation of the sum value, from positions, within the representation of the sum value, of the bits in the second subset; responsive to determining that the random integer value is less than the value represented by the second subset of the bits representing the sum value: setting the aggregate approximate counter value to be one more than the particular Csűrös approximate counter value, and responsive to determining that the random integer value is not less than the value represented by the second subset of the bits representing the sum value: setting the aggregate approximate counter value to be the particular Csűrös approximate counter value. 5. The method of claim 1 , wherein the first approximate counter value is represented by at most 8 bits. 6. The method of claim 1 , further comprising, previous to adding the first approximate counter value to the second approximate counter value, the first computing device producing the second approximate counter value based on running the uncollapsed Gibbs sampler. 7. The method of claim 1 , wherein the uncollapsed Gibbs sampler is a Greedy Gibbs sampler. 8. The method of claim 1 , wherein the first computing device running the uncollapsed Gibbs sampler comprises performing one or more computations on a Graphics Processing Unit (GPU). 9. The method of claim 1 , further comprising: receiving, from the second computing device, a third counter value; adding the third counter value to a fourth counter value to produce a second aggregate counter value; and wherein the first and second approximate counter values are of a first type of approximate counter, and the third and fourth counter values are of a second type of approximate counter; wherein the first and second types of approximate counters are distinct types of approximate counters; wherein the first type of approximate counter is one of: binary Morris approximate counters, general Morris approximate counters, or Csűrös approximate counters; and wherein the second type of approximate counter is one of: binary Morris approximate counters, general Morris approximate counters, Csűrös approximate counters, or conventional counters. 10. One or more non-transitory computer-readable media storing one or more sequences of instructions which, when executed by one or more processors, cause performance of a distributed system, including a first computing device and a second computing device communicatively connected via a network, running a distributed uncollapsed Gibbs sampler comprising: the first computing device running the uncollapsed Gibbs sampler over a Diric

Assignees

Oracle Int Corp

Inventors

Classifications

G06F40/205Primary
Parsing · CPC title
G06F16/35Primary
Clustering; Classification · CPC title
G06F17/30705
Physics · mapped topic
G06F17/2705Primary
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 58052536

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10140281B2 cover?: Herein is described a data-parallel algorithm for topic modeling on a distributed system in which memory and communication bandwidth requirements are streamlined for distributed implementation. According to embodiments, a distributed LDA Gibbs sampling algorithm shares approximate counter values amongst the nodes of a distributed system. These approximate counter values are repeatedly aggregate…
Who is the assignee on this patent?: Oracle Int Corp
What technology area does this patent fall under?: Primary CPC classification G06F40/205. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Nov 27 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).