Bias parameters for topic modeling
US-2020279019-A1 · Sep 3, 2020 · US
US10990763B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10990763-B2 |
| Application number | US-201916407522-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 9, 2019 |
| Priority date | Mar 1, 2019 |
| Publication date | Apr 27, 2021 |
| Grant date | Apr 27, 2021 |
Abstract
Systems and methods are disclosed to improve a topic modeling system that tunes a topic model for a set of topics from a corpus of documents, by allowing users to pre-inform the tuning process with bias parameters for desired associations in the topic model. In embodiments, the topic model may be a Latent Dirichlet Allocation (LDA) model. In embodiments, the bias parameter may indicate a fixed association where a particular word in a particular document is associated with a particular topic. In embodiments, the bias parameter may specify a weight value that biases the inference process with regard to a particular association. Advantageously, the disclosed features allow users to specify a small number of parameters to steer the tuning process towards a set of desired topics. As a result, the topic model may be generated more quickly and with more useful topics.
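The abstract does not say how a bias parameter enters the inference step. The following is a minimal sketch, assuming a standard collapsed Gibbs sampler for LDA in which the unnormalized conditional for topic k is (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta). It treats a fixed association as a (document, word) pair that is pinned to a topic and skipped during resampling. It treats a weight value as a factor exp(weight) that multiplies the conditional for that word-topic or document-topic pair, so negative weights bias against the association. The function name, the dictionary formats, and the exp(weight) scaling are all hypothetical illustrations, not the patent's stated implementation.

```python
import numpy as np


def biased_lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                     fixed=None, word_weights=None, doc_weights=None,
                     num_sweeps=200, seed=0):
    """Collapsed Gibbs sampling for LDA with optional, sparse bias parameters.

    docs:         list of documents, each a list of integer word ids.
    fixed:        hypothetical format, {(doc_index, word_id): topic}; every
                  occurrence of word_id in that document is pinned to topic
                  and never resampled (a "fixed association").
    word_weights: hypothetical format, {(word_id, topic): weight}; the sampling
                  weight of that pair is multiplied by exp(weight), so negative
                  weights bias against the association.
    doc_weights:  hypothetical format, {(doc_index, topic): weight}; same idea
                  for document-topic associations.
    Pairs absent from these dicts are left unbiased.
    """
    rng = np.random.default_rng(seed)
    fixed = fixed or {}

    # Multiplicative bias tables; 1.0 means "no bias parameter given".
    bias_w = np.ones((vocab_size, num_topics))
    for (w, k), weight in (word_weights or {}).items():
        bias_w[w, k] = np.exp(weight)
    bias_d = np.ones((len(docs), num_topics))
    for (d, k), weight in (doc_weights or {}).items():
        bias_d[d, k] = np.exp(weight)

    n_dk = np.zeros((len(docs), num_topics))  # document-topic counts
    n_kw = np.zeros((num_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(num_topics)                 # total tokens per topic

    # Initial assignment: fixed associations first, random topics otherwise.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = fixed[(d, w)] if (d, w) in fixed else int(rng.integers(num_topics))
            z_d.append(k)
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
        z.append(z_d)

    # A fixed number of sweeps stands in for a real convergence test.
    for _ in range(num_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                if (d, w) in fixed:
                    continue  # the inference process never changes these
                k = z[d][i]
                n_dk[d, k] -= 1  # remove the current token from the counts
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Standard collapsed-Gibbs conditional, scaled by the biases.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                p *= bias_w[w] * bias_d[d]
                k = int(rng.choice(num_topics, p=p / p.sum()))
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw
```

For example, `biased_lda_gibbs([[0, 1, 2, 0], [1, 2, 3]], num_topics=2, vocab_size=4, fixed={(0, 0): 1}, word_weights={(1, 0): -50.0})` keeps word 0 in document 0 on topic 1 and makes topic 0 practically unreachable for word 1, while every other association is left to the sampler. This mirrors the abstract's point that only a small number of parameters needs to be specified.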
Claims
What is claimed:
1. A system, comprising: one or more processors with associated memory that implement a topic modeling system configured to: receive configuration information for tuning a topic model including a plurality of topics, wherein: the topic model is configured to assign a plurality of associations between individual topics and individual features in a corpus of documents and between individual documents and the individual topics, and the configuration information specifies at least one bias parameter indicating at least one specified association of the plurality of associations between a particular feature and a particular topic or between a particular document in the corpus and a particular topic, wherein the configuration information does not specify bias parameters for all of the plurality of associations; perform an inference process to iteratively reassign the associations in the topic model, wherein the reassignment is performed according to the bias parameter to bias the inference process for or against the specified association; and output the plurality of associations of the topic model after a convergence of the inference process.
2. The system of claim 1, wherein the topic model comprises a Latent Dirichlet Allocation (LDA) topic model and the inference process employs a Gibbs sampling technique to select topics to reassign for individual features and individual documents.
3. The system of claim 1, wherein the specified association is a fixed association that associates the particular feature, when appearing in a particular document, with the particular topic, and wherein the inference process does not change the fixed association.
4. The system of claim 1, wherein the topic modeling system is configured to, during individual iterations of inference process: determine respective probabilities of the associations in the topic model, wherein a probability of the specified association is determined based at least in part on the bias parameter; and select a new topic to reassign to a current feature in a current document based at least in part on the probabilities of the associations.
5. The system of claim 4, wherein the bias parameter is a negative weight value that reduces the probability of the specified association during the inference process.
6. The system of claim 1, wherein individual ones of the documents are representations of individual sentences, individual ones of the topics are distinct semantic senses of verbs in the sentences, and individual ones of the features are distinct context representations of the verbs.
7. A computer-implemented method, comprising: receiving configuration information for tuning a topic model having a plurality of topics, wherein: the topic model is configured to assign a plurality of associations between individual topics and individual features in a corpus of documents and between individual documents and the individual topics, and the configuration information specifies at least one bias parameter indicating at least one specified association of the plurality of associations between a particular feature and a particular topic or between a particular document in the corpus and a particular topic, wherein the configuration information does not specify bias parameters for all of the plurality of associations; performing an inference process to iteratively reassign the associations in the topic model, wherein the reassignment is performed according to the bias parameter to bias the inference process for or against the specified association; and outputting the plurality of associations of the topic model after a convergence of the inference process.
8. The method of claim 7, wherein the topic model comprises a Latent Dirichlet Allocation (LDA) topic model and the inference process employs a Gibbs sampling technique to select topics to reassign for individual features and individual documents.
9. The method of claim 7, wherein the specified association indicates a fixed association, and wherein the inference process does not change the fixed association.
10. The method of claim 7, wherein the specified association associates the particular feature with the particular topic, when the particular feature appears in a particular document.
11. The method of claim 7, wherein performing the inference process comprises performing, during individual iterations of inference process: determining respective probabilities of the associations in the topic model, wherein the bias parameter is a weight value used to compute or modify a probability of the specified association; and selecting a new topic to reassign to a current feature in a current document based at least in part on the probabilities of the associations.
12. The method of claim 11, wherein the weight value is a negative value that reduces the probability of the specified association during the inference process.
13. The method of claim 7, wherein individual ones of the features comprise individual words.
14. The method of claim 7, wherein at least some of the topics are latent topics.
15. The method of claim 7, wherein individual ones of the documents are representations of individual sentences, individual ones of the topics are distinct semantic senses of verbs in the sentences, and individual ones of the features are distinct context representations of the verbs.
16. The method of claim 7, wherein the configuration information specifies a plurality of bias parameters for the particular topic, wherein the plurality of bias parameters specify a bias against at least two features from being associated together to the particular topic.
17. The method of claim 7, wherein the receiving of the configuration information, the performing of the inference process, and the outputting of the associations are performed via a topic modeling service hosted in a cloud computing environment.
18. One or more non-transitory computer-accessible storage media storing program instructions that when executed on or across one or more processors implementing a topic modeling system, cause the topic modeling system to: receive configuration information for tuning a topic model having a plurality of topics, wherein: the topic model is configured to assign a plurality of associations between individual topics and individual features in a corpus of documents and between individual documents and the individual topics, and the configuration information specifies at least one bias parameter indicating at least one specified association of the plurality of associations between a particular feature and a particular topic or between a particular document in the corpus and a particular topic, wherein the configuration information does not specify bias parameters for all of the plurality of associations; perform an inference process to iteratively reassign the associations in the topic model, wherein the reassignment is performed according to the bias parameter to bias the inference process for or against the specified association; and output the plurality of associations of the topic model after a convergence of the inference process.
19. The one or more non-transitory computer-accessible storage media of claim 18, wherein the specified association is a fixed association that associates the particular feature, when appearing in a particular document, with the particular topic, and wherein the program instructions when executed on or across the one or more processors causes the topic modeling system to perform the inference proces
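Claims 4 to 5 and 11 to 12 describe one iteration step: compute a probability for each candidate topic of the current feature, with the bias parameter acting as a weight, where a negative weight lowers the probability of the specified association. Claim 16 adds bias parameters against two features being associated together with the same topic. The function below is a minimal sketch of that per-token step for an LDA-style model. The exp(weight) scaling and the rule that a pairwise bias applies only when the partner word already has a token in the topic are hypothetical readings chosen for illustration; the claims do not specify either.

```python
import numpy as np


def token_topic_probs(w, d, n_dk, n_kw, alpha=0.1, beta=0.01,
                      word_weights=None, pair_weights=None):
    """Normalized topic distribution for one token: word w in document d.

    n_dk, n_kw:   document-topic and topic-word counts that already exclude
                  the token being resampled.
    word_weights: hypothetical {(word, topic): weight}; scales the pair by
                  exp(weight), so a negative weight lowers its probability.
    pair_weights: hypothetical {(topic, word_a, word_b): weight}; a bias
                  against word_a and word_b sharing that topic. It applies when
                  the token is one of the two words and the other word
                  currently has at least one token assigned to the topic.
    """
    vocab_size = n_kw.shape[1]
    n_k = n_kw.sum(axis=1)
    # Unbiased collapsed-Gibbs conditional for LDA (a new array, not a view).
    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
    # Single-association weights (claims 5 and 12: negative values reduce p).
    for (word, k), weight in (word_weights or {}).items():
        if word == w:
            p[k] *= np.exp(weight)
    # Pairwise weights against two features sharing a topic (claim 16 reading).
    for (k, a, b), weight in (pair_weights or {}).items():
        partner = b if w == a else (a if w == b else None)
        if partner is not None and n_kw[k, partner] > 0:
            p[k] *= np.exp(weight)
    return p / p.sum()
```

With no weights the function reduces to the ordinary LDA conditional, so associations without a bias parameter are sampled exactly as usual. A sampler would then draw the new topic from the returned distribution, for instance with `np.random.default_rng().choice(len(p), p=p)`.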