Bias parameters for topic modeling

US10990763B2 · US · B2

Patent metadata
Publication number: US-10990763-B2
Application number: US-201916407522-A
Country: US
Kind code: B2
Filing date: May 9, 2019
Priority date: Mar 1, 2019
Publication date: Apr 27, 2021
Grant date: Apr 27, 2021

Abstract

Official abstract text for this publication.

Systems and methods are disclosed to improve a topic modeling system that tunes a topic model for a set of topics from a corpus of documents, by allowing users to pre-inform the tuning process with bias parameters for desired associations in the topic model. In embodiments, the topic model may be a Latent Dirichlet Allocation (LDA) model. In embodiments, the bias parameter may indicate a fixed association where a particular word in a particular document is associated with a particular topic. In embodiments, the bias parameter may specify a weight value that biases the inference process with regard to a particular association. Advantageously, the disclosed features allow users to specify a small number of parameters to steer the tuning process towards a set of desired topics. As a result, the topic model may be generated more quickly and with more useful topics.
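The fixed-association idea in the abstract can be illustrated with a minimal sketch of a collapsed Gibbs sampler for LDA that never resamples pinned tokens. This is not the patent's implementation: the function name `gibbs_lda_with_fixed`, the `fixed` dictionary format `{(doc_index, word_id): topic}`, the hyperparameters `alpha` and `beta`, and the use of a fixed iteration count in place of a convergence test are all assumptions made for illustration. Weight-valued bias parameters are left out of this sketch.

```python
import random

def gibbs_lda_with_fixed(docs, num_topics, vocab_size, fixed=None,
                         alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA that honours fixed associations.

    docs:  list of documents, each a list of integer word ids.
    fixed: hypothetical bias format, {(doc_index, word_id): topic}.
    Returns z, where z[d][i] is the topic assigned to token i of document d.
    """
    rng = random.Random(seed)
    fixed = fixed or {}
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)

    # Initialise: pinned tokens take their given topic, the rest are random.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = fixed[(d, w)] if (d, w) in fixed else rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    # Assumption: a fixed number of sweeps stands in for a convergence check.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                if (d, w) in fixed:
                    continue  # fixed association: inference never changes it
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Standard collapsed-Gibbs conditional, up to a constant.
                scores = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=scores)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return z

# Toy corpus: word 0 in document 0 is pinned to topic 1,
# word 3 in document 1 is pinned to topic 0.
docs = [[0, 1, 2, 0], [3, 4, 3, 5], [0, 3, 1, 4]]
pins = {(0, 0): 1, (1, 3): 0}
z = gibbs_lda_with_fixed(docs, num_topics=2, vocab_size=6, fixed=pins)
```

Because pinned tokens still contribute to the count tables, they pull co-occurring words toward the same topic, which is one plausible reading of how a few fixed associations could steer the remaining assignments.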

First claim

Opening claim text (preview).

What is claimed:

1. A system, comprising: one or more processors with associated memory that implement a topic modeling system configured to: receive configuration information for tuning a topic model including a plurality of topics, wherein: the topic model is configured to assign a plurality of associations between individual topics and individual features in a corpus of documents and between individual documents and the individual topics, and the configuration information specifies at least one bias parameter indicating at least one specified association of the plurality of associations between a particular feature and a particular topic or between a particular document in the corpus and a particular topic, wherein the configuration information does not specify bias parameters for all of the plurality of associations; perform an inference process to iteratively reassign the associations in the topic model, wherein the reassignment is performed according to the bias parameter to bias the inference process for or against the specified association; and output the plurality of associations of the topic model after a convergence of the inference process.

2. The system of claim 1, wherein the topic model comprises a Latent Dirichlet Allocation (LDA) topic model and the inference process employs a Gibbs sampling technique to select topics to reassign for individual features and individual documents.

3. The system of claim 1, wherein the specified association is a fixed association that associates the particular feature, when appearing in a particular document, with the particular topic, and wherein the inference process does not change the fixed association.

4. The system of claim 1, wherein the topic modeling system is configured to, during individual iterations of inference process: determine respective probabilities of the associations in the topic model, wherein a probability of the specified association is determined based at least in part on the bias parameter; and select a new topic to reassign to a current feature in a current document based at least in part on the probabilities of the associations.

5. The system of claim 4, wherein the bias parameter is a negative weight value that reduces the probability of the specified association during the inference process.

6. The system of claim 1, wherein individual ones of the documents are representations of individual sentences, individual ones of the topics are distinct semantic senses of verbs in the sentences, and individual ones of the features are distinct context representations of the verbs.

7. A computer-implemented method, comprising: receiving configuration information for tuning a topic model having a plurality of topics, wherein: the topic model is configured to assign a plurality of associations between individual topics and individual features in a corpus of documents and between individual documents and the individual topics, and the configuration information specifies at least one bias parameter indicating at least one specified association of the plurality of associations between a particular feature and a particular topic or between a particular document in the corpus and a particular topic, wherein the configuration information does not specify bias parameters for all of the plurality of associations; performing an inference process to iteratively reassign the associations in the topic model, wherein the reassignment is performed according to the bias parameter to bias the inference process for or against the specified association; and outputting the plurality of associations of the topic model after a convergence of the inference process.

8. The method of claim 7, wherein the topic model comprises a Latent Dirichlet Allocation (LDA) topic model and the inference process employs a Gibbs sampling technique to select topics to reassign for individual features and individual documents.

9. The method of claim 7, wherein the specified association indicates a fixed association, and wherein the inference process does not change the fixed association.

10. The method of claim 7, wherein the specified association associates the particular feature with the particular topic, when the particular feature appears in a particular document.

11. The method of claim 7, wherein performing the inference process comprises performing, during individual iterations of inference process: determining respective probabilities of the associations in the topic model, wherein the bias parameter is a weight value used to compute or modify a probability of the specified association; and selecting a new topic to reassign to a current feature in a current document based at least in part on the probabilities of the associations.

12. The method of claim 11, wherein the weight value is a negative value that reduces the probability of the specified association during the inference process.

13. The method of claim 7, wherein individual ones of the features comprise individual words.

14. The method of claim 7, wherein at least some of the topics are latent topics.

15. The method of claim 7, wherein individual ones of the documents are representations of individual sentences, individual ones of the topics are distinct semantic senses of verbs in the sentences, and individual ones of the features are distinct context representations of the verbs.

16. The method of claim 7, wherein the configuration information specifies a plurality of bias parameters for the particular topic, wherein the plurality of bias parameters specify a bias against at least two features from being associated together to the particular topic.

17. The method of claim 7, wherein the receiving of the configuration information, the performing of the inference process, and the outputting of the associations are performed via a topic modeling service hosted in a cloud computing environment.

18. One or more non-transitory computer-accessible storage media storing program instructions that when executed on or across one or more processors implementing a topic modeling system, cause the topic modeling system to: receive configuration information for tuning a topic model having a plurality of topics, wherein: the topic model is configured to assign a plurality of associations between individual topics and individual features in a corpus of documents and between individual documents and the individual topics, and the configuration information specifies at least one bias parameter indicating at least one specified association of the plurality of associations between a particular feature and a particular topic or between a particular document in the corpus and a particular topic, wherein the configuration information does not specify bias parameters for all of the plurality of associations; perform an inference process to iteratively reassign the associations in the topic model, wherein the reassignment is performed according to the bias parameter to bias the inference process for or against the specified association; and output the plurality of associations of the topic model after a convergence of the inference process.

19. The one or more non-transitory computer-accessible storage media of claim 18, wherein the specified association is a fixed association that associates the particular feature, when appearing in a particular document, with the particular topic, and wherein the program instructions when executed on or across the one or more processors causes the topic modeling system to perform the inference proces
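Claims 4, 5, 11 and 12 describe a per-iteration step in which a weight value modifies the probability of a specified association before a new topic is selected. The claims do not say how the weight enters the computation, so the sketch below makes an assumption: each biased topic's unnormalised score is multiplied by `exp(weight)`, so a negative weight lowers the probability and a positive weight raises it. The function names and the `{topic_index: weight}` format are hypothetical.

```python
import math
import random

def biased_topic_probabilities(base_scores, bias_weights):
    """Return normalised topic probabilities after applying bias weights.

    base_scores:  unnormalised, positive score per topic from the model.
    bias_weights: hypothetical {topic_index: weight}; unlisted topics are unbiased.
    Assumption: a weight scales its topic's score by exp(weight).
    """
    adjusted = [
        score * math.exp(bias_weights.get(topic, 0.0))
        for topic, score in enumerate(base_scores)
    ]
    total = sum(adjusted)
    return [a / total for a in adjusted]

def select_topic(probabilities, rng):
    """Draw one topic index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities)[0]

# Scores for the current feature in the current document over three topics.
base = [0.5, 0.3, 0.2]
unbiased = biased_topic_probabilities(base, {})
discouraged = biased_topic_probabilities(base, {0: -1.0})  # bias against topic 0
encouraged = biased_topic_probabilities(base, {0: 1.0})    # bias for topic 0
new_topic = select_topic(discouraged, random.Random(0))
```

An exponential scaling keeps every probability strictly positive, so a weighted bias only makes an association less or more likely; it never forbids or forces it the way a fixed association does.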

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks

  • Creation of semantic tools, e.g. ontology or thesauri

  • Clustering; Classification

  • Machine learning

  • G06F40/30 (Primary): Semantic analysis

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10990763B2 cover?
Systems and methods are disclosed to improve a topic modeling system that tunes a topic model for a set of topics from a corpus of documents, by allowing users to pre-inform the tuning process with bias parameters for desired associations in the topic model. In embodiments, the topic model may be a Latent Dirichlet Allocation (LDA) model. In embodiments, the bias parameter may indicate a fixed …
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 27, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).