US9519633B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9519633-B2 |
| Application number | US-201314417855-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 9, 2013 |
| Priority date | Jul 31, 2012 |
| Publication date | Dec 13, 2016 |
| Grant date | Dec 13, 2016 |
Abstract
Provided are a word latent topic estimation device and a word latent topic estimation method that can perform processing hierarchically and can rapidly estimate the latent topics of a word while taking into consideration a mixed state of topics. The present invention is provided with: a document data addition unit (11) that inputs a document including one or more words; a level setting unit (12) that sets the number of topics at each level in accordance with a hierarchical structure of topics for hierarchically estimating the latent topics of a word; a higher-level constraint creation unit (15) that, on the basis of the results of topic estimation at a given level for a word within the document, creates a higher-level constraint indicating the identifier of a topic that may be assigned to the word and the probability of the word being assigned to that topic; and a higher-level-constraint-attached topic estimation unit (13) that, when estimating the probability of each word being assigned to each topic, refers to the higher-level constraint, uses the probability of being assigned to a parent topic at the higher level as a weight, and performs estimation processing for a lower-level topic.
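As a rough illustration of the parent-weighted step described in the abstract, the following Python sketch handles one word and two adjacent levels. It assumes a fixed number of child topics per parent (the "preset width" named in the claims), a hypothetical dotted ID scheme for child topics, and per-topic word scores supplied by the caller; how the underlying topic model produces those scores is not specified here. It is a sketch under those assumptions, not the patented implementation.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn non-negative scores into probabilities that sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {topic: s / total for topic, s in scores.items()}


def child_topics(parent: str, width: int) -> List[str]:
    """Hypothetical ID scheme: the children of topic "0" are "0.0", "0.1", ..."""
    return [f"{parent}.{i}" for i in range(width)]


def estimate_lower_level(
    constraint: Dict[str, float],
    child_scores: Dict[str, float],
    width: int,
) -> Dict[str, float]:
    """Estimate lower-level topic probabilities for a single word.

    constraint   -- higher-level constraint: {parent topic ID: P(word -> parent)}
    child_scores -- hypothetical unnormalized word/topic scores for child topics
    width        -- number of children per parent (the preset width)

    Only children of parents named in the constraint are evaluated; each
    child's local probability is weighted by its parent's probability.
    """
    estimate: Dict[str, float] = {}
    for parent, parent_prob in constraint.items():
        kids = child_topics(parent, width)
        local = normalize({k: child_scores[k] for k in kids})
        for kid, p in local.items():
            estimate[kid] = parent_prob * p  # parent probability used as the weight
    return estimate


if __name__ == "__main__":
    # Parent topic "2" is absent from the constraint, so its children are skipped.
    constraint = {"0": 0.75, "1": 0.25}
    scores = {"0.0": 3.0, "0.1": 1.0, "1.0": 1.0, "1.1": 1.0, "2.0": 5.0, "2.1": 5.0}
    print(estimate_lower_level(constraint, scores, width=2))
```

With the sample inputs the sketch prints `{'0.0': 0.5625, '0.1': 0.1875, '1.0': 0.125, '1.1': 0.125}`. The children of parent topic "2" are never scored, which is presumably where the speed-up over estimating every lower-level topic comes from.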
Claims
What is claimed is:
1. A word latent topic estimation device for estimating a probabilistic word latent topic sequentially each time a document is added, comprising: a document data addition unit that inputs a document including one or more words; a level setting unit that sets the number of topics at each level in accordance with a hierarchical structure of topics for hierarchically estimating a latent topic of a word, the hierarchical structure constituted with a preset width and a preset depth; a higher-level constraint creation unit that creates a higher-level constraint for a word in the document on the basis of a result of topic estimation at a given level, the higher-level constraint indicating an identifier of a topic that is likely to be assigned to the word and the probability of the word being assigned to the topic; and a higher-level-constraint-attached topic estimation unit which, when estimating the probability of each word in the input document being assigned to each topic, refers to the higher-level constraint which is created by the higher-level constraint creation unit and uses the probability of the word being assigned to a parent topic at a higher level as a weight to perform estimation processing for a lower-level topic.
2. The word latent topic estimation device according to claim 1, wherein the probability of each word in the document being assigned to each topic at a given level is estimated and when the probability of a word being assigned to a topic is smaller than a preset threshold value, the higher-level constraint creation unit corrects the probability to 0 and creates a higher-level constraint including an identifier of a topic having a probability greater than 0 and the corrected probability.
3. The word latent topic estimation device according to claim 1, further comprising an initial value update unit which, after the higher-level-constraint-attached topic estimation unit performs hierarchical topic estimation, computes the amount of computation when the current initial value of the number of topics is used without change and the amount of computation when the initial value of the number of topics is reduced and, when the difference between the amounts of computation is greater than a predetermined threshold value, reduces the initial value of the number of topics.
4. The word latent topic estimation device according to claim 1, further comprising an initial value update unit which counts the number of added documents and, when the counted number of documents is smaller than a predetermined threshold value, sets an initial value of the number of topics so that topic estimation for all topics is performed rather than the hierarchical topic estimation.
5. A word latent topic estimation method for estimating a probabilistic word latent topic sequentially each time a document is added, comprising: inputting, by a word latent topic estimation computing device, a document including one or more words; setting, by the word latent topic estimation computing device, the number of topics at each level in accordance with a hierarchical structure of topics for hierarchically estimating a latent topic of a word, the hierarchical structure constituted with a preset width and a preset depth; creating, by the word latent topic estimation computing device, a higher-level constraint for a word in the document on the basis of a result of topic estimation at a given level, the higher-level constraint indicating an identifier of a topic that is likely to be assigned to the word and the probability of the word being assigned to the topic; and when estimating the probability of each word in the input document being assigned to each topic, referring, by the word latent topic estimation computing device, to the higher-level constraint created by the creating step and using the probability of the word being assigned to a parent topic at a higher level as a weight to perform estimation processing for a lower-level topic.
6. The word latent topic estimation method according to claim 5, wherein the probability of each word in the document being assigned to each topic at a given level is estimated and when the probability of a word being assigned to a topic is smaller than a preset threshold value, the probability is corrected to 0 and a higher-level constraint including an identifier of a topic having a probability greater than 0 and the corrected probability is created.
7. The word latent topic estimation method according to claim 5, wherein after hierarchical topic estimation is performed, the amount of computation when the current initial value of the number of topics is used without change and the amount of computation when the initial value of the number of topics is reduced are computed and, when the difference between the amounts of computation is greater than a predetermined threshold value, the initial value of the number of topics is reduced.
8. The word latent topic estimation method according to claim 5, wherein the number of added documents is counted and, when the counted number of documents is smaller than a predetermined threshold value, an initial value of the number of topics is set so that topic estimation for all topics is performed rather than the hierarchical topic estimation.
9. A non-transitory computer readable medium program that causes a computer serving as an information processing device storing a program for causing a computer to execute a process: inputting a document including one or more words; setting the number of topics at each level in accordance with a hierarchical structure of topics for hierarchically estimating a latent topic of a word, the hierarchical structure constituted with a preset width and a preset depth; creating a higher-level constraint for a word in the document on the basis of a result of topic estimation at a given level, the higher-level constraint indicating an identifier of a topic that is likely to be assigned to the word and the probability of the word being assigned to the topic; and when estimating the probability of each word in the input document being assigned to each topic, referring to the higher-level constraint created by the creating process step and using the probability of the word being assigned to a parent topic at a higher level as a weight to perform estimation processing for a lower-level topic.
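Claims 2 to 4 (and their method counterparts, claims 6 to 8) add three concrete decision rules on top of claim 1. The Python sketch below expresses them as plain functions. The function and parameter names are hypothetical, not taken from the patent. The claims do not say how the "amount of computation" is measured, so the two cost figures are passed in by the caller, and the total topic count used for flat estimation is likewise an input. Claim 2 also leaves open whether the surviving probabilities are renormalized after pruning; this sketch assumes they are kept unchanged.

```python
from typing import Dict


def prune_constraint(probs: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Claim 2 style constraint creation for one word.

    Probabilities smaller than `threshold` are corrected to 0, and only topics
    whose probability remains greater than 0 are kept. The kept probabilities
    are not renormalized (an assumption; the claim does not say).
    """
    corrected = {t: (0.0 if p < threshold else p) for t, p in probs.items()}
    return {t: p for t, p in corrected.items() if p > 0}


def initial_topics_for_corpus(num_docs: int, doc_threshold: int,
                              all_topics: int, hierarchical_initial: int) -> int:
    """Claim 4 style rule: while few documents have been added, start from
    all topics so that flat (non-hierarchical) estimation is performed."""
    if num_docs < doc_threshold:
        return all_topics
    return hierarchical_initial


def maybe_reduce_initial(current: int, reduced: int, cost_current: float,
                         cost_reduced: float, cost_threshold: float) -> int:
    """Claim 3 style rule: switch to the smaller initial number of topics when
    the saving in computation exceeds `cost_threshold`."""
    if cost_current - cost_reduced > cost_threshold:
        return reduced
    return current


if __name__ == "__main__":
    # All numbers below are made-up illustrative values.
    print(prune_constraint({"0": 0.70, "1": 0.25, "2": 0.05}, threshold=0.1))
    print(initial_topics_for_corpus(50, doc_threshold=100, all_topics=64,
                                    hierarchical_initial=8))
    print(maybe_reduce_initial(8, 4, cost_current=1200, cost_reduced=900,
                               cost_threshold=200))
```

Here topic "2" drops out of the constraint (0.05 is below the 0.1 threshold), the small 50-document corpus falls back to all 64 topics, and a saving of 300 units exceeds the 200-unit threshold, so the initial value is reduced from 8 to 4.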
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Heading extraction; Automatic titling; Numbering · CPC title
Physics · mapped topic