Word latent topic estimation device and word latent topic estimation method

US9519633B2 · US · B2

Patent metadata
Publication number: US-9519633-B2
Application number: US-201314417855-A
Country: US
Kind code: B2
Filing date: Jul 9, 2013
Priority date: Jul 31, 2012
Publication date: Dec 13, 2016
Grant date: Dec 13, 2016

Abstract

Official abstract text for this publication.

Provided are a word latent topic estimation device and a word latent topic estimation method which are capable of hierarchically performing processing and which are capable of rapidly estimating latent topics of a word while taking into consideration a mixed state of topics. The present invention is provided with: a document data addition unit (11) which inputs a document which includes one or more words; a level setting unit (12) which sets a number of topics at each level in accordance with a hierarchical structure of topics for hierarchically estimating latent topics of a word; a higher-level constraint creation unit (15) which, on the basis of results of topic estimation at a given level with regard to a word within the document, creates a higher-level constraint indicating an identifier of a topic for which there is a possibility of being assigned to the word and a probability of being assigned to the topic; and a higher-level-constraint-attached topic estimation unit (13) which, when estimating the probability of each word being assigned to each topic, refers to the higher-order constraint, uses the probability of being assigned to a parent topic at the higher level as a weight, and performs estimation processing to a lower-level topic.
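
The weighting step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the child-numbering scheme (`parent * width + k`), the per-topic `likelihood` scores, and the normalization over all candidate child topics are assumptions made for illustration only; the abstract states only that the parent-topic probability is used as a weight when estimating lower-level topics.

```python
from typing import Dict, List


def topics_per_level(width: int, depth: int) -> List[int]:
    """Number of topics at levels 1..depth, assuming every topic has `width` children."""
    return [width ** level for level in range(1, depth + 1)]


def child_ids(parent: int, width: int) -> List[int]:
    """Hypothetical numbering: children of topic p are p*width .. p*width + width - 1."""
    return [parent * width + k for k in range(width)]


def estimate_lower_level(
    constraint: Dict[int, float],
    likelihood: Dict[int, float],
    width: int,
) -> Dict[int, float]:
    """Estimate lower-level topic probabilities for one word.

    constraint: parent topic id -> probability of the word at the higher level.
    likelihood: child topic id -> unnormalized score of the word under that topic
                (assumed to come from whatever topic model is in use).
    Only children of parents listed in the constraint are evaluated.
    """
    scores: Dict[int, float] = {}
    for parent, weight in constraint.items():
        for child in child_ids(parent, width):
            # The parent-topic probability acts as a weight on the child score.
            scores[child] = weight * likelihood.get(child, 0.0)
    total = sum(scores.values())
    if total == 0.0:
        return {}
    # Assumption: normalize across all evaluated child topics.
    return {topic: score / total for topic, score in scores.items()}


if __name__ == "__main__":
    print(topics_per_level(4, 3))
    word_constraint = {0: 0.75, 1: 0.25}
    word_scores = {0: 1.0, 1: 1.0, 2: 1.0, 3: 3.0}
    print(estimate_lower_level(word_constraint, word_scores, width=2))
```

Because topics whose parents are absent from the constraint are never evaluated, the number of lower-level topics scored per word depends on the size of the constraint rather than on the total number of topics at that level.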

First claim

Opening claim text (preview).

What is claimed is:

1. A word latent topic estimation device for estimating a probabilistic word latent topic sequentially each time a document is added, comprising: a document data addition unit that inputs a document including one or more words; a level setting unit that sets the number of topics at each level in accordance with a hierarchical structure of topics for hierarchically estimating a latent topic of a word, the hierarchical structure constituted with a preset width and a preset depth; a higher-level constraint creation unit that creates a higher-level constraint for a word in the document on the basis of a result of topic estimation at a given level, the higher-level constraint indicating an identifier of a topic that is likely to be assigned to the word and the probability of the word being assigned to the topic; and a higher-level-constraint-attached topic estimation unit which, when estimating the probability of each word in the input document being assigned to each topic, refers to the higher-level constraint which is created by the higher-level constraint creation unit and uses the probability of the word being assigned to a parent topic at a higher level as a weight to perform estimation processing for a lower-level topic.

2. The word latent topic estimation device according to claim 1, wherein the probability of each word in the document being assigned to each topic at a given level is estimated and when the probability of a word being assigned to a topic is smaller than a preset threshold value, the higher-level constraint creation unit corrects the probability to 0 and creates a higher-level constraint including an identifier of a topic having a probability greater than 0 and the corrected probability.

3. The word latent topic estimation device according to claim 1, further comprising an initial value update unit which, after the higher-level-constraint-attached topic estimation unit performs hierarchical topic estimation, computes the amount of computation when the current initial value of the number of topics is used without change and the amount of computation when the initial value of the number of topics is reduced and, when the difference between the amounts of computation is greater than a predetermined threshold value, reduces the initial value of the number of topics.

4. The word latent topic estimation device according to claim 1, further comprising an initial value update unit which counts the number of added documents and, when the counted number of documents is smaller than a predetermined threshold value, sets an initial value of the number of topics so that topic estimation for all topics is performed rather than the hierarchical topic estimation.

5. A word latent topic estimation method for estimating a probabilistic word latent topic sequentially each time a document is added, comprising: inputting, by a word latent topic estimation computing device, a document including one or more words; setting, by the word latent topic estimation computing device, the number of topics at each level in accordance with a hierarchical structure of topics for hierarchically estimating a latent topic of a word, the hierarchical structure constituted with a preset width and a preset depth; creating, by the word latent topic estimation computing device, a higher-level constraint for a word in the document on the basis of a result of topic estimation at a given level, the higher-level constraint indicating an identifier of a topic that is likely to be assigned to the word and the probability of the word being assigned to the topic; and when estimating the probability of each word in the input document being assigned to each topic, referring, by the word latent topic estimation computing device, to the higher-level constraint created by the creating step and using the probability of the word being assigned to a parent topic at a higher level as a weight to perform estimation processing for a lower-level topic.

6. The word latent topic estimation method according to claim 5, wherein the probability of each word in the document being assigned to each topic at a given level is estimated and when the probability of a word being assigned to a topic is smaller than a preset threshold value, the probability is corrected to 0 and a higher-level constraint including an identifier of a topic having a probability greater than 0 and the corrected probability is created.

7. The word latent topic estimation method according to claim 5, wherein after hierarchical topic estimation is performed, the amount of computation when the current initial value of the number of topics is used without change and the amount of computation when the initial value of the number of topics is reduced are computed and, when the difference between the amounts of computation is greater than a predetermined threshold value, the initial value of the number of topics is reduced.

8. The word latent topic estimation method according to claim 5, wherein the number of added documents is counted and, when the counted number of documents is smaller than a predetermined threshold value, an initial value of the number of topics is set so that topic estimation for all topics is performed rather than the hierarchical topic estimation.

9. A non-transitory computer readable medium program that causes a computer serving as an information processing device storing a program for causing a computer to execute a process: inputting a document including one or more words; setting the number of topics at each level in accordance with a hierarchical structure of topics for hierarchically estimating a latent topic of a word, the hierarchical structure constituted with a preset width and a preset depth; creating a higher-level constraint for a word in the document on the basis of a result of topic estimation at a given level, the higher-level constraint indicating an identifier of a topic that is likely to be assigned to the word and the probability of the word being assigned to the topic; and when estimating the probability of each word in the input document being assigned to each topic, referring to the higher-level constraint created by the creating process step and using the probability of the word being assigned to a parent topic at a higher level as a weight to perform estimation processing for a lower-level topic.
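
The dependent claims (2 to 4 and 6 to 8) describe three simple decision rules. They can be sketched in Python as follows. This is an illustrative reading of the claim language, not the patented implementation: all function and parameter names are hypothetical, and the claims do not say how the amounts of computation are measured or whether the retained probabilities are renormalized, so the sketch takes the costs as inputs and leaves retained probabilities unchanged.

```python
from typing import Dict


def create_higher_level_constraint(
    topic_probs: Dict[int, float], threshold: float
) -> Dict[int, float]:
    """Claims 2 and 6: probabilities below the threshold are corrected to 0,
    and only topics whose probability is still greater than 0 are kept.

    Assumption: retained probabilities are stored unchanged (no renormalization).
    """
    constraint: Dict[int, float] = {}
    for topic_id, prob in topic_probs.items():
        corrected = 0.0 if prob < threshold else prob
        if corrected > 0.0:
            constraint[topic_id] = corrected
    return constraint


def should_reduce_initial_topics(
    cost_current: float, cost_reduced: float, threshold: float
) -> bool:
    """Claims 3 and 7: reduce the initial number of topics when the difference
    between the two amounts of computation exceeds the threshold.

    Assumption: both costs are supplied by the caller.
    """
    return (cost_current - cost_reduced) > threshold


def use_hierarchical_estimation(num_added_docs: int, min_docs: int) -> bool:
    """Claims 4 and 8: while fewer than `min_docs` documents have been added,
    estimate over all topics instead of hierarchically."""
    return num_added_docs >= min_docs


if __name__ == "__main__":
    probs = {0: 0.6, 1: 0.35, 2: 0.05}
    print(create_higher_level_constraint(probs, threshold=0.1))
    print(should_reduce_initial_topics(1000.0, 400.0, threshold=500.0))
    print(use_hierarchical_estimation(5, min_docs=100))
```

Dropping low-probability topics from the constraint is what limits the number of child topics evaluated at the next level, while the document-count rule avoids relying on higher-level estimates before enough documents have been processed.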

Assignees

NEC Corp

Inventors

Classifications

  • G06F40/20 · Primary

    Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Heading extraction; Automatic titling; Numbering · CPC title

  • Physics · mapped topic

  • G06F17/27 · Primary

    Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9519633B2 cover?
A word latent topic estimation device and a word latent topic estimation method that process topics hierarchically and rapidly estimate the latent topics of a word while taking into consideration a mixed state of topics. The Abstract section gives the full text.
Who is the assignee on this patent?
NEC Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 13, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).