Data relevance calculation program, device, and method

US2016196292A1 · US · A1

Patent metadata
Publication number: US-2016196292-A1
Application number: US-201514971312-A
Country: US
Kind code: A1
Filing date: Dec 16, 2015
Priority date: Jan 5, 2015
Publication date: Jul 7, 2016
Grant date: (none listed)

Abstract

Official abstract text for this publication.

A data relevance calculation program for: extracting topics from a group of individual data items and a group of target data items, based on words included in the individual data items and the target data items, where each item includes an index part and a content part and at least a part of the target data items is related to any of the individual data items; setting an attribute of each topic based on the degree to which the topic is characterized by words included in the index part or by words included in the content part; and calculating relevance between any of the individual data items and each of the target data items based on the strength of the relationship between a topic included in an individual data item and a topic included in a target data item related to that individual data item, and on the attribute of each topic.
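The attribute-setting step can be illustrated with a minimal sketch. It assumes each topic is available as a word-to-probability mapping and takes the "degree" for each part to be the sum of the probabilities of the topic's words found in that part, a reading the claims also recite. The function name `topic_attribute`, the vocabularies, the toy topics, and the tie-handling label are all hypothetical and not taken from the patent.

```python
from typing import Dict, Set


def topic_attribute(
    topic_words: Dict[str, float],
    index_vocab: Set[str],
    content_vocab: Set[str],
) -> str:
    """Label a topic 'index' or 'content' by summed word probability."""
    # Degree to which the topic is characterized by index-part words.
    index_degree = sum(p for w, p in topic_words.items() if w in index_vocab)
    # Degree to which the topic is characterized by content-part words.
    content_degree = sum(p for w, p in topic_words.items() if w in content_vocab)
    if index_degree > content_degree:
        return "index"
    if content_degree > index_degree:
        return "content"
    # Assumption: the source does not say what happens on a tie.
    return "undetermined"


# Toy data, invented for illustration only.
index_vocab = {"symptom", "cause", "resolution"}
content_vocab = {"disk", "timeout", "reboot"}

topics = {
    "t0": {"symptom": 0.5, "resolution": 0.3, "disk": 0.2},
    "t1": {"disk": 0.6, "timeout": 0.3, "symptom": 0.1},
}

attributes = {
    name: topic_attribute(words, index_vocab, content_vocab)
    for name, words in topics.items()
}
print(attributes)
```

In this toy data, `t0` is dominated by index-part words and `t1` by content-part words, so the two topics receive different attributes.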

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory and computer-readable storage medium that stores a data relevance calculation program for causing a computer to execute processing comprising: extracting a plurality of topics from a group of individual data items, each of which includes an index part and a content part, and a group of target data items, each of which includes an index part and a content part, and at least a part of which is related to any of the individual data items, based on words that are included in the group of the individual data items and the group of the target data items; setting an attribute of each of the topics based on at least one of a degree at which each of the extracted topics is characterized by words that are included in the index part and a degree at which each of the extracted topics is characterized by words that are included in the content part; and calculating relevance between any of the individual data items that are included in the group of the individual data items and each of the target data items that are included in the group of the target data items based on the strength of a relationship between a topic that is included in an individual data item and a topic that is included in a target data item related to the individual data item and on the attribute of each of the topics.

2. The storage medium that stores a data relevance calculation program according to claim 1, wherein in a case where the attribute of the topic that is included in the individual data item differs from the attribute of the topic that is included in the target data item related to the individual data item in the calculating of the relevance, the strength of the relationship between the topics is set to be lower than the strength of the relationship between the topics in a case where the attributes of both the topics are the same.

3. The storage medium that stores a data relevance calculation program according to claim 1, wherein as the attribute of each of the topics, an attribute indicating that the topic is characterized by the words included in the index part is set if the number of the words that are included in the index part is larger than the number of the words that are included in the content part among the plurality of words that characterize each topic, and an attribute indicating that the topic is characterized by the words included in the content part is set if the number of the words that are included in the content part is larger than the number of the words that are included in the index part.

4. The storage medium that stores a data relevance calculation program according to claim 1, wherein the sum of probabilities at which the respective words that are included in the index part, from among a plurality of words that are extracted as words characterizing each topic, occur in the topic is a degree at which the topic is characterized by the words that are included in the index part, and the sum of probabilities at which the respective words that are included in the content part occur in the topic is a degree at which the topic is characterized by the words that are included in the content part.

5. The storage medium that stores a data relevance calculation program according to claim 1, wherein each of the individual data items and the target data items is a document data item that is described in a natural language, wherein the index part is a part in which words or word sequences in accordance with a type of content represented by the respective parts of the document data are described, and wherein the content part is a part other than the index part in the document data.

6. A data relevance calculation device comprising: an extraction unit configured to extract a plurality of topics from a group of individual data items, each of which includes an index part and a content part, and a group of target data items, each of which includes an index part and a content part, and at least a part of which is related to any of the individual data items, based on words that are included in the group of the individual data items and the group of the target data items; a setting unit configured to set an attribute of each of the topics based on at least one of a degree at which each of the topics that are extracted by the extraction unit is characterized by words that are included in the index part and a degree at which each of the topics that are extracted by the extraction unit is characterized by words that are included in the content part; and a calculation unit configured to calculate relevance between any of the individual data items that are included in the group of the individual data items and each of the target data items that are included in the group of the target data items based on the strength of a relationship between a topic that is included in an individual data item and a topic that is included in a target data item related to the individual data item and on the attribute of each of the topics set by the setting unit.

7. The data relevance calculation device according to claim 6, wherein in a case where the attribute of the topic that is included in the individual data item differs from the attribute of the topic that is included in the target data item related to the individual data item, the calculation unit sets the strength of the relationship between the topics to be lower than the strength of the relationship between the topics in a case where the attributes of both the topics are the same.

8. The data relevance calculation device according to claim 6, wherein the setting unit sets an attribute indicating that the topic is characterized by the words included in the index part if the number of the words that are included in the index part is larger than the number of the words that are included in the content part among the plurality of words that characterize each topic, and sets an attribute indicating that the topic is characterized by the words included in the content part if the number of the words that are included in the content part is larger than the number of the words that are included in the index part.

9. The data relevance calculation device according to claim 6, wherein the setting unit regards a sum of probabilities at which the respective words that are included in the index part, from among a plurality of words that are extracted as words characterizing each topic, occur in the topic as a degree at which the topic is characterized by the words that are included in the index part, and regards a sum of probabilities at which the respective words that are included in the content part occur in the topic as a degree at which the topic is characterized by the words that are included in the content part.

10. The data relevance calculation device according to claim 6, wherein each of the individual data items and the target data items is a document data item that is described in a natural language, wherein the index part is a part in which words or word sequences in accordance with a type of content represented by the respective parts of the document data are described, and wherein the content part is a part other than the index part in the document data.

11. A data relevance calculation method of causing a computer to execute processing comprising: extracting a plurality of topics from a group of individual data items, each of which includes an index part and a content part, and a group of target data items, each of which includes an index part and a content part, and at least a part of which is related to any of the individual data items, based on words that are included in the group of the indivi
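
As an illustration of the relevance calculation, and of the rule in claims 2 and 7 that a topic pair with differing attributes gets a lower relationship strength than a pair with the same attribute, here is a minimal sketch. The claims do not give a formula, so the weighted sum over topic pairs, the function name `relevance`, the `mismatch_factor` parameter and its default of 0.5, and all toy values are assumptions made for this example.

```python
from typing import Dict, Tuple


def relevance(
    individual_topics: Dict[str, float],
    target_topics: Dict[str, float],
    strength: Dict[Tuple[str, str], float],
    attribute: Dict[str, str],
    mismatch_factor: float = 0.5,
) -> float:
    """Score one individual item against one target item.

    Assumed form: sum over topic pairs of
    (topic weight in individual) * (topic weight in target) * strength,
    with strength scaled down when the two topics' attributes differ.
    """
    score = 0.0
    for ti, wi in individual_topics.items():
        for tj, wj in target_topics.items():
            s = strength.get((ti, tj), 0.0)
            if attribute[ti] != attribute[tj]:
                # Differing attributes: lower the relationship strength.
                s *= mismatch_factor
            score += wi * wj * s
    return score


# Toy inputs, invented for illustration only.
attribute = {"t0": "index", "t1": "content"}
strength = {
    ("t0", "t0"): 1.0,
    ("t0", "t1"): 1.0,
    ("t1", "t0"): 1.0,
    ("t1", "t1"): 1.0,
}
individual = {"t0": 1.0}
target_a = {"t0": 1.0}  # same attribute as the individual item's topic
target_b = {"t1": 1.0}  # different attribute

score_a = relevance(individual, target_a, strength, attribute)
score_b = relevance(individual, target_b, strength, attribute)
print(score_a, score_b)
```

With equal raw strengths, the target whose topic shares the attribute of the individual item's topic scores higher, which is the ordering the dependent claims describe.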

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016196292A1 cover?
A data relevance calculation program for: extracting topics from a group of individual data items and a group of target data items, each item including an index part and a content part, and at least a part of the target data items is related to any of the individual data items, based on words included in the individual data items and the target data items; setting an attribute of each topic based…
Who is the assignee on this patent?
Fujitsu Ltd
What technology area does this patent fall under?
Primary CPC classification G06F17/30321. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 7, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).