Generating chains of entity mentions

US11080615B2 · US · B2

Patent metadata
Publication number: US-11080615-B2
Application number: US-201715623525-A
Country: US
Kind code: B2
Filing date: Jun 15, 2017
Priority date: Jun 15, 2017
Publication date: Aug 3, 2021
Grant date: Aug 3, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects of the present invention disclose a method for analyzing data from a plurality of data sources. The method includes extracting features of data received from a first source and from a second source by analyzing the data received from the first source of data and from the second source. The method includes processors determining a topic modeling framework, wherein the topic modeling framework detects a semantic structure of the features of the data received from the first data source and the second source. The method includes processors applying the topic modeling framework to the data received from the first source of data the second source of data. The method includes generating a final entity output, wherein the final entity output includes a cluster of entity mentions that the applied topic modeling framework extracts from the first source of data and the second source of data are combined.
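
The flow the abstract describes (collect entity mentions from two sources, group the mentions that refer to the same entity, and emit the combined clusters) can be illustrated with a minimal sketch. This is not the patented method: grouping by a normalized surface form stands in for the topic modeling framework, and the names `Mention`, `normalize`, `build_chains`, and `final_entity_output`, along with the sample data, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    source: str  # which data source the mention came from
    text: str    # surface form as it appeared in that source


def normalize(text: str) -> str:
    """Assumed canonical key: lowercase, punctuation to spaces, spaces collapsed."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def build_chains(mentions):
    """Group mentions whose normalized form matches into one chain per entity."""
    chains = defaultdict(list)
    for mention in mentions:
        chains[normalize(mention.text)].append(mention)
    return dict(chains)


def final_entity_output(chains):
    """Return (entity key, mentions) pairs, largest cluster first."""
    return sorted(chains.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical input: one unstructured source and one structured source.
mentions = [
    Mention("notes", "Acme Corp."),
    Mention("crm_table", "ACME CORP"),
    Mention("notes", "Globex"),
]
chains = build_chains(mentions)
output = final_entity_output(chains)
```

Here the two "Acme" mentions from different sources end up in a single chain, while "Globex" forms its own. A real system would replace `normalize` with whatever semantic matching the topic model provides.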

First claim

Opening claim text (preview).

What is claimed is:

1. A method for analyzing data from a plurality of data sources, the method comprising: extracting, by one or more processors, features of data received from a first data source and features of the data received from a second data source by analyzing the data received from the first data source and the data received from the second data source; determining, by one or more processors, a topic modeling framework, wherein the topic modeling framework detects a semantic structure of the features of the data received from the first data source and the data received from the second data source; applying, by one or more processors, the topic modeling framework to the data received from the first data source and to the data received from the second data source; constructing, by one or more processors, a plurality of identical entity chains from the first data source and the second data source, wherein constructing the plurality of identical entity chains includes analyzing a number of mentions of a relevant topic from data provided by the first data source and from the data provided by the second data source, and wherein analyzing the number of mentions of the relevant topic includes clustering identical entity mentions by finding the number of mentions referring to the plurality of identical entity chains; and generating, by one or more processors, a final entity output, wherein the final entity output includes a cluster of identical entity mentions from the plurality of identical entity chains.

2. The method of claim 1: wherein the data received from the first data source includes unstructured data, and wherein the data received from the data source includes structured data that is formatted and contained in a relational database.

3. The method of claim 1, wherein determining the topic modeling framework further comprises: querying, by one of more processors, a knowledge resource to identify concepts that are associated with the features extracted from the data received from the first data source and the features extracted from the data received from the second data source; and determining, by one or more processors, the topic modeling framework based on the features extracted from the data received from the first data source and the features extracted from the data received from the second data source.

4. The method of claim 1 wherein generating a final entity output, further comprises: integrating, by one or more processors, a data ranking model, wherein the data ranking is based on a measure of the similarity of the data to the identified topic model, to identify data that refers to the same entity; generating, by one or more processors, an identical entity from the data received from the first data source and of the data received from the second data source, wherein an identical entity is constructed from a mention that refers to similar entities; and generating, by one more processors, a chain of individual entities from the data received from the first data source and of the data received from the second data source by extracting the generated identical entities.

5. The method of claim 1 wherein extracting features of data received from data first source and features of the data received from the second data source, further comprises: determining, by one or more processors, that text included in the data received from the first data source does not include concepts that relate to a knowledge resource based on an analysis of the text.

6. The method of claim 1 wherein applying the topic modeling framework further comprises: activating, by one or more processors, a tokenization process, wherein a tokenization process subdivides a plurality of text during application of the topic modeling framework.

7. The method of claim 1, further comprising: analyzing, by one or more processors, unstructured data and structured data utilizing semi-supervised learning and unsupervised learning.

8. A computer program product for analyzing data from a plurality of data sources, the computer program product comprising: one or more computer readable tangible storage media and program instructions stored on at least one of the one or more computer readable storage media, the program instructions readable/executable by one or more computer processors and further comprising: program instructions to extract features of data received from a first data source and features of the data received from a second data source by analyzing the data received from the first data source and the data received from the second data source; program instructions to determine a topic modeling framework, wherein the topic modeling framework detects a semantic structure of the features of the data received from the first data source and the data received from the second data source; program instructions to apply the topic modeling framework to the data received from the first data source and to the data received from the second data source; program instructions to construct a plurality of identical entity chains from the first data source and the second data source, wherein constructing the plurality of identical entity chains includes analyzing a number of mentions of a relevant topic from data provided by the first data source and from the data provided by the second data source, and wherein analyzing the number of mentions of the relevant topic includes clustering identical entity mentions by finding the number of mentions referring to the plurality of identical entity chains; and program instructions to generate a final entity output, wherein the final entity output includes a cluster of identical entity mentions from the plurality of identical entity chains.

9. The computer program product of claim 8: wherein the data received from the first data source includes unstructured data, and wherein the data received from the second data source includes structured data that is formatted and contained in a relational database.

10. The computer program product of claim 8, wherein the program instructions to determine the topic modeling framework further comprise program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: query a knowledge resource to identify concepts that are associated with the features extracted from the data received from the first data source and the features extracted from the data received from the second data source; and determine the topic modeling framework based on the features extracted from the data received from the first data source and the features extracted from the data received from the second data source.

11. The computer program product of claim 8, wherein the program instructions to generate a final entity output, further comprise program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: integrate a data ranking model, wherein the data ranking is based on a measure of the similarity of the data to the identified topic model, to identify data that refers to the same entity; generate an identical entity from the data received from the first data source and of the data received from the second data source, wherein an identical entity is constructed from a mention that refers to similar entities; and generate a chain of individual entities from the data received from the first data source and of the data received from the second data source by extracting the generated identical entities.

12. The computer program product of claim 8 wherein the program instructions to extr
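
Claims 4 and 11 mention a ranking model based on how similar the data is to the identified topic model, and claim 6 mentions a tokenization process. The claims do not say which similarity measure or tokenizer is used, so the following is only a sketch under assumptions: the topic is represented as a hypothetical term-weight dictionary, tokenization is a simple regular-expression split, and similarity is cosine similarity over word counts. The function names and sample data are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_topic(documents, topic):
    """Score each document against the topic and sort by descending score."""
    scored = [(cosine(Counter(tokenize(doc)), topic), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical topic and documents, for illustration only.
topic = {"merger": 0.6, "acquisition": 0.3, "shares": 0.1}
documents = [
    "Acme announced a merger and acquisition of shares.",
    "The weather was mild on Tuesday.",
    "Shares rose after the merger.",
]
ranked = rank_by_topic(documents, topic)
```

Documents that share more weighted vocabulary with the topic rank higher, and a document with no overlapping terms scores zero. Any other similarity measure could be swapped into `cosine` without changing the ranking step.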

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • Inference or reasoning models · CPC title

  • into predefined classes · CPC title

  • Knowledge engineering; Knowledge acquisition · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11080615B2 cover?
Aspects of the present invention disclose a method for analyzing data from a plurality of data sources. The method includes extracting features of data received from a first source and from a second source by analyzing the data received from the first source of data and from the second source. The method includes processors determining a topic modeling framework, wherein the topic modeling fram…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 3, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).