Contextualization of entity relationships

US9740749B2 · US · B2

Patent metadata
Publication number: US-9740749-B2
Application number: US-201414462993-A
Country: US
Kind code: B2
Filing date: Aug 19, 2014
Priority date: Aug 19, 2014
Publication date: Aug 22, 2017
Grant date: Aug 22, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and arrangements for identifying related data in different data sets to assist in searching the data sets. A first data asset and a second data asset are accessed. Common entities are identified between the first and second data assets. A score is determined for the relationship between the first and second data assets, based on the identified common entities. One or more relationship contexts are determined for the relationship between the first and second data assets, and the relationship score and one or more relationship contexts are used to join at least a portion of each of the first and second data assets as a basis for subsequent searching. Other variants and embodiments are broadly contemplated herein.
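The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`tokenize`, `common_entities`), the sample rows and text, the single-word matching rule, and the use of a column name as a stand-in for a "relationship context" are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def common_entities(rows, text):
    """Map each column name to the cell values that also occur as tokens in the text.

    The column name is used here as a hypothetical stand-in for a
    "relationship context"; the patent may define contexts differently.
    """
    tokens = tokenize(text)
    hits = {}
    for row in rows:
        for column, value in row.items():
            entity = str(value).lower()
            if entity in tokens:
                hits.setdefault(column, set()).add(entity)
    return hits


# Hypothetical sample data: a small structured asset and an unstructured one.
rows = [
    {"customer": "Acme", "city": "Boston"},
    {"customer": "Globex", "city": "Austin"},
    {"customer": "Initech", "city": "Denver"},
]
text = "Acme opened a new office in Austin. Globex has not responded yet."

hits = common_entities(rows, text)

# One possible relationship score: the fraction of distinct structured
# values that were found in the unstructured text.
shared = set().union(*hits.values())
all_values = {str(v).lower() for row in rows for v in row.values()}
score = len(shared) / len(all_values)

print(hits)
print(score)
```

In this sketch only single-word values can match, because the text is split into individual tokens. A fuller version would need phrase matching or an index over the structured values.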

First claim

Opening claim text (preview).

What is claimed is:
1. A method of identifying related data in different data sets to assist in searching the data sets, said method comprising: utilizing at least one processor to execute computer code configured to perform the steps of: accessing a first structured data asset and a second unstructured data asset; identifying entities common to the first and second data assets, wherein the identifying comprises extracting structured entities from the first structured data asset, tokenizing the second unstructured data asset, and performing a search using the tokenized second unstructured data asset with respect to the extracted structured entities; determining a non-tangible score for a relationship between the first and second data assets, wherein the non-tangible score is based on the identified common entities and wherein the non-tangible score identifies a similarity between the first structured data asset and the second unstructured data asset; determining one or more relationship contexts for the relationship between the first and second data assets; and using the relationship score and one or more relationship contexts to join at least a portion of each of the first and second data assets as a basis for subsequent searching.
2. The method according to claim 1, wherein said identifying entities comprises searching among columnar values in the first and second data assets.
3. The method according to claim 1, wherein said determining a score comprises determining a bi-directional relationship score.
4. The method according to claim 1, wherein said determining a score comprises determining a uni-directional relationship score.
5. The method according to claim 1, wherein said determining one or more relationship contexts comprises determining a uni-directional relationship context score.
6. The method according to claim 1, comprising determining a score for each of the one or more relationship contexts.
7. The method according to claim 6, wherein said using the relationship score and one or more relationship contexts comprises: applying a threshold to present one or more relationships for searching across both of the first and second assets; said applying of a threshold comprising: determining a ratio of a score for at least one of the one or more relationship contexts by dividing the relationship context score by the relationship score; and comparing the ratio to the threshold.
8. The method according to claim 7, wherein: the relationship score is determined with respect to structured entities; and the relationship context score is determined with respect to extracted entities.
9. The method according to claim 1, wherein the relationship score comprises a Jaccard similarity or index score.
10. The method according to claim 1, wherein the relationship score represents a fraction of at least one of the first and second data assets accounted for by the common entities.
11. The method according to claim 1, wherein the one or more relationship contexts comprise a plurality of relationship contexts.
12. An apparatus identifying related data in different data sets to assist in searching the data sets, said apparatus comprising: at least one processor; and a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising: computer readable program code configured to access a first structured data asset and a second unstructured data asset; computer readable program code configured to identify entities common to the first and second data assets, wherein to identify comprises extracting structured entities from the first structured data asset, tokenizing the second unstructured data asset, and performing a search using the tokenized second unstructured data asset with respect to the extracted structured entities; computer readable program code configured to determine a non-tangible score for a relationship between the first and second data assets, wherein the non-tangible score is based on the identified common entities and wherein the non-tangible score identifies a similarity between the first structured data asset and the second unstructured data asset; computer readable program code configured to determine one or more relationship contexts for the relationship between the first and second data assets; and computer readable program code configured to use the relationship score and one or more relationship contexts to join at least a portion of each of the first and second data assets as a basis for subsequent searching.
13. A computer program product for determining relationships between data assets, said computer program product comprising: computer readable program code configured to access a first structured data asset and a second unstructured data asset; computer readable program code configured to identify entities common to the first and second data assets, wherein to identify comprises extracting structured entities from the first structured data asset, tokenizing the second unstructured data asset, and performing a search using the tokenized second unstructured data asset with respect to the extracted structured entities; computer readable program code configured to determine a non-tangible score for a relationship between the first and second data assets, wherein the non-tangible score is based on the identified common entities and wherein the non-tangible score identifies a similarity between the first structured data asset and the second unstructured data asset; computer readable program code configured to determine one or more relationship contexts for the relationship between the first and second data assets; and computer readable program code configured to use the relationship score and one or more relationship contexts to join at least a portion of each of the first and second data assets as a basis for subsequent searching.
14. The computer program product according to claim 13, wherein identifying entities comprises searching among columnar values in the first and second data assets.
15. The computer program product according to claim 13, wherein the determining of a score comprises determining a bi-directional relationship score.
16. The computer program product according to claim 13, wherein determining a score comprises determining a uni-directional relationship score.
17. The computer program product according to claim 13, wherein determining one or more relationship contexts comprises determining a uni-directional relationship context score.
18. The computer program product according to claim 13, comprising computer readable program code configured to determine a score for each of the one or more relationship contexts.
19. The computer program product according to claim 18, wherein using the relationship score and one or more relationship contexts comprises: applying a threshold to present one or more relationships for searching across both of the first and second assets, via: determining a ratio of a score for at least one of the one or more relationship contexts by dividing the relationship context score by the relationship score; and comparing the ratio to the threshold.
20. A method comprising: accessing a first data asset and a second data asset; identifying a relationship between the first and second data assets, via identifying entities common to the first and second data assets; said identifying of common entities comprisin
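The scoring variants named in the claims (a Jaccard score in claim 9, a fraction of one asset accounted for by common entities in claim 10, and the ratio test in claims 7 and 19) can be sketched in Python as follows. This is a hedged illustration only: the function names, the sample entity sets, and the choice of "greater than or equal to" for the threshold comparison are assumptions, since the claims say only that the ratio is compared to the threshold.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union.

    The value is the same in both directions, so it can serve as one
    example of a bi-directional relationship score.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(a, b):
    """Fraction of the entities in `a` that also appear in `b`.

    This depends on which asset is `a`, so it is one example of a
    uni-directional relationship score.
    """
    return len(a & b) / len(a) if a else 0.0


def passes_threshold(context_score, relationship_score, threshold):
    """Divide the context score by the relationship score and compare to a threshold.

    Assumption: a relationship is presented when the ratio is at least
    the threshold; the claims do not state the direction of the comparison.
    """
    if relationship_score == 0:
        return False
    return context_score / relationship_score >= threshold


# Hypothetical entity sets drawn from two data assets.
structured = {"acme", "globex", "initech"}
unstructured = {"acme", "globex", "austin"}

print(jaccard(structured, unstructured))         # 2 shared / 4 total = 0.5
print(coverage(structured, unstructured))        # 2 of 3 structured entities
print(passes_threshold(0.25, 0.5, threshold=0.4))  # ratio 0.5 vs. 0.4
```

The zero guards return a score of 0.0 (or `False` for the threshold test) when an asset has no entities, which avoids division by zero in the sketch. How the patented method handles empty assets is not stated on this page.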

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9740749B2 cover?
Methods and arrangements for identifying related data in different data sets to assist in searching the data sets. A first data asset and a second data asset are accessed. Common entities are identified between the first and second data assets. A score is determined for the relationship between the first and second data assets, based on the identified common entities. One or more relationship c…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/3053. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 22, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).