Supplementing structured information about entities with information from unstructured data sources

US9817888B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9817888-B2
Application number  US-201514955146-A
Country             US
Kind code           B2
Filing date         Dec 1, 2015
Priority date       May 29, 2012
Publication date    Nov 14, 2017
Grant date          Nov 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for supplementing structured information within a data system for entities based on unstructured data analyzes a document with unstructured data and extracts attribute values from the unstructured data for one or more entities of the data system. Entity records with structured information are retrieved from the data system based on the extracted attribute values. Entity references for corresponding entities of the data system are constructed based on a comparison of the retrieved entity records and the extracted attribute values. The entity references are linked to the corresponding entities within the data system, with the entity references including extracted attributes from the unstructured data for corresponding linked entities.
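The flow described in the abstract (extract attribute values, retrieve matching entity records, construct entity references) can be illustrated with a minimal sketch. This is not the patented implementation. The dictionaries, the sample records, and every function and class name below are hypothetical, and exact substring lookup is used as a simple stand-in for the comparison step.

```python
from dataclasses import dataclass

# Hypothetical dictionaries, one per attribute of an entity.
DICTIONARIES = {
    "name": {"alice smith", "bob jones"},
    "city": {"boston", "austin"},
}

# Hypothetical structured entity records standing in for the data system.
RECORDS = [
    {"id": 1, "name": "alice smith", "city": "boston"},
    {"id": 2, "name": "bob jones", "city": "austin"},
]


@dataclass
class EntityReference:
    entity_id: int     # entity the reference is linked to
    attributes: dict   # attribute values extracted from the document
    source_doc: str    # identifier of the unstructured document


def extract_attributes(text):
    """Return {attribute: [values]} for dictionary values found in the text."""
    lowered = text.lower()
    found = {}
    for attribute, values in DICTIONARIES.items():
        hits = sorted(v for v in values if v in lowered)
        if hits:
            found[attribute] = hits
    return found


def retrieve_records(extracted):
    """Retrieve records that share at least one extracted attribute value."""
    return [
        record for record in RECORDS
        if any(record.get(attr) in vals for attr, vals in extracted.items())
    ]


def build_references(doc_id, extracted, records):
    """Compare retrieved records with extracted values and build references."""
    references = []
    for record in records:
        matched = {
            attr: record[attr]
            for attr, vals in extracted.items()
            if record.get(attr) in vals
        }
        references.append(EntityReference(record["id"], matched, doc_id))
    return references


def supplement(doc_id, text):
    extracted = extract_attributes(text)
    records = retrieve_records(extracted)
    return build_references(doc_id, extracted, records)


refs = supplement("doc-1", "Alice Smith moved from Boston last year.")
```

In this sketch each reference keeps the document identifier next to the extracted attributes, so the unstructured source stays reachable from the linked entity.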

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of supplementing structured information within a data system for entities based on unstructured data comprising: analyzing one or more documents with unstructured data pertaining to the entities of the structured information and identifying from the unstructured data one or more relationships between the entities of the structured information; extracting attribute values from the unstructured data for one or more of the entities of the structured information, wherein extracting attribute values from the unstructured data includes: extracting attribute values based on a comparison of the unstructured data with one or more dictionaries each including values for a corresponding attribute of an entity within the data system; retrieving entity records with structured information from the data system based on the extracted attribute values; constructing entity references for corresponding one or more of the entities of the data system based on a comparison of the retrieved entity records and the extracted attribute values; linking the entity references to the corresponding one or more entities within the data system, wherein the entity references include extracted attributes from the unstructured data for corresponding linked entities, and wherein linking the entity references includes: inserting the entity references into one of the data system and an external data source based on a comparison of matching scores for the entity references with corresponding thresholds; defining new relationships in the data system between the entities within the structured information corresponding to the one or more relationships between the entities identified from the unstructured data; creating the new defined relationships between the entities in the data system by linking the entities of the structured information to each other within the structured information based on the one or more relationships between those entities identified within the unstructured data; and processing a query for the data system to retrieve at least one entity of the structured information including the entities linked to the at least one entity based on the defined new relationships and the corresponding unstructured data of the linked entity references.

2. The computer-implemented method of claim 1, wherein the data system includes a master data management system and the one or more documents are received from a content management system.

3. The computer-implemented method of claim 1, wherein extracting attribute values from the unstructured data further includes: extracting the attribute values from the unstructured data based on an attribute value within the unstructured data and a dictionary value including a common portion of an attribute value and being within a certain distance.

4. The computer-implemented method of claim 1, wherein an attribute of an entity includes a plurality of atomic attributes, and extracting attribute values from the unstructured data includes: extracting attribute values for each of the individual atomic attributes from the unstructured data.

5. The computer-implemented method of claim 1, wherein constructing entity references includes: constructing entity references for the corresponding one or more entities of the data system based on a fuzzy match of the retrieved entity records and the extracted attribute values.

6. The computer-implemented method of claim 1, wherein linking the entity references includes: inserting the entity references into the data system and merging the entity references with records of the corresponding one or more entities within the data system.

7. A system for supplementing structured information within a data system for entities based on unstructured data comprising: at least one processor configured to: analyze one or more documents with unstructured data pertaining to the entities of the structured information and identify from the unstructured data one or more relationships between the entities of the structured information; extract attribute values from the unstructured data for one or more of the entities of the structured information, wherein extracting attribute values from the unstructured data includes: extracting attribute values based on a comparison of the unstructured data with one or more dictionaries each including values for a corresponding attribute of an entity within the data system; retrieve entity records with structured information from the data system based on the extracted attribute values; construct entity references for corresponding one or more of the entities of the data system based on a comparison of the retrieved entity records and the extracted attribute values; link the entity references to the corresponding one or more entities within the data system, wherein the entity references include extracted attributes from the unstructured data for corresponding linked entities, and wherein linking the entity references includes: inserting the entity references into one of the data system and an external data source based on a comparison of matching scores for the entity references with corresponding thresholds; define new relationships in the data system between the entities within the structured information corresponding to the one or more relationships between the entities identified from the unstructured data; create the new defined relationships between the entities in the data system by linking the entities of the structured information to each other within the structured information based on the one or more relationships between those entities identified within the unstructured data; and process a query for the data system to retrieve at least one entity of the structured information including the entities linked to the at least one entity based on the defined new relationships and the corresponding unstructured data of the linked entity references.

8. The system of claim 7, wherein the data system includes a master data management system and the one or more documents are received from a content management system.

9. The system of claim 7, wherein extracting attribute values from the unstructured data further includes: extracting the attribute values from the unstructured data based on an attribute value within the unstructured data and a dictionary value including a common portion of an attribute value and being within a certain distance.

10. The system of claim 7, wherein an attribute of an entity includes a plurality of atomic attributes, and extracting attribute values from the unstructured data includes: extracting attribute values for each of the individual atomic attributes from the unstructured data.

11. The system of claim 7, wherein constructing entity references includes: constructing entity references for the corresponding one or more entities of the data system based on a fuzzy match of the retrieved entity records and the extracted attribute values.

12. The system of claim 7, wherein linking the entity references includes: inserting the entity references into the data system and merging the entity references with records of the corresponding one or more entities within the data system.

13. A computer program product for supplementing structured information within a data system for entities based on unstructured data comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to: analyze one or more documents with unstructured data pertaining to
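The claim elements on fuzzy matching, threshold comparison, relationship creation, and querying can be illustrated with a short sketch. It is an illustration under stated assumptions, not the claimed implementation: the threshold values, the three-way routing (including the "unmatched" outcome), the averaged similarity score, and the DataSystem class are all hypothetical, and Python's standard difflib.SequenceMatcher stands in for whatever fuzzy comparison an actual system would use.

```python
from difflib import SequenceMatcher

# Assumed thresholds; the claims only say scores are compared with thresholds.
LINK_THRESHOLD = 0.85      # at or above: insert into the data system
EXTERNAL_THRESHOLD = 0.60  # at or above: keep in an external data source


def match_score(extracted, record):
    """Average string similarity over attributes present on both sides."""
    shared = [attr for attr in extracted if attr in record]
    if not shared:
        return 0.0
    ratios = [
        SequenceMatcher(None, extracted[a].lower(), str(record[a]).lower()).ratio()
        for a in shared
    ]
    return sum(ratios) / len(ratios)


def route(score):
    """Pick a destination for an entity reference from its matching score."""
    if score >= LINK_THRESHOLD:
        return "data_system"
    if score >= EXTERNAL_THRESHOLD:
        return "external_source"
    return "unmatched"  # assumed fallback, not taken from the claims


class DataSystem:
    """Hypothetical in-memory stand-in for the structured data system."""

    def __init__(self, records):
        self.records = {r["id"]: r for r in records}
        self.references = {}     # entity id -> linked entity references
        self.external = []       # references routed to the external source
        self.relationships = []  # (source id, relation, target id)

    def link(self, entity_id, extracted, doc_id):
        score = match_score(extracted, self.records[entity_id])
        target = route(score)
        reference = {"entity_id": entity_id, "attributes": extracted,
                     "doc": doc_id, "score": score}
        if target == "data_system":
            self.references.setdefault(entity_id, []).append(reference)
        elif target == "external_source":
            self.external.append(reference)
        return target

    def relate(self, source_id, relation, target_id):
        """Record a relationship identified in the unstructured data."""
        self.relationships.append((source_id, relation, target_id))

    def query(self, entity_id):
        """Return the entity, its related entities, and its references."""
        related = [(rel, self.records[dst])
                   for src, rel, dst in self.relationships if src == entity_id]
        return {"entity": self.records[entity_id],
                "related": related,
                "references": self.references.get(entity_id, [])}


ds = DataSystem([
    {"id": 1, "name": "Alice Smith", "city": "Boston"},
    {"id": 2, "name": "Bob Jones", "city": "Austin"},
])
target = ds.link(1, {"name": "Alice Smyth", "city": "Boston"}, "doc-1")
ds.relate(1, "works_with", 2)
result = ds.query(1)
```

Here the misspelled name "Alice Smyth" still scores above the assumed link threshold, so the reference is stored with entity 1, and the query for entity 1 returns the related entity together with the linked reference.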

Assignees

IBM

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9817888B2 cover?
A method for supplementing structured information within a data system for entities based on unstructured data analyzes a document with unstructured data and extracts attribute values from the unstructured data for one or more entities of the data system. Entity records with structured information are retrieved from the data system based on the extracted attribute values. Entity references for …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30634. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 14, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).