Machine learning-based relationship association and related discovery and search engines

US2018082183A1 · US · A1

Patent metadata
Publication number: US-2018082183-A1
Application number: US-201715609800-A
Country: US
Kind code: A1
Filing date: May 31, 2017
Priority date: Feb 22, 2011
Publication date: Mar 22, 2018
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and techniques for determining relationships and association significance between entities are disclosed. The systems and techniques automatically identify supply chain relationships between companies based on unstructured text corpora. The system combines Machine Learning models to identify sentences mentioning supply chain between two companies (evidence), and an aggregation layer to take into account the evidence found and assign a confidence score to the relationship between companies.
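The flow the abstract describes (find sentences that mention two companies, score each as evidence, then aggregate per company pair) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the company list, the cue phrases, the toy scoring function standing in for trained Machine Learning models, and the noisy-OR aggregation rule. The abstract does not state which models or aggregation formula are used.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical inputs (not from the patent): a tiny company lexicon and cue phrases.
COMPANIES = ["Acme Corp", "Globex", "Initech"]
CUES = re.compile(r"\b(supplies|supplier|supply|customer|purchases from)\b", re.I)

def split_sentences(text):
    # Naive splitter standing in for a real sentence parser.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def entity_pairs(sentence):
    # Plain substring lookup standing in for named entity recognition.
    found = sorted(c for c in COMPANIES if c in sentence)
    return list(combinations(found, 2))

def toy_probability(sentence):
    # Stand-in for a trained classifier: more cue hits give a higher score.
    hits = len(CUES.findall(sentence))
    return min(0.9, 0.3 * hits)

def aggregate(probabilities):
    # Noisy-OR, 1 - prod(1 - p): an assumed aggregation rule.
    remaining = 1.0
    for p in probabilities:
        remaining *= 1.0 - p
    return 1.0 - remaining

def score_pairs(documents):
    # Collect per-sentence evidence for each company pair, then aggregate.
    evidence = defaultdict(list)
    for doc in documents:
        for sentence in split_sentences(doc):
            if not CUES.search(sentence):
                continue  # pattern filter: keep only candidate sentences
            for pair in entity_pairs(sentence):
                evidence[pair].append(toy_probability(sentence))
    return {pair: aggregate(ps) for pair, ps in evidence.items()}

docs = [
    "Acme Corp supplies parts to Globex. Initech is unrelated.",
    "Globex is a customer of Acme Corp.",
]
scores = score_pairs(docs)
```

With noisy-OR, each additional evidence sentence raises the pair's score without letting it exceed 1, which is one plausible way to reward repeated mentions across documents.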

First claim

Opening claim text (preview).

What is claimed is:

1. A system for providing remote users over a communication network supply-chain relationship data via a centralized Knowledge Graph user interface, the system comprising: a Knowledge Graph data store comprising a plurality of Knowledge Graphs, each Knowledge Graph related to an associated entity, and including a first Knowledge Graph associated with a first company and comprising supplier-customer data; an input adapted to receive electronic documents from a plurality of data sources via a communications network, the received electronic documents including unstructured text; a pre-processing interface adapted to perform one or more of named entity recognition, relation extraction, and entity linking on the received electronic documents and generate a set of tagged data, and further adapted to parse the electronic documents into sentences and identify a set of sentences with each identified sentence having at least two identified companies as an entity-pair; a pattern matching module adapted to perform a pattern-matching set of rules to extract sentences from the set of sentences as supply chain evidence candidate sentences; a classifier adapted to utilize natural language processing on the supply chain evidence candidate sentences and calculate a probability of a supply-chain relationship between an entity-pair associated with the supply chain evidence candidate sentences; and an aggregator adapted to aggregate at least some of the supply chain evidence candidates based on the calculated probability to arrive at an aggregate evidence score for a given entity-pair, wherein a Knowledge Graph associated with at least one company from the entity-pair is generated or updated based at least in part on the aggregate evidence score.

2. The system of claim 1 further comprising a user interface adapted to receive an input signal from a remote user-operated device, the input signal representing a user query, wherein an output is generated for delivery to the remote user-operated device and related to a Knowledge Graph associated with a company in response to the user query.

3. The system of claim 1 further comprising a training module adapted to derive at least in part one or both of the pattern matching module and classifier module based on evaluation of a set of training documents.

4. The system of claim 1 further comprising a graph-based data model for describing entities and relationships as a set of triples comprising a subject, predicate and object and stored in a triple store.

5. The system of claim 4 wherein the graph-based data model is a Resource Description Framework (RDF) model.

6. The system of claim 4 wherein the triples are queried using SPARQL query language.

7. The system of claim 4 further comprising a fourth element added to the set of triples to result in a quad.

8. The system of claim 1 further comprising a machine learning-based algorithm adapted to detect relationships between entities in an unstructured text document.

9. The system of claim 1 wherein the classifier predicts a probability of a relationship based on an extracted set of features from a sentence.

10. The system of claim 9 wherein the extracted set of features includes context-based features comprising one or more of n-grams and patterns.

11. The system of claim 1, wherein updating the Knowledge Graph is based on the aggregate evidence score satisfying a threshold value.

12. The system of claim 1 wherein the pre-processing interface is further adapted to compute significance between entities by: identifying a first entity and a second entity from a plurality of entities, the first entity having a first association with the second entity, and the second entity having a second association with the first entity; weighting a plurality of criteria values assigned to the first association, the plurality of criteria values based on a plurality of association criteria selected from the group consisting essentially of interestingness, recent interestingness, validation, shared neighbor, temporal significance, context consistency, recent activity, current clusters, and surprise element; and computing a significance score for the first entity with respect to the second entity based on a sum of the plurality of weighted criteria values for the first association, the significance score indicating a level of significance of the second entity to the first entity.

13. A method for providing remote users over a communication network supply-chain relationship data via a centralized Knowledge Graph user interface, the method comprising: storing at a Knowledge Graph data store a plurality of Knowledge Graphs, each Knowledge Graph related to an associated entity, and including a first Knowledge Graph associated with a first company and comprising supplier-customer data; receiving, by an input, electronic documents from a plurality of data sources via a communications network, the received electronic documents including unstructured text; performing, by a pre-processing interface, one or more of named entity recognition, relation extraction, and entity linking on the received electronic documents and generate a set of tagged data, parsing the electronic documents into sentences, and identifying a set of sentences with each identified sentence having at least two identified companies as an entity-pair; performing, by a pattern matching module, a pattern-matching set of rules to extract sentences from the set of sentences as supply chain evidence candidate sentences; utilizing, by a classifier, natural language processing on the supply chain evidence candidate sentences and calculating a probability of a supply-chain relationship between an entity-pair associated with the supply chain evidence candidate sentences; and aggregating, by an aggregator, at least some of the supply chain evidence candidates based on the calculated probability to arrive at an aggregate evidence score for a given entity-pair, wherein a Knowledge Graph associated with at least one company from the entity-pair is generated or updated based at least in part on the aggregate evidence score.

14. The method of claim 13 further comprising: receiving, by a user interface, an input signal from a remote user-operated device, the input signal representing a user query, wherein an output is generated for delivery to the remote user-operated device and related to a Knowledge Graph associated with a company in response to the user query.

15. The method of claim 13 further comprising describing, by a graph-based data model, entities and relationships as a set of triples comprising a subject, predicate and object and stored in a triple store.

16. The method of claim 13 further comprising detecting, by a machine learning-based algorithm, relationships between entities in an unstructured text document.

17. The method of claim 13 wherein the predicting, by the classifier, a probability of a relationship is based on an extracted set of features from a sentence.

18. The method of claim 13, wherein updating the Knowledge Graph is based on the aggregate evidence score satisfying a threshold value.

19. The method of claim 13 further comprising: identifying, by the pre-processing interface, a first entity and a second entity from a plurality of entities, the first entity having a first association with the second entity, and the second entity having a second association with the first entity; weighting, by the pre-processing interf
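Two of the mechanisms recited in the claims lend themselves to a short illustration: the threshold-gated Knowledge Graph update stored as triples or quads (claims 4, 7, 11 and 18), and the significance score computed as a sum of weighted criteria values (claim 12). The sketch below is only one possible reading. The predicate name suppliesTo, the use of the fourth quad element as a graph label, the default threshold of 0.5, the reading of "satisfying a threshold" as greater-than-or-equal, and the supplier-to-customer direction are all assumptions; the claims specify none of them.

```python
from typing import Dict, List, Tuple

# subject, predicate, object, plus an assumed fourth element (a graph label)
Quad = Tuple[str, str, str, str]

def update_graph(store: List[Quad], supplier: str, customer: str,
                 aggregate_score: float, threshold: float = 0.5,
                 graph: str = "supply-chain") -> bool:
    """Append a quad only when the aggregate evidence score meets the threshold."""
    if aggregate_score < threshold:
        return False
    quad = (supplier, "suppliesTo", customer, graph)  # hypothetical predicate name
    if quad not in store:
        store.append(quad)
    return True

def significance(criteria: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weighted criteria values for one directed association.

    Criteria without a weight contribute nothing (an assumed default).
    """
    return sum(weights.get(name, 0.0) * value for name, value in criteria.items())

store: List[Quad] = []
accepted = update_graph(store, "Acme Corp", "Globex", aggregate_score=0.51)
rejected = update_graph(store, "Initech", "Globex", aggregate_score=0.2)

score = significance(
    {"interestingness": 0.8, "recent activity": 0.5},
    {"interestingness": 2.0, "recent activity": 1.0},
)
```

Because the score is computed for the first entity's association with the second, calling significance with the criteria for the reverse association may give a different value, which matches the directed wording of claim 12.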

Assignees

Thomson Reuters Global Resources

Classifications

  • characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling · CPC title

  • Physics · mapped topic

  • G06N5/00 (Primary)

    Computing arrangements using knowledge-based models · CPC title

  • Physics · mapped topic

  • Knowledge representation; Symbolic representation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018082183A1 cover?
Systems and techniques for determining relationships and association significance between entities are disclosed. The systems and techniques automatically identify supply chain relationships between companies based on unstructured text corpora. The system combines Machine Learning models to identify sentences mentioning supply chain between two companies (evidence), and an aggregation layer to …
Who is the assignee on this patent?
Thomson Reuters Global Resources
What technology area does this patent fall under?
Primary CPC classification G06N5/00. Mapped technology areas include Physics.
When was this patent published?
Published Mar 22, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).