Relation extraction using co-training with distant supervision

US10229195B2 · US · B2

Patent metadata
Publication number: US-10229195-B2
Application number: US-201715629891-A
Country: US
Kind code: B2
Filing date: Jun 22, 2017
Priority date: Jun 22, 2017
Publication date: Mar 12, 2019
Grant date: Mar 12, 2019

Abstract

Generating, updating, and using a knowledge graph. Concepts in a knowledge graph can have relations to one another. These relations may be expressed as confidence values. A training data set may be split into two portions, with the first portion used to update confidence values for existing relations between concept pairs, using the knowledge graph. These confidence values can be used, together with the second portion, to update confidence values for known phrases that express known relations. These confidence values, in turn, can be used, together with the first portion, to increase the accuracy of the original confidence scores with respect to existing relations. The process may be iteratively employed, with each iteration increasing the accuracy of confidence scores.
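
The alternating update described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the toy sentences, the seed confidences, the midpoint split, and the averaging and damping rules (`co_train`, `prior`, `damping`, `tol`) are all hypothetical choices made to keep the loop easy to follow.

```python
from collections import defaultdict

# Hypothetical toy corpus: each "sentence" is already reduced to
# (concept_a, connecting_phrase, concept_b).
SENTENCES = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("aspirin", "is cheaper than", "ibuprofen"),
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("aspirin", "is cheaper than", "ibuprofen"),
]

# Hypothetical seed confidences that a concept pair exhibits the existing
# relation, as they might be read from a reference knowledge graph.
SEED = {("aspirin", "headache"): 0.9, ("ibuprofen", "fever"): 0.8}


def co_train(sentences, seed, prior=0.1, damping=0.5, tol=1e-6, max_iter=100):
    """Alternate between the two portions until confidences stop changing."""
    mid = len(sentences) // 2
    first, second = sentences[:mid], sentences[mid:]
    # Pairs missing from the seed start at the (assumed) prior confidence.
    pair_conf = defaultdict(lambda: prior, seed)
    phrase_conf = {}

    for _ in range(max_iter):
        # Second portion: score each connecting phrase by the mean
        # confidence of the concept pairs it connects.
        by_phrase = defaultdict(list)
        for a, phrase, b in second:
            by_phrase[phrase].append(pair_conf[(a, b)])
        phrase_conf = {p: sum(v) / len(v) for p, v in by_phrase.items()}

        # First portion: re-score each pair from the phrase confidences,
        # blended with its previous value (assumed damping rule).
        by_pair = defaultdict(list)
        for a, phrase, b in first:
            by_pair[(a, b)].append(phrase_conf.get(phrase, prior))
        change = 0.0
        for pair, values in by_pair.items():
            evidence = sum(values) / len(values)
            updated = damping * pair_conf[pair] + (1 - damping) * evidence
            change = max(change, abs(updated - pair_conf[pair]))
            pair_conf[pair] = updated

        # Stop once no pair confidence moved by more than the tolerance.
        if change < tol:
            break
    return dict(pair_conf), phrase_conf


pairs, phrases = co_train(SENTENCES, SEED)
print(round(phrases["treats"], 3), round(phrases["is cheaper than"], 3))
```

In this toy run the phrase "treats" ends with a much higher confidence than "is cheaper than", because it only ever connects pairs that the seed graph already rates highly. A real system would replace the averaging rule with trained classifiers over lexical and syntactic features.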

Claims

What is claimed is:

1. A computer implemented method for updating an electronically stored knowledge graph of a knowledge domain, comprising: receiving a natural language text comprising sentences; splitting the sentences into at least a first portion and a second portion; identifying concepts in the sentences of the first portion; determining, for a first sentence corresponding to at least one sentence in the first portion having a concept pair, a likelihood that the first sentence exhibits an existing relation between concepts of the concept pair, wherein the determined likelihood represents a first confidence value; and determining, for a second sentence corresponding to at least one sentence in the second portion having the concept pair, a likelihood that a word or phrase connecting concepts of the concept pair exhibits the existing relation, wherein the determined likelihood represents a second confidence value and is based, in part, on the first confidence value; determining, for a third sentence corresponding to at least one sentence in the first portion having the concept pair, a likelihood that the third sentence exhibits the existing relation, wherein the determined likelihood represents a third confidence value, and is based, in part, on the second confidence value; iteratively determining successive likelihoods according to the first, second, and third confidence values, by alternating between sentences of the first portion and sentences of the second portion until the successive likelihoods reach corresponding threshold confidence values; updating the reference knowledge graph to include relations between concepts whose corresponding confidence scores exceed a threshold value; and using the updated reference knowledge graph in an analysis of additional natural language text.

2. The method of claim 1, further comprising labeling the sentences, the labeling comprising: identifying concepts in the sentences text; annotating the sentences with the concepts; and extracting from the natural language text, lexical and syntactic features.

3. The method of claim 1, wherein determining, for at least one sentence in the first portion having a concept pair, a likelihood that the sentence is an instance of the concepts in the concept pair exhibiting an existing relation, comprises: identifying as the existing relation, an existing relation between respective categories of the concepts in the concept pair as defined by a reference knowledge graph.

4. The method of claim 1, wherein determining, for at least one sentence in the first portion having a concept pair, a likelihood that the sentence is an instance of the concepts in the concept pair exhibiting an existing relation, is based on identifying one or more concept pairs in the at least one sentence, the identifying comprising: comparing words of the at least one sentence in the first portion to elements of a knowledge graph; identifying matching words as concepts; and pairing at least two of the identifying concepts with one another to form a concept pair.

5. The method of claim 1, wherein determining, for at least one sentence in the first portion having a concept pair, a likelihood that the sentence is an instance of the concepts in the concept pair exhibiting an existing relation, comprises: identifying as the existing relation, an existing relation between the concepts in the concept pair as defined by a reference knowledge graph.

6. The method of claim 1, further comprising: updating a reference knowledge graph to include relations between concepts whose corresponding confidence scores exceed a threshold value.

7. A computer program product, comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising: receiving, by the processor, a natural language text comprising sentences; splitting, by the processor, the sentences into at least a first portion and a second portion; identifying, by the processor, concepts in the sentences of the first portion; determining, by the processor, for a first sentence corresponding to at least one sentence in the first portion having a concept pair, a likelihood that the first sentence exhibits an existing relation between concepts of the concept pair, wherein the determined likelihood represents a first confidence value; and determining, by the processor, for a second sentence corresponding to at least one sentence in the second portion having the concept pair, a likelihood that a word or phrase connecting concepts of the concept pair exhibits the existing relation, wherein the determined likelihood represents a second confidence value and is based, in part, on the first confidence value; determining, for a third sentence corresponding to at least one sentence in the first portion having the concept pair, a likelihood that the third sentence exhibits the existing relation, wherein the determined likelihood represents a third confidence value, and is based, in part, on the second confidence value; iteratively determining successive likelihoods according to the first, second, and third confidence values, by alternating between sentences of the first portion and sentences of the second portion until the successive likelihoods reach corresponding threshold confidence values; updating the reference knowledge graph to include relations between concepts whose corresponding confidence scores exceed a threshold value; and using the updated reference knowledge graph in an analysis of additional natural language text.

8. The computer program product of claim 7, further comprising labeling the sentences, the labeling comprising: identifying concepts in the sentences text; annotating the sentences with the concepts; and extracting from the natural language text, lexical and syntactic features.

9. The computer program product of claim 7, wherein determining, for at least one sentence in the first portion having a concept pair, a likelihood that the sentence is an instance of the concepts in the concept pair exhibiting an existing relation, comprises: identifying, by the processor, as the existing relation, an existing relation between respective categories of the concepts in the concept pair as defined by a reference knowledge graph.

10. The computer program product of claim 7, wherein determining, for at least one sentence in the first portion having a concept pair, a likelihood that the sentence is an instance of the concepts in the concept pair exhibiting an existing relation, is based on identifying one or more concept pairs in the at least one sentence, the identifying comprising: comparing, by the processor, words of the at least one sentence in the first portion to elements of a knowledge graph; identifying, by the processor, matching words as concepts; and pairing, by the processor, at least two of the identifying concepts with one another to form a concept pair.

11. The computer program product of claim 7, wherein determining, for at least one sentence in the first portion having a concept pair, a likelihood that the sentence is an instance of the concepts in the concept pair exhibiting an existing relation, comprises: identifying, by the processor, as the existing relation, an existing relation between the concepts in the concept pair as defined by a reference knowledge graph.

12. The computer program of claim 7, further comprising: updating, by the processor, a reference knowledge graph to include relations between concepts whose corresponding confidence scores exceed a threshold value.
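
Two of the claimed steps lend themselves to a short illustration: forming concept pairs by matching sentence words against knowledge-graph elements (claims 4 and 10), and adding only those relations whose confidence exceeds a threshold (claims 6 and 12). The Python sketch below is a simplified reading of those steps, not the patent's implementation. The function names `find_concept_pairs` and `update_graph`, the single-word matching, the punctuation stripping, and the dictionary layout of the graph are all assumptions made for the example.

```python
from itertools import combinations


def find_concept_pairs(sentence, graph_concepts):
    """Match words against known concepts and pair the matches.

    Assumption: concepts are single lowercase words; multi-word
    concepts would need a phrase matcher instead.
    """
    words = [w.strip(".,;:").lower() for w in sentence.split()]
    matched = []
    for word in words:
        if word in graph_concepts and word not in matched:
            matched.append(word)
    # Every unordered pair of matched concepts, in order of appearance.
    return list(combinations(matched, 2))


def update_graph(graph, scored_relations, threshold=0.8):
    """Return a copy of the graph with high-confidence relations added.

    graph maps (concept_a, concept_b) -> relation name (hypothetical layout).
    scored_relations maps (concept_a, relation, concept_b) -> confidence.
    """
    updated = dict(graph)
    for (a, relation, b), confidence in scored_relations.items():
        if confidence > threshold:
            updated[(a, b)] = relation
    return updated


concepts = {"aspirin", "headache", "fever"}
print(find_concept_pairs("Aspirin treats headache and fever.", concepts))

scores = {
    ("aspirin", "treats", "headache"): 0.85,
    ("aspirin", "treats", "fever"): 0.40,
}
print(update_graph({}, scores, threshold=0.8))
```

With the hypothetical scores above, only the aspirin and headache relation clears the 0.8 threshold and is added to the graph. The threshold value itself is illustrative; the claims do not fix one.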

Assignees

IBM

Inventors

Classifications

  • Thesauruses; Synonyms · CPC title

  • Selection or weighting of terms for indexing · CPC title

  • G06F40/30 (Primary)

    Semantic analysis · CPC title

  • Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title

  • Digital computing or data processing equipment or methods, specially adapted for specific functions (information retrieval, database structures or file system structures therefor G06F16/00) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10229195B2 cover?
Generating, updating, and using a knowledge graph. Concepts in a knowledge graph can have relations to one another. These relations may be expressed as confidence values. A training data set may be split into two portions, with the first portion used to update confidence values for existing relations between concept pairs, using the knowledge graph. These confidence values can be used, together…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 12, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).