Deep symbolic validation of information extraction systems

US2020218968A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2020218968-A1
Application number  US-201916241569-A
Country             US
Kind code           A1
Filing date         Jan 7, 2019
Priority date       Jan 7, 2019
Publication date    Jul 9, 2020
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a memory that stores computer-executable components; a processor, operably coupled to the memory, that executes the computer-executable components stored in the memory, wherein the computer-executable components comprise: a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of entities and relation by training from output of the relation extraction component.

2. The system of claim 1, wherein the relation extraction component generates a set of quads from the corpus of data, wherein the quads have form q = <e1, r; e2; s> where ei ∈ V are entities found in the corpus of data, r ∈ R is a finite set of relations and s ∈ [0, 1].

3. The system of claim 2, further comprising a perception component that implements function RelEx(e1; e2, Ø(e1; e2)) for each relation r ∈ R, returns a set of quads assessing their confidence from analysis of textual evidence.

4. The system of claim 3, wherein the textual evidence is: RelEx(e1, e2, Ø(e1, e2)) = <e1, ri, e, si> ri ∈ R, where si is a confidence score for relation ri.

5. The system of claim 3, further comprising a validation component that returns a confidence score for any possible triple such that e1, e2 ∈ V and r ∈ R.

6. The system of claim 1, wherein a mathematical loss function is implemented to account for confidence associated with triples.

7. The system of claim 1, wherein the training is dependent upon noisy output of relation extraction.

8. The system of claim 2, wherein the loss function is dependent upon a specific mathematical function defines as:

$$\mathcal{L} = -\frac{1}{|O'|} \sum_{i \in O'} \sum_{i=1}^{|\varepsilon|} q_i^{h,r} \log v_i^{h,r}$$

9. The system of claim 1, wherein the relation triples can identify threats in cybersecurity.

10. The system of claim 3, wherein the validation is implemented by using a deep net where a loss function is modified to account for fuzzy truth values provided by output of the perception component.

11. A computer-implemented method, comprising: receiving, by a processor operatively coupled to a memory, a corpus of data; generating via relation extraction, by the processor, noisy knowledge graphs from the corpus of data; and acquiring, by the processor, global representations of entities and relation by training from output of the relation extraction.

12. The method of claim 11, wherein the relation extraction generates a set of quads from the corpus of data, wherein the quads have form q = <e1, r; e2; s> where ei ∈ V are entities found in the corpus of data, r ∈ R is a finite set of relations and s ∈ [0, 1].

13. The method of claim 12, further comprising performing a perception act that implements function RelEx(e1; e2, Ø(e1; e2)) for each relation r ∈ R, and returns a set of quads assessing their confidence from analysis of textual evidence.

14. The method of claim 13, wherein the textual evidence is: RelEx(e1, e2, Ø(e1, e2)) = <e1, ri, e, si> ri ∈ R, where si is a confidence score for relation ri.

15. The method of claim 13, further comprising performing a validation act that returns a confidence score for any possible triple such that e1, e2 ∈ V and r ∈ R.

16. The method of claim 11, wherein a mathematical loss function is implemented to account for confidence associated with triples.

17. The method of claim 1, wherein the training is dependent upon noisy output of the relation extraction.

18. The method of claim 12, wherein the loss function is dependent upon a specific mathematical function defines as:

$$\mathcal{L} = -\frac{1}{|O'|} \; ? \; \sum_{i=1}^{|\varepsilon|} q_i^{h,r} \log v_i^{h,r}$$
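
To make the quads and the confidence-weighted loss in the claims more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the patent's implementation. The claim text does not define O′, ε, q or v, so the sketch assumes that O′ is the set of observed (head entity, relation) pairs, that ε is the entity set, that q is the vector of extraction confidences for each candidate tail entity, and that v is the vector of scores predicted by a trained model. The names `Quad`, `soft_targets` and `confidence_weighted_loss` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quad:
    """One extracted fact <e1, r, e2, s>, with confidence s in [0, 1]."""
    e1: str
    r: str
    e2: str
    s: float


def soft_targets(quads, entities):
    """Group quads by (head, relation) into confidence vectors over entities.

    Assumption: if the same triple is extracted more than once, the
    highest confidence is kept.
    """
    index = {e: i for i, e in enumerate(entities)}
    targets = {}
    for quad in quads:
        vec = targets.setdefault((quad.e1, quad.r), [0.0] * len(entities))
        pos = index[quad.e2]
        vec[pos] = max(vec[pos], quad.s)
    return targets


def confidence_weighted_loss(targets, predictions, eps=1e-12):
    """Average over (h, r) pairs of -sum_i q_i * log(v_i).

    `eps` guards against log(0); it is an implementation detail of this
    sketch, not part of the claimed formula.
    """
    total = 0.0
    for key, q_vec in targets.items():
        v_vec = predictions[key]
        total += sum(q * math.log(v + eps) for q, v in zip(q_vec, v_vec))
    return -total / len(targets)


# Toy data (hypothetical): two noisy extractions for the same head and relation.
entities = ["malware_x", "server_a", "server_b"]
quads = [
    Quad("malware_x", "targets", "server_a", 0.8),
    Quad("malware_x", "targets", "server_b", 0.5),
]
targets = soft_targets(quads, entities)

# Scores a model might assign to each candidate tail entity (made up here).
predictions = {("malware_x", "targets"): [0.1, 0.8, 0.5]}
loss = confidence_weighted_loss(targets, predictions)
print(loss)
```

In this reading, the extraction confidences act as soft labels rather than hard 0/1 targets, so a low-confidence extraction contributes less to the loss than a high-confidence one. This matches the "fuzzy truth values" wording of claim 10, but the exact training setup is not specified on this page.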

Assignees

IBM

Inventors

Classifications

  • based on fuzzy logic, fuzzy membership or fuzzy inference, e.g. adaptive neuro-fuzzy inference systems [ANFIS] · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Feedforward networks · CPC title

  • Knowledge-based neural networks; Logical representations of neural networks · CPC title

  • involving event detection and direct action · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020218968A1 cover?
A system comprises a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components. The system includes a receiving component that receives a corpus of data; a relation extraction component that generates noisy knowledge graphs from the corpus; and a training component that acquires global representations of …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N5/022. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 9, 2020 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).