Extracting Facts from Unstructured Information
US-2019286999-A1 · Sep 19, 2019 · US
US2019005029A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019005029-A1 |
| Application number | US-201816021112-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 28, 2018 |
| Priority date | Jun 30, 2017 |
| Publication date | Jan 3, 2019 |
| Grant date | — |
Abstract
Systems and methods for natural language processing of structured documents. In another embodiment, in an information processing apparatus comprising at least one computer processor, a method for processing a structured document may include: (1) receiving a document; (2) parsing the document into a plurality of components using a statistical parser; (3) extracting a plurality of entities from each component; (4) identifying a potential relationship between two of the plurality of entities; (5) generating a numeric representation for the potential relationship; (6) confirming the potential relationship with a logical regression model; and (7) generating and storing a unified structured file for the document.
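The seven steps in the abstract can be sketched end to end as a toy pipeline. This is an illustration under stated assumptions, not the patented implementation. A regex split before "Section <n>" headings stands in for the statistical parser. A small hypothetical regex gazetteer (PARTY/AMOUNT/DATE) stands in for the Conditional Random Field extractor named in the claims. The weights in `WEIGHTS` are hand-picked rather than trained. The abstract's "logical regression model" is read here as ordinary logistic regression.

```python
import json
import math
import re
from itertools import combinations

# Hypothetical stand-in for the statistical parser: split right before "Section <n>".
SECTION_RE = re.compile(r"(?=Section \d+)")

# Hypothetical regex gazetteer standing in for the Conditional Random Field extractor.
ENTITY_PATTERNS = {
    "PARTY": re.compile(r"\b(?:Borrower|Lender|Agent)\b"),
    "AMOUNT": re.compile(r"\$[\d,]+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Hand-picked (not trained) weights for the features [bias, character gap, same sentence].
WEIGHTS = [-1.0, -0.02, 3.0]


def parse_components(text):
    """Step 2: split the document into components (here, sections)."""
    return [part.strip() for part in SECTION_RE.split(text) if part.strip()]


def extract_entities(component):
    """Step 3: find typed entities with their character spans, sorted by position."""
    found = []
    for label, pattern in ENTITY_PATTERNS.items():
        for m in pattern.finditer(component):
            found.append({"type": label, "text": m.group(), "start": m.start(), "end": m.end()})
    return sorted(found, key=lambda e: e["start"])


def featurize(component, head, tail):
    """Step 5: numeric representation of a candidate pair (head precedes tail)."""
    between = component[head["end"]:tail["start"]]
    return [1.0, float(len(between)), 0.0 if "." in between else 1.0]


def confirm(features):
    """Step 6: logistic model; keep the pair when P(true) >= 0.5."""
    z = sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5


def process_document(text, out_path):
    """Steps 1-7: receive text, parse, extract, pair, score, confirm, store as JSON."""
    unified = {"components": []}
    for component in parse_components(text):
        entities = extract_entities(component)
        relations = []
        # Step 4: each pair of differently typed entities, in text order, is a candidate.
        for head, tail in combinations(entities, 2):
            if head["type"] == tail["type"]:
                continue
            if confirm(featurize(component, head, tail)):
                relations.append({"head": head["text"], "tail": tail["text"]})
        unified["components"].append(
            {"text": component, "entities": entities, "relations": relations}
        )
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(unified, fh, indent=2)
    return unified
```

For example, calling `process_document` on the string "Section 1 The Lender shall lend $5,000,000 to the Borrower. Section 2 The Agent is notified. Payment is due on 2020-06-30." yields two components. In Section 1 it confirms the Lender/amount and amount/Borrower pairs. In Section 2 it rejects the Agent/date pair because a sentence boundary and a long gap separate them.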
Claims
What is claimed is:
1. A method for processing a structured document, comprising: in an information processing apparatus comprising at least one computer processor: receiving a document; parsing the document into a plurality of components using a statistical parser; extracting a plurality of entities from each component; identifying a potential relationship between two of the plurality of entities; generating a numeric representation for the potential relationship; confirming the potential relationship with a logical regression model; and generating and storing a unified structured file for the document.
2. The method of claim 1, wherein the statistical parser comprises a neural network.
3. The method of claim 1, wherein the plurality of components comprise at least one of a participating party, an article, a section, a subsection, and a subsubsection.
4. The method of claim 1, wherein the statistical parser parses the document based on a first vector of word embeddings and a second vector of orthographic properties of words in the document.
5. The method of claim 1, wherein the step of parsing the document into a plurality of components comprises identifying a relationship among the plurality of components.
6. The method of claim 1, further comprising filtering the document into a plurality of sections using a statistical section filter.
7. The method of claim 1, further comprising: generating a score for each sentence or paragraph of the document.
8. The method of claim 7, wherein the score is generated using a latent semantic indexing model.
9. The method of claim 7, wherein the score is generated using a continuous bag-of-words model.
10. The method of claim 1, wherein the plurality of entities are extracted using a Conditional Random Field model.
11. The method of claim 1, wherein the potential relationship is based on an ontology.
12. The method of claim 1, wherein the potential relationship is based on a hierarchical correspondence rule.
13. The method of claim 1, wherein the numeric representation for the potential relationship is based on functional features, tail features, and head features of the potential relationship.
14. The method of claim 1, further comprising: identifying a plurality of defined terms in the document.
15. The method of claim 1, wherein the logical regression model confirms each potential relationship as being true or false.
16. The method of claim 1, further comprising: generating a graphical representation of the document.
17. The method of claim 1, wherein the document comprises a structured document.
18. The method of claim 1, further comprising: receiving feedback on an accuracy of at least one of the plurality of components identified using the statistical parser; and updating the statistical parser based on the feedback.
19. The method of claim 1, further comprising: receiving feedback on an accuracy of at least one of the entities extracted using the Conditional Random Field model; and updating the Conditional Random Field model based on the feedback.
20. The method of claim 1, further comprising: receiving feedback on an accuracy of at least one of the potential relationships confirmed using the logical regression model; and updating the logical regression model based on the feedback.
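Claim 4 says the statistical parser works from two vectors per word: word embeddings and orthographic properties. The following is a minimal sketch of building that per-token input. It assumes a tiny hypothetical embedding table (`EMBEDDINGS`) and four illustrative orthographic flags chosen for this example. The parser itself (a neural network, per claim 2) is not shown.

```python
import string

# Hypothetical 3-dimensional embedding table; a real system would load trained vectors.
EMBEDDINGS = {
    "article": [0.9, 0.1, 0.0],
    "section": [0.8, 0.2, 0.1],
    "borrower": [0.1, 0.9, 0.3],
}
EMBED_DIM = 3


def orthographic_features(token):
    """Second vector: surface-form properties of the token (1.0 = present)."""
    return [
        float(token[:1].isupper()),  # starts with an upper-case letter
        float(token.isupper()),  # all cased letters upper-case, e.g. "ARTICLE"
        float(any(ch.isdigit() for ch in token)),  # contains a digit, e.g. "2.1"
        float(any(ch in string.punctuation for ch in token)),  # contains ASCII punctuation
    ]


def token_vector(token):
    """Concatenate the word embedding (first vector) with the orthographic features."""
    # Out-of-vocabulary tokens fall back to a zero embedding.
    embedding = EMBEDDINGS.get(token.lower(), [0.0] * EMBED_DIM)
    return embedding + orthographic_features(token)
```

For `"ARTICLE 2.1 Borrower".split()`, the number "2.1" has no embedding but still gets the digit and punctuation flags. This illustrates one plausible reason to pair the two vectors: headings and clause numbers that mark component boundaries often stand out by their surface form even when the word itself is unknown.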
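Claims 13, 15, and 20 together describe three things: a numeric representation built from head, tail, and functional features; a regression model that labels each candidate relationship true or false; and updates driven by feedback. The following is a minimal sketch under explicit assumptions. It treats the claims' "logical regression model" as standard logistic regression updated online by stochastic gradient descent. The entity types, the one-hot head/tail encoding, and the "functional" features (bias, token closeness, same-sentence flag) are hypothetical, since the claims do not define them.

```python
import math

ENTITY_TYPES = ["PARTY", "AMOUNT", "DATE"]  # hypothetical entity inventory


def one_hot(entity_type):
    """Encode an entity type as a one-hot list over ENTITY_TYPES."""
    return [1.0 if entity_type == t else 0.0 for t in ENTITY_TYPES]


def relation_vector(head_type, tail_type, token_gap, same_sentence):
    """Concatenate head features, tail features, and (assumed) functional features."""
    functional = [1.0, 1.0 / (1.0 + token_gap), float(same_sentence)]  # bias, closeness, flag
    return one_hot(head_type) + one_hot(tail_type) + functional


class RelationConfirmer:
    """Logistic regression that labels a candidate relationship true or false."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def probability(self, x):
        """P(relationship is true) = sigmoid(w . x)."""
        z = sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def confirm(self, x):
        """True when the model's probability reaches 0.5."""
        return self.probability(x) >= 0.5

    def update(self, x, is_correct):
        """One stochastic-gradient step on log loss from a reviewer's verdict."""
        error = float(is_correct) - self.probability(x)
        self.weights = [w + self.learning_rate * error * xi for w, xi in zip(self.weights, x)]
```

For `x = relation_vector("PARTY", "AMOUNT", token_gap=2, same_sentence=True)`, a fresh `RelationConfirmer(len(x))` has all-zero weights, so P = 0.5 and `confirm` returns True. After a reviewer marks the pair wrong with `update(x, is_correct=False)`, P(x) drops to about 0.26 and `confirm` returns False. An online update like this is one simple way to realize claim 20's feedback loop without retraining from scratch.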
Semantic analysis · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Display of layout of documents; Previewing · CPC title
Parsing · CPC title
Physics · mapped topic