Systems and methods for natural language processing of structured documents

US2019005029A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2019005029-A1
Application number: US-201816021112-A
Country: US
Kind code: A1
Filing date: Jun 28, 2018
Priority date: Jun 30, 2017
Publication date: Jan 3, 2019
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for natural language processing of structured documents. In another embodiment, in an information processing apparatus comprising at least one computer processor, a method for processing a structured document may include: (1) receiving a document; (2) parsing the document into a plurality of components using a statistical parser; (3) extracting a plurality of entities from each component; (4) identifying a potential relationship between two of the plurality of entities; (5) generating a numeric representation for the potential relationship; (6) confirming the potential relationship with a logical regression model; and (7) generating and storing a unified structured file for the document.
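Steps (4) through (6) of the abstract can be illustrated with a small sketch. It is not the patent's implementation: the entity labels, the four features, and the weights are hypothetical, and the "logical regression model" is assumed here to behave like a standard logistic regression classifier whose weights would normally be learned from labeled data.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    text: str
    label: str     # hypothetical entity type, e.g. "PARTY" or "AMOUNT"
    position: int  # token offset within its component


def candidate_pairs(entities):
    """Step 4 (assumed form): treat every pair of entities as a potential relationship."""
    return list(combinations(entities, 2))


def featurize(head, tail):
    """Step 5 (assumed form): map a candidate pair to a fixed-length numeric vector."""
    distance = abs(head.position - tail.position)
    return [
        1.0,                                    # bias term
        1.0 if head.label == "PARTY" else 0.0,  # head feature
        1.0 if tail.label == "AMOUNT" else 0.0, # tail feature
        1.0 / (1.0 + distance),                 # proximity of the two entities
    ]


def confirm(features, weights, threshold=0.5):
    """Step 6 (assumed form): a logistic model labels the candidate true or false."""
    z = sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold, probability


# Illustrative weights only; a real model would fit these to training data.
WEIGHTS = [-3.0, 2.0, 2.0, 4.0]

entities = [
    Entity("Acme Corp", "PARTY", 0),
    Entity("$5,000", "AMOUNT", 3),
    Entity("June 1", "DATE", 12),
]
results = {
    (head.text, tail.text): confirm(featurize(head, tail), WEIGHTS)[0]
    for head, tail in candidate_pairs(entities)
}
```

With these made-up weights, only the nearby party and amount pair is confirmed; the other two candidates fall below the 0.5 threshold.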

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing a structured document, comprising: in an information processing apparatus comprising at least one computer processor: receiving a document; parsing the document into a plurality of components using a statistical parser; extracting a plurality of entities from each component; identifying a potential relationship between two of the plurality of entities; generating a numeric representation for the potential relationship; confirming the potential relationship with a logical regression model; and generating and storing a unified structured file for the document.

2. The method of claim 1, wherein the statistical parser comprises a neural network.

3. The method of claim 1, wherein the plurality of components comprise at least one of a participating party, an article, a section, a subsection, and a subsubsection.

4. The method of claim 1, wherein the statistical parser parses the document based on a first vector of word embeddings and a second vector of orthographic properties of words in the document.

5. The method of claim 1, wherein the step of parsing the document into a plurality of components comprises identifying a relationship among the plurality of components.

6. The method of claim 1, further comprising filtering the document into a plurality of sections using a statistical section filter.

7. The method of claim 1, further comprising: generating a score for each sentence or paragraph of the document.

8. The method of claim 7, wherein the score is generated using a latent semantic indexing model.

9. The method of claim 7, wherein the score is generated using a continuous bag-of-words model.

10. The method of claim 1, wherein the plurality of entities are extracted using a Conditional Random Field model.

11. The method of claim 1, wherein the potential relationship is based on an ontology.

12. The method of claim 1, wherein the potential relationship is based on a hierarchical correspondence rule.

13. The method of claim 1, wherein the numeric representation for the potential relationship is based on functional features, tail features, and head features of the potential relationship.

14. The method of claim 1, further comprising: identifying a plurality of defined terms in the document.

15. The method of claim 1, wherein the logical regression model confirms each potential relationship as being true or false.

16. The method of claim 1, further comprising: generating a graphical representation of the document.

17. The method of claim 1, wherein the document comprises a structured document.

18. The method of claim 1, further comprising: receiving feedback on an accuracy of at least one of the plurality of components identified using the statistical parser; and updating the statistical parser based on the feedback.

19. The method of claim 1, further comprising: receiving feedback on an accuracy of at least one of the entities extracted using the Conditional Random Field model; and updating the Conditional Random Field model based on the feedback.

20. The method of claim 1, further comprising: receiving feedback on an accuracy of at least one of the potential relationships confirmed using the logical regression model; and updating the logical regression model based on the feedback.
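Claim 4 describes the parser input as a vector of word embeddings plus a vector of orthographic properties. The sketch below shows one way such a per-token input could be assembled. The claim does not specify the features or how the two vectors are combined, so the toy embedding table, the four orthographic features, and the concatenation are all assumptions made for illustration.

```python
import re

# Hypothetical two-dimensional embeddings; a real system would load trained vectors.
TOY_EMBEDDINGS = {"article": [0.9, 0.1], "section": [0.8, 0.3]}
UNKNOWN = [0.0, 0.0]

# Matches short enumerators such as "1.", "12)", or "(a)" (an assumed heuristic).
ENUMERATOR = re.compile(r"\(?[0-9a-zA-Z]{1,3}[.)]")


def orthographic_features(token):
    """Surface-form features of a token, independent of its meaning."""
    return [
        float(token.isupper()),                    # all capitals, e.g. "ARTICLE"
        float(token.istitle()),                    # title case, e.g. "Section"
        float(any(ch.isdigit() for ch in token)),  # contains a digit
        float(bool(ENUMERATOR.fullmatch(token))),  # looks like a list enumerator
    ]


def token_vector(token):
    """Concatenate the embedding vector and the orthographic vector for one token."""
    embedding = TOY_EMBEDDINGS.get(token.lower(), UNKNOWN)
    return embedding + orthographic_features(token)
```

Features of this kind would let a parser distinguish a heading such as "ARTICLE" or an enumerator such as "(a)" from body text even when the word itself has no useful embedding.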

Assignees

Inventors

Classifications

  • Semantic analysis · CPC title

  • G06F40/40 · Primary

    Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Display of layout of documents; Previewing · CPC title

  • G06F40/205 · Primary

    Parsing · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019005029A1 cover?
Systems and methods for natural language processing of structured documents. In another embodiment, in an information processing apparatus comprising at least one computer processor, a method for processing a structured document may include: (1) receiving a document; (2) parsing the document into a plurality of components using a statistical parser; (3) extracting a plurality of entities from e…
Who is the assignee on this patent?
JPMorgan Chase Bank, N.A.
What technology area does this patent fall under?
Primary CPC classification G06F40/40. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 3, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).