Systems and methods for facilitating data object extraction from unstructured documents

US11244102B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11244102-B2
Application number  US-202016799788-A
Country             US
Kind code           B2
Filing date         Feb 24, 2020
Priority date       Apr 6, 2017
Publication date    Feb 8, 2022
Grant date          Feb 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for facilitating data object extraction from unstructured documents. Unstructured documents may include data in an unorganized format, such as raw text. The system may use natural language processing to determine characteristics of the terms used in the unstructured document. The system may prompt a user to select terms from the document corresponding in characteristics to properties of a data object being generated. The user may select terms from the document and the system may generate a data object according to the selected terms.
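The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `TaggingElement`, `classify_terms`, and `generate_data_object` are hypothetical, and the small `TOY_LEXICON` lookup is an assumed stand-in for real natural language processing.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical toy lexicon; a real system would use an NLP library instead.
TOY_LEXICON: Dict[str, str] = {
    "Alice": "person",
    "Bob": "person",
    "Paris": "location",
    "met": "verb",
}


@dataclass(frozen=True)
class TaggingElement:
    name: str            # property name in the generated data object
    classification: str  # classification a selected term must match


def classify_terms(text: str) -> Dict[str, str]:
    """Assign a classification to each known term (toy stand-in for NLP)."""
    words = [word.strip(".,") for word in text.split()]
    return {word: TOY_LEXICON[word] for word in words if word in TOY_LEXICON}


def generate_data_object(
    template: List[TaggingElement],
    selections: Dict[str, str],
    term_classes: Dict[str, str],
) -> Dict[str, str]:
    """Build a data object from user selections, rejecting mismatched terms."""
    data_object: Dict[str, str] = {}
    for element in template:
        term = selections[element.name]
        if term_classes.get(term) != element.classification:
            raise ValueError(
                f"{term!r} is not classified as {element.classification!r}"
            )
        data_object[element.name] = term
    return data_object


# Example: the user picks "Alice" for the subject and "Paris" for the place.
document = "Alice met Bob in Paris."
template = [
    TaggingElement("subject", "person"),
    TaggingElement("place", "location"),
]
term_classes = classify_terms(document)
data_object = generate_data_object(
    template, {"subject": "Alice", "place": "Paris"}, term_classes
)
```

In this sketch the template drives both the prompt order and the shape of the result, so the generated object always has exactly the properties the template names.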

First claim

Opening claim text (preview).

What is claimed is:

1. A system for extracting object data from an unstructured document: one or more processors and a memory storing instructions that, when executed by the one or more processors, cause the system to: store an unstructured document in a first portion of an electronic storage; store tagging templates in a second portion of the electronic storage; retrieve the unstructured document from the first portion, the unstructured document comprising a plurality of terms, each term including at least one word, wherein the unstructured document is converted to a particular ontology; assign a term classification to the plurality of terms of the unstructured document via natural language processing; obtain a tagging template, of the tagging templates, from the second portion, for the unstructured document, the tagging template comprising tagging elements, each tagging element having an element classification, the tagging template being obtained based on the particular ontology and a description of the unstructured document, the plurality of tagging elements of the tagging template corresponding to one or more properties of the particular ontology; receive, from a user via an interface, a plurality of selected terms corresponding to the plurality of tagging elements, wherein the term classifications of the selected terms matches the element classification of the corresponding tagging elements; and generate, from the unstructured document based on the plurality of selected terms corresponding to the plurality of tagging elements, a data object organized according to the particular ontology to extract organized object-based data from the unstructured document, the generating of the data object comprising converting the subset and the tagging elements to properties of the data object; store the data object in a third portion of the electronic storage; map the data object to the unstructured data in the first portion to reveal a source of the data object; ingest the data object into an object-based data analysis platform; and analyze the data object using an object-based data analysis platform.

2. The system of claim 1, wherein the system is further caused to: determine a selected tagging element from the plurality of tagging elements; identify, to the user, suggested terms from the plurality of selected terms having a term classification matching the element classification of the selected tagging element.

3. The system of claim 1, wherein the system is further caused to create the tagging template by generating the plurality of tagging elements of the tagging template according to a required data object structure and the generated data object conforms to the required data object structure.

4. The system of claim 3, wherein the system is further caused to receive, from a user, a narrative structure indicative of at least one relationship between the plurality of tagging elements of the tagging template.

5. The system of claim 4, wherein the system is further caused to: determine a mapping between the narrative structure and the tagging elements; and provide, to the user via the interface, a visual display of the plurality of tagging elements and the narrative structure, according to the mapping, indicating the relationship between the plurality of tagging elements.

6. The system of claim 1, wherein the system is further caused to: determine a selected tagging element from the plurality of tagging elements; and provide, to the user, a prompt providing information about at least one characteristic of the selected tagging element.

7. The system of claim 1, wherein the system is further caused to: determine a first selected tagging element from the plurality of tagging elements; identify the first selected tagging element to the user; receive a first selected term corresponding to the first selected tagging element according to user input; determine a second selected tagging element from the plurality of tagging elements; identify the second selected tagging element to the user; receive a second selected term corresponding to the second selected tagging element according to user input, wherein the plurality of selected terms includes at least the first selected term and the second selected term.

8. The system of claim 1, wherein the system is further caused to: receive, from a user, a selection of a second data object; and store a data link between the generated data object and the selected second data object.

9. The system of claim 1, wherein the system is further caused to: receive, from a second user, a second plurality of selected terms corresponding to the plurality of tagging elements, wherein the term classifications of the selected terms matches the element classification of the corresponding tagging elements; receive, from a third user, a third plurality of selected terms corresponding to the plurality of tagging elements, wherein the term classifications of the selected terms matches the element classification of the corresponding tagging elements; and wherein to generate the data object from the unstructured document the system is caused to generate the data object according to the plurality of selected terms, the second plurality of selected terms, and the third plurality of selected terms.

10. The system of claim 1, wherein to assign the term classification to the plurality of terms, the system is further caused to assign to each of the plurality of terms at least one of a part of speech and a grammatical role.

11. The system of claim 1, wherein the data object comprises media components corresponding to the tagging elements.

12. A method for extracting object data from an unstructured document, the method being performed on a computer system having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, cause the computer system to perform the method, the method comprising: storing an unstructured document in a first portion of an electronic storage; storing tagging templates in a second portion of the electronic storage; retrieving, by the computer system, the unstructured document comprising a plurality of terms, each term including at least one word, wherein the unstructured document is converted to a particular ontology; assigning, by the computer system, a term classification to the plurality of terms of the unstructured document via natural language processing; obtaining, by the computer system, a tagging template, of the tagging templates, from the second portion, for the unstructured document, the tagging template comprising tagging elements, each tagging element having an element classification, the tagging template being obtained based on the particular ontology and a description of the unstructured document, the plurality of tagging elements of the tagging template corresponding to one or more properties of the particular ontology; receiving, by the computer system, from a user via an interface, a plurality of selected terms corresponding to the plurality of tagging elements, wherein the term classifications of the selected terms matches the element classification of the corresponding tagging elements; and generating, by the computer system from the unstructured document based on the plurality of selected terms corresponding to the plurality of tagging elements, a data object organized according to the particular ontology to extract organized object-based data from the unstructured document, the generating of the data object comprising converting the subset and the tagging elements to properties of the data object; st
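
Two ideas from the claims lend themselves to a short sketch: suggesting terms whose classification matches a selected tagging element (claim 2), and keeping a pointer back into the source text so a data object can be traced to where it came from (the mapping step of claim 1). The code below is only an assumed illustration. `TermSpan` and `suggest_terms` are hypothetical names, the part-of-speech labels follow the kind of classification mentioned in claim 10, and character offsets are one possible way to record a source location, not something the claims prescribe.

```python
import re
from typing import Dict, List, NamedTuple


class TermSpan(NamedTuple):
    term: str
    start: int  # inclusive character offset in the source document
    end: int    # exclusive character offset in the source document


def suggest_terms(
    document: str,
    term_classes: Dict[str, str],
    element_classification: str,
) -> List[TermSpan]:
    """Return source spans of every term matching the element classification."""
    spans: List[TermSpan] = []
    for term, classification in term_classes.items():
        if classification != element_classification:
            continue
        # Whole-word matches only, so "met" does not match inside "metal".
        pattern = rf"\b{re.escape(term)}\b"
        for match in re.finditer(pattern, document):
            spans.append(TermSpan(term, match.start(), match.end()))
    # Present suggestions in reading order.
    return sorted(spans, key=lambda span: span.start)


# Example: the selected tagging element expects a noun.
document = "Alice met Bob in Paris."
term_classes = {"Alice": "noun", "met": "verb", "Bob": "noun", "Paris": "noun"}
suggestions = suggest_terms(document, term_classes, "noun")
```

Because each suggestion carries its offsets, slicing the stored document with `document[span.start:span.end]` recovers the exact source text behind any property built from it.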

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • Named entity recognition · CPC title

  • G06F40/279 · Primary

    Recognition of textual entities · CPC title

  • Annotation, e.g. comment data or footnotes · CPC title

  • G06F40/117 · Primary

    Tagging; Marking up (details of markup languages G06F40/143); Designating a block; Setting of attributes (style sheets, e.g. eXtensible Stylesheet Language Transformation [XSLT], G06F40/154) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11244102B2 cover?
Systems and methods are provided for facilitating data object extraction from unstructured documents. Unstructured documents may include data in an unorganized format, such as raw text. The system may use natural language processing to determine characteristics of the terms used in the unstructured document. The system may prompt a user to select terms from the document corresponding in charact…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/279. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 8, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).