User-defined automated document feature modeling, extraction and optimization

US11048762B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11048762-B2
Application number  US-201916299128-A
Country             US
Kind code           B2
Filing date         Mar 11, 2019
Priority date       Mar 16, 2018
Publication date    Jun 29, 2021
Grant date          Jun 29, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided herein are systems and methods for user-defined automated document feature modeling, extraction and optimization. In the present disclosure, an end user of an automated document review system can customize and create new data models applicable to a set of focus documents. In addition, an end user of the automated document review system can customize and create new extraction rules applicable to text extraction from the set of focus documents. The user-defined edits to the data model and extraction rules can be further tested in a staging environment, and tested against a ground truth set of documents, before being widely applied to other relevant documents.
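
The workflow in the abstract (edit a data model, give the result its own version, test it in staging against a ground truth set, then publish) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `DataModel` class, the use of regular expressions as extraction rules, the `edit_model`, `extract`, `accuracy`, and `publish_if_ok` helpers, the 0.9 threshold, and the sample lease texts are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataModel:
    """Hypothetical versioned data model: field name -> ordered regex rules."""
    name: str
    version: int
    rules: Dict[str, List[str]]


def edit_model(model: DataModel, field_name: str, patterns: List[str]) -> DataModel:
    """Return a new version with one field's rules replaced; the old version is kept as-is."""
    new_rules = {**model.rules, field_name: list(patterns)}
    return DataModel(model.name, model.version + 1, new_rules)


def extract(model: DataModel, text: str) -> Dict[str, Optional[str]]:
    """Apply each field's rules in order; the first rule that matches wins.

    Assumption: every rule has exactly one capture group holding the value.
    """
    out: Dict[str, Optional[str]] = {}
    for field_name, patterns in model.rules.items():
        out[field_name] = None
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                out[field_name] = match.group(1)
                break
    return out


GroundTruth = List[Tuple[str, Dict[str, str]]]  # (document text, expected field values)


def accuracy(model: DataModel, ground_truth: GroundTruth) -> float:
    """Fraction of expected field values that the model extracts exactly."""
    total = correct = 0
    for text, expected in ground_truth:
        got = extract(model, text)
        for field_name, value in expected.items():
            total += 1
            correct += got.get(field_name) == value
    return correct / total if total else 0.0


def publish_if_ok(current: DataModel, candidate: DataModel,
                  ground_truth: GroundTruth, threshold: float = 0.9) -> DataModel:
    """Staging gate: promote the candidate only if it meets the (assumed)
    threshold and does not score below the current model."""
    score = accuracy(candidate, ground_truth)
    if score >= threshold and score >= accuracy(current, ground_truth):
        return candidate
    return current


# Usage: a user adds a second rule to the "rent" field and tests it in staging.
v1 = DataModel("lease", 1, {"rent": [r"Rent: \$(\d+)"]})
v2 = edit_model(v1, "rent", [r"Rent: \$(\d+)", r"Monthly rent of \$(\d+)"])
truth = [
    ("Rent: $1200", {"rent": "1200"}),
    ("Monthly rent of $950", {"rent": "950"}),
]
live = publish_if_ok(v1, v2, truth)
```

Keeping both versions immutable is one way to let the staging view show the old and new outputs side by side, as the claims describe; a real system would likely persist versions rather than hold them in memory.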

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for optimizing automated document feature extraction, the method comprising: receiving a selection from a user of a data model to be updated or a new data model to be created; providing the user with a graphical user interface to edit one or more fields within the selected data model; receiving an edit from the user, via the graphical user interface, to the one or more fields within the selected data model; generating an updated data model based on the received edit from the user; assigning each of the data model and the updated data model a unique version; updating a staging environment for an automated document feature extraction system; performing automated document feature extraction utilizing the data model and the updated data model on a focus set of documents; displaying output of the automated document feature extraction for each of the data model and the updated data model; publishing the data model and the updated data model in a staging environment for testing and verification, to ensure that data model changes based on the received edit from the user are operating as intended; displaying, in the staging environment, the output of the automated document feature extraction from the data model and the updated data model so that a user can verify that the automated document feature extraction of the updated data model does not disrupt previous document field modeling and extraction using the data model, in an undesirable manner; and publishing the updated data model in the automated document feature extraction system to enhance automated document feature extraction within the focus set of documents and a second set of additional documents.

2. The method of claim 1, further comprising: verifying that the automated document feature extraction utilizing the updated data model passes a quality check on a ground truth set of documents.

3. The method of claim 1, wherein the received edit from the user to the one or more fields within the selected data model is a creation of a new field within the selected data model.

4. The method of claim 1, wherein the received edit from the user to the one or more fields within the selected data model is a modification or deletion of an existing field within the selected data model.

5. The method of claim 1, wherein the received edit from the user to the one or more fields within the selected data model comprises reordering of at least one field within a group.

6. The method of claim 1, wherein the received edit from the user to the one or more fields within the selected data model comprises adding or removing at least one field from a group.

7. A computer-implemented method for optimizing automated document feature extraction, the method comprising: receiving a selection from a user of a data model to be updated or a new data model to be created; providing the user with a graphical user interface to edit one or more extraction rules within the selected data model; receiving an edit from the user, via the graphical user interface, to the one or more extraction rules within the selected data model; generating an updated data model based on the received edit from the user; assigning each of the data model and the updated data model a unique version; updating a staging environment for an automated document feature extraction system; performing automated document feature extraction utilizing the data model and the updated data model on a focus set of documents; displaying output of the automated document feature extraction for each of the data model and the updated data model; publishing the data model and the updated data model in a staging environment for testing and verification, to ensure that data model changes based on the received edit from the user are operating as intended; displaying, in the staging environment, the output of the automated document feature extraction from the data model and the updated data model so that a user can verify that the automated document feature extraction of the updated data model does not disrupt previous document field modeling and extraction using the data model, in an undesirable manner; publishing the updated data model in the automated document feature extraction system to enhance automated document feature extraction within the focus set of documents and a second set of additional documents.

8. The method of claim 7, further comprising: verifying that the automated document feature extraction utilizing the updated data model passes a quality check on a ground truth set of documents.

9. The method of claim 7, wherein the received edit from the user to the one or more extraction rules is a reordering of at least two extraction rules.

10. The method of claim 7, wherein the received edit from the user to the one or more extraction rules is creation of a new extraction rule.

11. The method of claim 7, wherein the received edit from the user to the one or more extraction rules is modification or deletion of one or more existing extraction rules.

12. The method of claim 7, further comprising: displaying to the user, via the graphical user interface, an amount of substitutions, deletions, and insertions made by the automated document feature extraction system as a result of the updated data model.

13. A system, comprising: a processor; and a memory for storing executable instructions, the processor executing the instructions to: receive a selection from a user of a data model to be updated or a new data model to be created; provide the user with a graphical user interface to edit one or more fields within the selected data model; receive an edit from the user, via the graphical user interface, to the one or more fields within the selected data model; generate an updated data model based on the received edit from the user; assign each of the data model and the updated data model a unique version; update a staging environment for an automated document feature extraction system; perform automated document feature extraction utilizing the data model and the updated data model on a focus set of documents; publish the data model and the updated data model in a staging environment for testing and verification, to ensure that data model changes based on the received edit from the user are operating as intended; provide, in the staging environment, the output of the automated document feature extraction from the data model and the updated data model so that a user can verify that the automated document feature extraction of the updated data model does not disrupt previous document field modeling and extraction using the data model, in an undesirable manner; publish the updated data model in the automated document feature extraction system to enhance automated document feature extraction within the focus set of documents and a second set of additional documents.

14. The system of claim 13, wherein the processor further executes the instructions to: verify that the automated document feature extraction utilizing the updated data model passes a quality check on a ground truth set of documents.

15. The system of claim 13, wherein the received edit from the user to the one or more fields within the selected data model is a creation of a new field within the selected data model.

16. The system of claim 13, wherein the received edit from the user to the one or more fields within the selected data model is a modification or deletion of an existing field within the selected data model.

17. The system of c
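
Claim 12 refers to displaying the number of substitutions, deletions, and insertions that result from an updated data model. The claim text shown here does not say how those counts are computed. One plausible reading, sketched below purely as an assumption, is a character-level Levenshtein alignment between the value extracted by the old model and the value extracted by the updated model. The function name `edit_counts` and the sample values are hypothetical.

```python
from typing import Tuple


def edit_counts(old: str, new: str) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) along one minimum-cost
    character alignment that turns `old` into `new`."""
    n, m = len(old), len(new)

    # dist[i][j] = minimum number of edits turning old[:i] into new[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of old[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of new[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if old[i - 1] == new[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right corner, tallying each operation type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if old[i - 1] == new[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


# Usage: compare one field's output under the old and the updated data model.
old_value = "1200"
new_value = "1,200"
subs, dels, ins = edit_counts(old_value, new_value)
print(f"substitutions={subs} deletions={dels} insertions={ins}")
```

Summing these counts over every field in the focus set would give a reviewer a quick measure of how much an edit changed the extraction output. Token-level alignment would work the same way if character-level counts prove too noisy.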

Assignees

Open Text Holdings Inc

Inventors

Classifications

  • G06V10/993 · Primary

    Evaluation of the quality of the acquired pattern · CPC title

  • Document-oriented image-based pattern recognition · CPC title

  • G06F16/93 · Primary

    Document management systems · CPC title

  • Editing, e.g. inserting or deleting · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11048762B2 cover?
Provided herein are systems and methods for user-defined automated document feature modeling, extraction and optimization. In the present disclosure, an end user of an automated document review system can customize and create new data models applicable to a set of focus documents. In addition, an end user of the automated document review system can customize and create new extraction rules appl…
Who is the assignee on this patent?
Open Text Holdings Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/993. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 29, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).