Automatic rule prediction and generation for document classification and validation

US2022398397A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2022398397-A1
Application number  US-202117303914-A
Country             US
Kind code           A1
Filing date         Jun 10, 2021
Priority date       Jun 10, 2021
Publication date    Dec 15, 2022
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is provided. The method may include, in response to electronically receiving a document, automatically classifying the document and different parts of the document, by electronically identifying a document type associated with the document and electronically tagging data associated with the different parts of the document based on classification rules. The method may further include automatically extracting the tagged data associated with the automatically classified document based on data extraction rules. The method may further include detecting first feedback associated with the classification rules and second feedback associated with the data extraction rules. The method may further include automatically generating and updating validation rules based on the identified document type, the detected first feedback, and the detected second feedback to validate the automatically classified document and the automatically tagged and extracted data.
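The four steps in the abstract (classify, extract, collect feedback, generate validation rules) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the patent's implementation: the `Pipeline` class, the keyword-based classification rules, the label-based extraction rules, and the choice to model a validation rule as "this field is required for this document type" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    # Hypothetical classification rules: document type -> identifying keywords.
    classification_rules: dict[str, list[str]]
    # Hypothetical extraction rules: document type -> {field name: line label}.
    extraction_rules: dict[str, dict[str, str]]
    # Generated validation rules: document type -> required field names.
    validation_rules: dict[str, set[str]] = field(default_factory=dict)

    def classify(self, text: str) -> str:
        """Identify the document type by keyword match."""
        lowered = text.lower()
        for doc_type, keywords in self.classification_rules.items():
            if any(keyword in lowered for keyword in keywords):
                return doc_type
        return "unknown"

    def extract(self, doc_type: str, text: str) -> dict[str, str]:
        """Tag and extract values from lines that start with a known label."""
        fields: dict[str, str] = {}
        for name, label in self.extraction_rules.get(doc_type, {}).items():
            for line in text.splitlines():
                if line.lower().startswith(label.lower()):
                    fields[name] = line[len(label):].strip(" :")
        return fields

    def update_validation_rules(
        self,
        doc_type: str,
        classification_accepted: bool,        # "first feedback" (assumed shape)
        extraction_accepted: dict[str, bool],  # "second feedback" (assumed shape)
    ) -> None:
        """Require accepted fields and drop rejected ones for this type."""
        if not classification_accepted:
            return
        required = self.validation_rules.setdefault(doc_type, set())
        for name, accepted in extraction_accepted.items():
            if accepted:
                required.add(name)
            else:
                required.discard(name)

    def validate(self, doc_type: str, fields: dict[str, str]) -> list[str]:
        """Return the required fields that are missing from the extraction."""
        return sorted(self.validation_rules.get(doc_type, set()) - set(fields))


pipeline = Pipeline(
    classification_rules={"invoice": ["invoice"]},
    extraction_rules={"invoice": {"total": "Total", "number": "Invoice No"}},
)
document = "Invoice No: 17\nTotal: 250.00"
doc_type = pipeline.classify(document)
fields = pipeline.extract(doc_type, document)
pipeline.update_validation_rules(doc_type, True, {"total": True, "number": True})
missing = pipeline.validate("invoice", {"number": "18"})
```

In this sketch a later invoice that lacks a total fails validation because the user previously accepted the extracted total. The abstract leaves the actual form of the validation rules open, so this is only one possible reading.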

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: in response to electronically receiving a document, automatically classifying the document and different parts of the document, wherein automatically classifying the document comprises electronically identifying a document type associated with the document and electronically tagging data associated with the different parts of the document based on one or more classification rules pertaining to the identified document type and identified data type in the document; automatically extracting the tagged data associated with the automatically classified document based on one or more data extraction rules associated with the identified document type and the identified data type; detecting first feedback associated with the one or more classification rules and second feedback associated with the one or more data extraction rules; and automatically generating and updating validation rules based on the identified document type, the detected first feedback, and the detected second feedback to validate the automatically classified document and the automatically tagged and extracted data.

2. The method of claim 1, wherein automatically classifying the document and the different parts of the document further comprises: generating one or more classification rule suggestions based the one or more classification rules; and applying at least one classification rule suggestion from the generated one or more classification rules suggestions based on a percentage threshold.

3. The method of claim 1, wherein automatically extracting the tagged data associated with the automatically classified document further comprises: generating one or more data extraction rule suggestions based the one or more data extraction rules; and applying at least one data extraction rule suggestion from the generated one or more data extraction rule suggestions based on a percentage threshold.

4. The method of claim 1, wherein detecting the first feedback associated with the one or more classification rules comprises detecting via a user interface at least one of a selection of a classification rule, an acceptance of a classification rule suggestion, a rejection of the classification rule suggestion, and a edit of the classification rule suggestion.

5. The method of claim 1, wherein detecting the second feedback associated with the one or more data extraction rules comprises detecting via a user interface at least one of a selection of a data extraction rule, an acceptance of a data extraction rule suggestion, a rejection of the data extraction rule suggestion, and a edit of the data extraction rule suggestion.

6. The method of claim 1, further comprising: detecting third feedback, wherein the third feedback comprises data corrections to the automatically tagged and extracted data.

7. The method of claim 1, further comprising: automatically generating and updating the validation rules based on at least one of a type of industry associated with the document, an originating geography associated with the document, a company associated with the document.

8. A computer system, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising: in response to electronically receiving a document, automatically classifying the document and different parts of the document, wherein automatically classifying the document comprises electronically identifying a document type associated with the document and electronically tagging data associated with the different parts of the document based on one or more classification rules pertaining to the identified document type and identified data type in the document; automatically extracting the tagged data associated with the automatically classified document based on one or more data extraction rules associated with the identified document type and the identified data type; detecting first feedback associated with the one or more classification rules and second feedback associated with the one or more data extraction rules; and automatically generating and updating validation rules based on the identified document type, the detected first feedback, and the detected second feedback to validate the automatically classified document and the automatically tagged and extracted data.

9. The computer system of claim 8, wherein automatically classifying the document and the different parts of the document further comprises: generating one or more classification rule suggestions based the one or more classification rules; and applying at least one classification rule suggestion from the generated one or more classification rules suggestions based on a percentage threshold.

10. The computer system of claim 8, wherein automatically extracting the tagged data associated with the automatically classified document further comprises: generating one or more data extraction rule suggestions based the one or more data extraction rules; and applying at least one data extraction rule suggestion from the generated one or more data extraction rule suggestions based on a percentage threshold.

11. The computer system of claim 8, wherein detecting the first feedback associated with the one or more classification rules comprises detecting via a user interface at least one of a selection of a classification rule, an acceptance of a classification rule suggestion, a rejection of the classification rule suggestion, and a edit of the classification rule suggestion.

12. The computer system of claim 8, wherein detecting the second feedback associated with the one or more data extraction rules comprises detecting via a user interface at least one of a selection of a data extraction rule, an acceptance of a data extraction rule suggestion, a rejection of the data extraction rule suggestion, and a edit of the data extraction rule suggestion.

13. The computer system of claim 8, further comprising: detecting third feedback, wherein the third feedback comprises data corrections to the automatically tagged and extracted data.

14. The computer system of claim 8, further comprising: automatically generating and updating the validation rules based on at least one of a type of industry associated with the document, an originating geography associated with the document, a company associated with the document.

15. A computer program product for automatically detecting and concealing content associated with a notification in response to receiving and presenting the notification on a computing device, comprising: one or more tangible computer-readable storage devices and program instructions stored on at least one of the one or more tangible computer-readable storage devices, the program instructions executable by a processor, the program instructions comprising: in response to electronically receiving a document, automatically classify the document and different parts of the document, wherein automatically classifying the document comprises electronically identifying a document type associated with the document and electronically tagging data associated with the different parts of the document based on one or more classification rules pertaining to the identified document type and identified data type in the do
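The dependent claims mention two mechanics that a short sketch can make concrete: applying rule suggestions "based on a percentage threshold", and detecting feedback as a selection, acceptance, rejection, or edit of a suggestion. The Python below is a hedged illustration only. The claims do not say what the percentage measures, so the `confidence` score is an assumption, and the `Suggestion`, `Feedback`, `apply_suggestions`, and `resolve_feedback` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Feedback(Enum):
    """The four feedback kinds listed in the claims."""
    SELECT = "select"
    ACCEPT = "accept"
    REJECT = "reject"
    EDIT = "edit"


@dataclass
class Suggestion:
    rule: str
    confidence: float  # assumed meaning of the percentage: a 0-100 score


def apply_suggestions(
    suggestions: list[Suggestion], threshold_pct: float
) -> list[Suggestion]:
    """Keep only suggestions whose score meets the percentage threshold."""
    return [s for s in suggestions if s.confidence >= threshold_pct]


def resolve_feedback(
    suggestion: Suggestion,
    feedback: Feedback,
    edited_rule: Optional[str] = None,
) -> Optional[str]:
    """Return the rule text that remains active after feedback, or None."""
    if feedback is Feedback.REJECT:
        return None
    if feedback is Feedback.EDIT:
        if edited_rule is None:
            raise ValueError("an edit needs the edited rule text")
        return edited_rule
    # SELECT and ACCEPT both keep the suggested rule unchanged.
    return suggestion.rule


suggestions = [
    Suggestion("tag 'Total' as amount", 92.0),
    Suggestion("tag 'Ref' as invoice number", 40.0),
]
applied = apply_suggestions(suggestions, threshold_pct=75.0)
kept = resolve_feedback(applied[0], Feedback.ACCEPT)
```

With a 75 percent threshold only the first suggestion is applied automatically. The second would need explicit user feedback before it could become a rule.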

Assignees

IBM

Inventors

Classifications

  • based on feedback of a supervisor · CPC title

  • Rule-based classification · CPC title

  • G06V30/413 · Primary

    Classification of content, e.g. text, photographs or tables · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022398397A1 cover?
A method is provided. The method may include, in response to electronically receiving a document, automatically classifying the document and different parts of the document, by electronically identifying a document type associated with the document and electronically tagging data associated with the different parts of the document based on classification rules. The method may further include au…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06V30/413. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 15, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).