Character-based representation learning for table data extraction using artificial intelligence techniques
US-2023368556-A1 · Nov 16, 2023 · US
US12183100B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12183100-B2 |
| Application number | US-202217653999-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 8, 2022 |
| Priority date | Jan 22, 2022 |
| Publication date | Dec 31, 2024 |
| Grant date | Dec 31, 2024 |
Official abstract text for this publication.
Various methods, apparatuses/systems, and media for data processing are disclosed. A processor receives a digital document; applies an optical character recognition (OCR) algorithm on said received digital document by utilizing an OCR tool; identifies defective data extracted by the OCR tool resulted from relatively inferior image quality of the received digital document; implements an auto rectification algorithm on the identified defective data; automatically generates, in response to implementing the auto rectification algorithm, corresponding auto-rectified data for each identified defective data; records the defective data and corresponding auto-rectified data at a field level; receives user input data on said recorded auto-rectified data; determines whether the auto-rectified data is correct or not; and populates, based on determining that the auto-rectified data is correct, a machine learning model with said received user input data to be utilized for subsequently received digital document.
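The steps listed in the abstract can be read as a small feedback loop. The following is a minimal sketch of that loop, not the patented implementation. Every name in it (`FieldRecord`, `CorrectionStore`, `is_defective`, `auto_rectify`, `process`, `apply_review`) is hypothetical. The OCR step is assumed to have already produced a dictionary of field names to extracted strings. The defect heuristic and the character substitutions are illustrative assumptions, and a plain lookup table stands in for the machine learning model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FieldRecord:
    """Field-level record pairing defective OCR output with its rectification."""
    name: str
    defective: str
    rectified: str
    approved: Optional[bool] = None
    final_value: Optional[str] = None


@dataclass
class CorrectionStore:
    """Assumption: a lookup table standing in for the machine learning model."""
    learned: Dict[str, str] = field(default_factory=dict)

    def populate(self, defective: str, confirmed: str) -> None:
        self.learned[defective] = confirmed

    def suggest(self, defective: str) -> Optional[str]:
        return self.learned.get(defective)


def is_defective(value: str) -> bool:
    # Hypothetical heuristic: empty output, or characters typical of OCR noise.
    return not value or any(ch in "|?" for ch in value)


def auto_rectify(value: str, store: CorrectionStore) -> str:
    # Prefer a correction confirmed on an earlier document, if one exists.
    known = store.suggest(value)
    if known is not None:
        return known
    # Illustrative fallback: map "|" to "1" and drop "?".
    return value.replace("|", "1").replace("?", "").strip()


def process(ocr_fields: Dict[str, str], store: CorrectionStore) -> List[FieldRecord]:
    """Identify defective fields and record each with its auto-rectified value."""
    return [
        FieldRecord(name, value, auto_rectify(value, store))
        for name, value in ocr_fields.items()
        if is_defective(value)
    ]


def apply_review(record: FieldRecord, store: CorrectionStore,
                 user_value: Optional[str] = None) -> FieldRecord:
    """Apply reviewer input: None approves the rectification, a string replaces it."""
    if user_value is None:
        record.approved = True
        record.final_value = record.rectified
        # Only confirmed rectifications are fed back for later documents.
        store.populate(record.defective, record.rectified)
    else:
        record.approved = False
        record.final_value = user_value
    return record
```

In this sketch, a reviewer who approves a rectified field causes the pair to be stored, so the same defective string on a later document is rectified from the stored value. A rejection records the user-defined value on that field only.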
Opening claim text (preview).
What is claimed is:
1. A method for data processing by utilizing one or more processors along with allocated memory, the method comprising: receiving a digital document; applying an optical character recognition (OCR) algorithm on said received digital document by utilizing an OCR tool; identifying defective data extracted by the OCR tool resulted from relatively inferior image quality of the received digital document; implementing an auto rectification algorithm on the identified defective data; automatically generating, in response to implementing the auto rectification algorithm, corresponding auto-rectified data for each identified defective data; recording the defective data and corresponding auto-rectified data at a field level; receiving user input data on said recorded auto-rectified data; determining whether the auto-rectified data is correct or not; populating, based on determining that the auto-rectified data is correct, a machine learning model with said received user input data to be utilized for subsequently received digital document; generating a plurality of first selectable icons, wherein each of said first selectable icon is configured to display corresponding auto-rectified field data when user input is received by clicking or hovering over the first selectable icon; receiving user input data that the auto-rectified field data is not correct based on user's comparing comparison of the auto-rectified field data with a corresponding original image data of the digital document; and receiving user input data indicating a user defined correct field data replacing the auto-rectified field data.
2. The method according to claim 1, wherein the defective data includes one or more of the following data: unwanted extraction data, partial data, incomplete data, junk data, and perfect but incomplete extraction data.
3. The method according to claim 1, further comprising: receiving user input data indicating approval of the auto-rectified field data when a difference between an auto-rectified data value and user input data value is equal to or more than a predetermined threshold value.
4. The method according to claim 1, further comprising: receiving user input data indicating disapproval of the auto-rectified field data when a difference between an auto-rectified data value and user input data value is below a predetermined threshold value.
5. The method according to claim 4, further comprising: generating a plurality of second selectable icons, wherein each of said second selectable icon is configured to display, upon receiving user input via clicking or hovering over the second selectable icon, corresponding suggested potential match field data for a corresponding disapproval of the auto-rectified field data based on historical patterns data that was generated previously in correcting the disapproved auto-rectified field data; receiving user input in approving the suggested potential match field data; and populating the machine learning model with said approved suggested potential match field data to be utilized for subsequently received digital document.
6. The method according to claim 5, wherein when suggested potential match field data is not available for a certain extracted or user populated data, the method further comprising: receiving user input data that accepts the certain extracted or the user populated data as a new field data for subsequent suggestions; and populating the machine learning model with said new field data to be utilized for subsequently received digital document.
7. A system for data processing, the system comprising: a processor; and a memory operatively connected to the processor via a communication interface, the memory storing computer readable instructions, when executed, causes the processor to: receive a digital document; apply an optical character recognition (OCR) algorithm on said received digital document by utilizing an OCR tool; identify defective data extracted by the OCR tool resulted from relatively inferior image quality of the received digital document; implement an auto rectification algorithm on the identified defective data; automatically generate, in response to implementing the auto rectification algorithm, corresponding auto-rectified data for each identified defective data; record the defective data and corresponding auto-rectified data at a field level; receive user input data on said recorded auto-rectified data; determine whether the auto-rectified data is correct or not; populate, based on determining that the auto-rectified data is correct, a machine learning model with said received user input data to be utilized for subsequently received digital document; generate a plurality of first selectable icons, wherein each of said first selectable icon is configured to display corresponding auto-rectified field data when user input is received by clicking or hovering over the first selectable icon; receive user input data that the auto-rectified field data is not correct based on user's comparing comparison of the auto-rectified field data with a corresponding original image data of the digital document; and receive user input data indicating a user defined correct field data replacing the auto-rectified field data.
8. The system according to claim 7, wherein the defective data includes one or more of the following data: unwanted extraction data, partial data, incomplete data, junk data, and perfect but incomplete extraction data.
9. The system according to claim 7, wherein the processor is further configured to: receive user input data indicating approval of the auto-rectified field data when a difference between an auto-rectified data value and user input data value is equal to or more than a predetermined threshold value.
10. The system according to claim 7, wherein the processor is further configured to: receive user input data indicating disapproval of the auto-rectified field data when a difference between an auto-rectified data value and user input data value is below a predetermined threshold value.
11. The system according to claim 10, wherein the processor is further configured to: generate a plurality of second selectable icons, wherein each of said second selectable icon is configured to display, upon receiving user input via clicking or hovering over the second selectable icon, corresponding suggested potential match field data for a corresponding disapproval of the auto-rectified field data based on historical patterns data that was generated previously in correcting the disapproved auto-rectified field data; receive user input in approving the suggested potential match field data; and populate the machine learning model with said approved suggested potential match field data to be utilized for subsequently received digital document.
12. The system according to claim 11, wherein when suggested potential match field data is not available for a certain extracted or user populated data, the processor is further configured to: receive user input data that accepts the certain extracted or the user populated data as a new field data for subsequent suggestions; and populate the machine learning model with said new field data to be utilized for subsequently received digital document.
13. A non-transitory computer readable medium configured to store instructions for data processing, wherein, when executed, the instructions cause a processor to perform the following: receiving a digital document; applying an optical character recognition (OCR) algorithm on said received digital document by utilizing an OCR tool; identifying
Interaction with lists of selectable items, e.g. menus · CPC title
Interactive pattern learning with a human teacher · CPC title
Editing, e.g. inserting or deleting · CPC title
Validation; Performance evaluation · CPC title
using icons (graphical or visual programming using iconic symbols G06F8/34) · CPC title