Methods and systems for extracting information from document images
US-2022284215-A1 · Sep 8, 2022 · US
US12315281B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12315281-B2 |
| Application number | US-202217710784-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 31, 2022 |
| Priority date | Apr 5, 2021 |
| Publication date | May 27, 2025 |
| Grant date | May 27, 2025 |
Abstract
An image processing apparatus having a scan function includes a memory storing a program and a processor executing the program to: set a property of a business form file by using results of character recognition processing for a business form image obtained by reading a business form; perform predetermined preprocessing on the business form image before the character recognition processing; and perform the character recognition processing for each character area within the preprocessed business form image. The preprocessing includes first preprocessing, performed by default, and second preprocessing, performed additionally to improve the accuracy of the character recognition processing. Whether to perform the second preprocessing after the first preprocessing is determined based on information specifying the contents of the preprocessing to be performed, which is registered in advance for a past business form image similar to the business form image.
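The decision described in the abstract (and spelled out in claims 1 and 2) can be sketched as a per-area plan. This is a minimal illustration, not the patented implementation: `RegisteredForm`, `plan_preprocessing`, the `is_similar` callback, and the string area IDs are all hypothetical names, and how similarity between form images is measured is left as an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RegisteredForm:
    """Hypothetical database entry for a past business form image."""
    form_id: str
    # Per character area: True if the second preprocessing should be applied there.
    needs_second: Dict[str, bool]


def plan_preprocessing(
    form_db: List[RegisteredForm],
    is_similar: Callable[[RegisteredForm], bool],
    area_ids: List[str],
) -> Dict[str, bool]:
    """Return, per character area, whether to add the second preprocessing
    on top of the default first preprocessing."""
    match: Optional[RegisteredForm] = next(
        (entry for entry in form_db if is_similar(entry)), None
    )
    if match is None:
        # No similar form registered yet: apply the second preprocessing to every area.
        return {area: True for area in area_ids}
    # Similar form found: reuse its registered per-area decisions.
    # Areas missing from the registered entry default to first preprocessing only (assumption).
    return {area: match.needs_second.get(area, False) for area in area_ids}
```

The design choice this illustrates is that the expensive second preprocessing runs everywhere only for an unseen form layout. For a known layout, it runs only on the areas that were flagged when that layout was registered.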
Claims
What is claimed is:
1. An image processing apparatus having a scan function, the image processing apparatus comprising: at least one memory that stores a program; and at least one processor that executes the program to perform: obtaining a first image by performing a first preprocessing for a scanned image obtained by reading a document; first determining whether information of a registered image similar to the first image is already registered in a database; obtaining a second image by performing a second preprocessing for the first image in a case where it is determined that the information of the registered image similar to the first image is not registered in the database; performing a character recognition processing for each character area within the second image; setting a property of the scanned image by using first results of the character recognition processing performed for at least one character area selected within the second image; performing a character recognition processing for the at least one character area within the first image to obtain second results of the character recognition processing for the at least one character area; second determining, based on the first results and the second results, whether each of the at least one character area is a character area for which the second preprocessing should be performed; and registering, as information of a new registered image, information of the scanned image to the database, wherein the registered information of the new registered image includes information indicating whether each of the at least one character area is a character area for which the second preprocessing should be performed.
2. The image processing apparatus according to claim 1, wherein in a case where it is determined that the information of the registered image similar to the first image is already registered in the database, the at least one processor executes the program to further perform: determining, based on the registered information of the registered image similar to the first image, whether or not each of the at least one character area within the first image is a character area for which the second preprocessing should be performed; performing the second preprocessing and the character recognition processing for a character area which is determined as the character area for which the second preprocessing should be performed; and performing the character recognition processing without the second preprocessing for a character area which is not determined as the character area for which the second preprocessing should be performed.
3. The image processing apparatus according to claim 1, wherein in the second determining, it is determined based on the first results, the second results and character strings used as the property of the scanned image, whether each of the at least one character area is the character area for which the second preprocessing should be performed.
4. The image processing apparatus according to claim 1, wherein the second preprocessing is binarization processing using a plurality of threshold values different for a plurality of areas.
5. The image processing apparatus according to claim 1, wherein the second preprocessing is processing to extract a character area by utilizing a machine learning model.
6. The image processing apparatus according to claim 1, wherein the second preprocessing is processing to remove a ruled line within an image.
7. A control method of an image processing apparatus having a scan function, the control method comprising: a first obtaining step of obtaining a first image by performing a first preprocessing for a scanned image obtained by reading a document; a first determining step of first determining whether information of a registered image similar to the first image is already registered in a database; a second obtaining step of obtaining a second image by performing a second preprocessing for the first image in a case where it is determined that the information of the registered image similar to the first image is not registered in the database; a character recognition step of performing a character recognition processing for each character area within the second image; a setting step of setting a property of the scanned image by using first results of the character recognition processing performed for at least one character area selected within the second image; a character recognition step of performing a character recognition processing for the at least one character area within the first image to obtain second results of the character recognition processing for the at least one character area; a second determining step of second determining, based on the first results and the second results, whether each of the at least one character area is a character area for which the second preprocessing should be performed; and a registering step of registering, as information of a new registered image, information of the scanned image to the database, wherein the registered information of the new registered image includes information indicating whether each of the at least one character area is a character area for which the second preprocessing should be performed.
8. A non-transitory computer readable storage medium storing a program for causing a computer to perform a control method of an image processing apparatus having a scan function, the control method comprising: a first obtaining step of obtaining a first image by performing a first preprocessing for a scanned image obtained by reading a document; a first determining step of first determining whether information of a registered image similar to the first image is already registered in a database; a second obtaining step of obtaining a second image by performing a second preprocessing for the first image in a case where it is determined that the information of the registered image similar to the first image is not registered in the database; a character recognition step of performing a character recognition processing for each character area within the second image; a setting step of setting a property of the scanned image by using first results of the character recognition processing performed for at least one character area selected within the second image; a character recognition step of performing a character recognition processing for the at least one character area within the first image to obtain second results of the character recognition processing for the at least one character area; a second determining step of second determining, based on the first results and the second results, whether each of the at least one character area is a character area for which the second preprocessing should be performed; and a registering step of registering, as information of a new registered image, information of the scanned image to the database, wherein the registered information of the new registered image includes information indicating whether each of the at least one character area is a character area for which the second preprocessing should be performed.
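The "second determining" and "registering" steps of claims 1 and 3 could be sketched as follows. The claims do not say how the two sets of OCR results are compared, so the comparison rules below are assumptions. The first rule flags an area when the second preprocessing changes its OCR text. The second rule, when the property strings are known (claim 3), flags an area when the first image alone fails to reproduce the adopted string. The function names and the dictionary-based database entry are hypothetical.

```python
from typing import Dict, List, Optional


def needs_second_preprocessing(
    ocr_second_image: Dict[str, str],
    ocr_first_image: Dict[str, str],
    property_strings: Optional[Dict[str, str]] = None,
) -> Dict[str, bool]:
    """Flag each selected character area that should receive the second preprocessing.

    ocr_second_image: OCR text per area after both preprocessings ("first results").
    ocr_first_image: OCR text per area after the first preprocessing only ("second results").
    property_strings: strings actually used as the property (claim 3 variant), if known.
    """
    flags: Dict[str, bool] = {}
    for area, with_second in ocr_second_image.items():
        without_second = ocr_first_image.get(area, "")
        if property_strings is not None and area in property_strings:
            # Assumed rule: skip the second preprocessing only if the first image
            # alone already reproduces the string adopted as the property.
            flags[area] = without_second != property_strings[area]
        else:
            # Assumed rule: the second preprocessing is needed when it changes the OCR text.
            flags[area] = without_second != with_second
    return flags


def register_form(db: List[dict], form_id: str, flags: Dict[str, bool]) -> dict:
    """Append a new registered-image entry carrying the per-area flags."""
    entry = {"form_id": form_id, "needs_second": dict(flags)}
    db.append(entry)
    return entry
```

For example, suppose a "total" area reads `1,280` after both preprocessings but `1.28O` after the first preprocessing only. That area would be flagged, and the flag would be stored with the new registered image for reuse on later, similar forms.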
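Claim 4 names "binarization processing using a plurality of threshold values different for a plurality of areas" as one form of the second preprocessing. The sketch below shows one common way to do that: it splits the page into square tiles and computes an Otsu threshold for each tile. The tiling, the choice of Otsu's method, and the tile size are assumptions for illustration. The patent text shown here does not specify how the per-area thresholds are chosen.

```python
from typing import Tuple

import numpy as np


def otsu_threshold(block: np.ndarray) -> int:
    """Otsu threshold for a uint8 block: maximizes between-class variance."""
    hist = np.bincount(block.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                 # pixel count at or below each level
    w1 = hist.sum() - w0                 # pixel count above each level
    sum0 = np.cumsum(hist * levels)
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = sum0 / w0
        mu1 = (sum0[-1] - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
    # Empty classes produce NaN; treat them as zero variance.
    return int(np.argmax(np.nan_to_num(between)))


def binarize_by_area(gray: np.ndarray, tile: int = 32) -> Tuple[np.ndarray, np.ndarray]:
    """Binarize a uint8 grayscale image with a separate threshold per tile.

    Returns the binary image (0 = ink, 255 = background) and the threshold grid.
    """
    h, w = gray.shape
    rows, cols = -(-h // tile), -(-w // tile)   # ceiling division
    out = np.empty_like(gray)
    thresholds = np.zeros((rows, cols), dtype=int)
    for i in range(rows):
        for j in range(cols):
            ys = slice(i * tile, (i + 1) * tile)
            xs = slice(j * tile, (j + 1) * tile)
            block = gray[ys, xs]
            t = otsu_threshold(block)
            thresholds[i, j] = t
            out[ys, xs] = np.where(block <= t, 0, 255)
    return out, thresholds
```

Per-area thresholds matter when part of a scan is shaded. Dark text on a shadowed background and lighter text on a clean background can each be separated correctly, whereas a single global threshold may not separate both. A tile of uniform intensity yields a threshold of 0 here, so in practice a contrast check would likely be needed before trusting a tile's threshold.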
CPC classification titles:
- based on the type of document
- Determination of region of interest
- based on markings or identifiers characterising the document or the area
- Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
- Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables