Enhanced optical character recognition (OCR) image segmentation system and method
US-2021406576-A1 · Dec 30, 2021 · US
US11495014B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11495014-B2 |
| Application number | US-202016935880-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 22, 2020 |
| Priority date | Jul 22, 2020 |
| Publication date | Nov 8, 2022 |
| Grant date | Nov 8, 2022 |
Systems and methods are configured for correcting the orientation of an image data object subject to optical character recognition (OCR) by receiving an original image data object, generating initial machine readable text for the original image data object via OCR, generating an initial quality score for the initial machine readable text via machine-learning models, determining whether the initial quality score satisfies quality criteria, upon determining that the initial quality score does not satisfy the quality criteria, generating a plurality of rotated image data objects each comprising the original image data object rotated to a different rotational position, generating a rotated machine readable text data object for each of the plurality of rotated image data objects and generating a rotated quality score for each of the plurality of rotated machine readable text data objects, and determining that one of the plurality of rotated quality scores satisfies the quality criteria.
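The control flow in the abstract and claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The image is a toy grid of characters instead of pixels. `ocr`, `quality_score` and `satisfies` are hypothetical caller-supplied callables that stand in for the OCR process, the machine learning models and the quality criteria. The three rotational positions (90, 180 and 270 degrees clockwise) are an assumption consistent with the first, second and third rotated image data objects of claim 4.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy representation: a grid of characters stands in for pixels.
Image = List[List[str]]


def rotate90(image: Image) -> Image:
    """Return a copy of the grid rotated 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotated_images(image: Image) -> List[Tuple[int, Image]]:
    """Generate rotated copies at 90, 180 and 270 degrees clockwise (assumed positions)."""
    results = []
    current = image
    for degrees in (90, 180, 270):
        current = rotate90(current)
        results.append((degrees, current))
    return results


def correct_orientation(
    image: Image,
    ocr: Callable[[Image], str],
    quality_score: Callable[[str], float],
    satisfies: Callable[[float], bool],
) -> Optional[Tuple[int, str]]:
    """Return (rotation in degrees, text) for the first acceptable orientation."""
    initial_text = ocr(image)
    if satisfies(quality_score(initial_text)):
        return 0, initial_text  # the original orientation is acceptable

    # OCR every rotated copy, keeping each text paired with its rotation.
    rotated_texts = [(deg, ocr(rot)) for deg, rot in rotated_images(image)]
    # Score every rotated text, then pick the first that meets the criteria.
    scored = [(deg, text, quality_score(text)) for deg, text in rotated_texts]
    for deg, text, score in scored:
        if satisfies(score):
            return deg, text  # this text would be handed to an NLP engine
    return None  # no orientation met the criteria


if __name__ == "__main__":
    def read_rows(img: Image) -> str:
        # Hypothetical "OCR": concatenate the grid row by row.
        return "".join("".join(row) for row in img)

    def toy_score(text: str) -> float:
        # Hypothetical score: 1.0 only for the expected word.
        return 1.0 if text == "cat" else 0.0

    # A one-row image that reads correctly only after a 180-degree turn.
    print(correct_orientation([["t", "a", "c"]], read_rows, toy_score, lambda s: s >= 0.5))
    # prints (180, 'cat')
```

Because claim 1 ties the score to the probability that OCR errors stem from the image orientation, whether "satisfies" means above or below a threshold is left to the `satisfies` predicate rather than fixed in the sketch.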
The invention claimed is:
1. A computer-implemented method for correcting an orientation of an image data object, the computer-implemented method comprising: receiving, by one or more processors, an original image data object; generating, by the one or more processors applying an optical character recognition (OCR) process, initial machine readable text for the original image data object; generating, by the one or more processors and using one or more machine learning models, an initial quality score for the initial machine readable text, wherein the initial quality score is indicative of a probability that one or more errors in the initial machine readable text are attributable to an incorrect image orientation associated with the original image data object; determining whether the initial quality score satisfies one or more quality criteria; responsive to determining that the initial quality score does not satisfy the one or more quality criteria, generating a plurality of rotated image data objects each corresponding to a different rotational position, wherein each of the plurality of rotated image data objects comprise the original image data object rotated to a corresponding rotational position; generating, by the one or more processors, a rotated machine readable text data object for each of the plurality of rotated image data objects, wherein each of the rotated machine readable text data objects are stored in association with corresponding rotated image data objects; generating, by the one or more processors and using one or more machine learning models, a rotated quality score for each of the rotated machine readable text data objects; determining that a first rotated quality score of the rotated quality scores satisfies the one or more quality criteria, wherein the first rotated quality score corresponds to a first rotated machine readable text data object of the rotated machine readable text data objects; and providing the first rotated machine readable text data object to a natural language processing (NLP) engine.
2. The computer-implemented method of claim 1, wherein generating an initial quality score comprises: identifying one or more words within the initial machine readable text based at least in part on a machine-learning model for identifying spaces between words; comparing each of the one or more words identified within the initial machine readable text against words within a dictionary retrieved for checking spelling within the initial machine readable text; generating a spelling error detection rate for the initial machine readable text; determining the initial quality score based at least in part on the spelling error detection rate for the initial machine readable text.
3. The computer-implemented method of claim 2, further comprising: identifying, within metadata associated with the original image data object, a language associated with the original image data object; and retrieving the dictionary based at least in part on the language associated with the original image data object.
4. The computer-implemented method of claim 1, wherein generating a plurality of rotated image data objects comprises: generating a first rotated image data object comprising the original image data object rotated to a first rotational position; generating a second rotated image data object comprising the original image data object rotated to a second rotational position; generating a third rotated image data object comprising the original image data object rotated to a third rotational position; and storing each of the first rotated image data object, the second rotated image data object, and the third rotated image data object in association with the original image data object.
5. The computer-implemented method of claim 1, wherein generating an initial quality score for the initial machine readable text comprises: generating text metadata comprising text summarization metrics for the initial machine readable text; processing the text metadata using one or more machine learning models to generate the initial quality score and associating the initial quality score with the initial machine readable text.
6. The computer-implemented method of claim 5, wherein the text summarization metrics comprise one or more of: a count of words not evaluated within the initial machine readable text; a count of words evaluated within the initial machine readable text; a count of words within the initial machine readable text not found in a dictionary; a count of words within the initial machine readable text found in the dictionary; a count of words within the initial machine readable text; or a count of space characters within the initial machine readable text.
7. An apparatus for correcting an orientation of an image data object, the apparatus comprising at least one processor and at least one memory including program code, the at least one memory and the program code configured to, with the at least one processor, cause the apparatus to at least: receive an original image data object; generate, at least in part by applying an optical character recognition (OCR) process, initial machine readable text for the original image data object; generate, at least in part by using one or more machine learning models, an initial quality score for the initial machine readable text, wherein the initial quality score is indicative of a probability that one or more errors in the initial machine readable text are attributable to an incorrect image orientation associated with the original image data object; determine whether the initial quality score satisfies one or more quality criteria; responsive to determining that the initial quality score does not satisfy the one or more quality criteria, generate a plurality of rotated image data objects each corresponding to a different rotational position, wherein each of the plurality of rotated image data objects comprise the original image data object rotated to a corresponding rotational position; generate a rotated machine readable text data object for each of the plurality of rotated image data objects, wherein each of the rotated machine readable text data objects are stored in association with corresponding rotated image data objects; generate, at least in part by using one or more machine learning models, a rotated quality score for each of the rotated machine readable text data objects; determine that a first rotated quality score of the rotated quality scores satisfies the one or more quality criteria, wherein the first rotated quality score corresponds to a first rotated machine readable text data object of the rotated machine readable text data objects; and provide the first rotated machine readable text data object to a natural language processing (NLP) engine.
8. The apparatus of claim 7, wherein generating an initial quality score comprises: identifying one or more words within the initial machine readable text based at least in part on a machine-learning model for identifying spaces between words; comparing each of the one or more words identified within the initial machine readable text against words within a dictionary retrieved for checking spelling within the initial machine readable text; generating a spelling error detection rate for the initial machine readable text; determining the initial quality score based at least in part on the spelling error detection rate for the initial machine readable text.
9. The apparatus of claim 8, wherein the at least one memory and the program code is configured to, with the at least one processor, cause the apparatus to further: identify, within metadata associated with the original image data object,
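Claims 2, 3, 5 and 6 describe the inputs behind the quality score: word segmentation, a dictionary chosen by the language in the image metadata, a spelling error detection rate, and text summarization metrics. The Python sketch below computes the claim 6 counts and an error rate under heavy assumptions. Whitespace splitting replaces the claimed machine-learning model for identifying spaces between words. Tokens with no letters after stripping punctuation are treated as "not evaluated". The `language` metadata key and the `"en"` fallback are hypothetical. The final score is simply one minus the error rate, a stand-in for the trained model of claim 5.

```python
import string
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class TextMetrics:
    """Text summarization metrics modeled on the counts listed in claim 6."""
    words_total: int
    words_evaluated: int
    words_not_evaluated: int
    words_in_dictionary: int
    words_not_in_dictionary: int
    space_count: int

    @property
    def spelling_error_rate(self) -> float:
        # Assumption: text with no checkable words is treated as all errors.
        if self.words_evaluated == 0:
            return 1.0
        return self.words_not_in_dictionary / self.words_evaluated


def retrieve_dictionary(metadata: Dict[str, str],
                        dictionaries: Dict[str, Set[str]]) -> Set[str]:
    """Pick a dictionary by the language recorded in the image metadata.

    The "language" key and the "en" fallback are hypothetical.
    """
    return dictionaries[metadata.get("language", "en")]


def summarize_text(text: str, dictionary: Set[str]) -> TextMetrics:
    # Whitespace splitting stands in for the ML word-segmentation model.
    words = text.split()
    evaluated = []
    for word in words:
        token = word.strip(string.punctuation).lower()
        if token.isalpha():  # numbers and punctuation-only tokens are skipped
            evaluated.append(token)
    found = sum(1 for token in evaluated if token in dictionary)
    return TextMetrics(
        words_total=len(words),
        words_evaluated=len(evaluated),
        words_not_evaluated=len(words) - len(evaluated),
        words_in_dictionary=found,
        words_not_in_dictionary=len(evaluated) - found,
        space_count=text.count(" "),
    )


def quality_score(metrics: TextMetrics) -> float:
    """Placeholder for the ML model: 1.0 means every evaluated word was found."""
    return 1.0 - metrics.spelling_error_rate
```

For example, with a dictionary containing "the", "cat", "sat", "on" and "mat", the text "The cat sat on the mat." scores 1.0, while a garbled reading such as "tac eht no 42" scores 0.0: three words are evaluated, none is found, and "42" is not evaluated.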
Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title
Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title
Recognition assisted with metadata · CPC title
Character recognition · CPC title