Systems and methods for image modification and image based content capture and extraction in neural networks
US-2019114743-A1 · Apr 18, 2019 · US
US11302108B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11302108-B2 |
| Application number | US-201916565614-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 10, 2019 |
| Priority date | Sep 10, 2019 |
| Publication date | Apr 12, 2022 |
| Grant date | Apr 12, 2022 |
Official abstract text for this publication.
Disclosed herein are system, method, and computer program product embodiments for optical character recognition (OCR) pre-processing using machine learning. In an embodiment, a neural network may be trained to identify a standardized document rotation and scale expected by an OCR service performing character recognition. The neural network may then analyze a received document image to identify a corresponding rotation and scale of the document image relative to the expected standardized values. In response to this identification, the document image may be modified in the inverse to standardize the rotation and scale of the document image to match the format expected by the OCR service. In some embodiments, a neural network may perform the standardization as well as the character recognition using a shared computation graph.
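The inverse modification described in the abstract can be illustrated with a minimal sketch. It assumes the network reports a rotation in degrees and a multiplicative scale factor, both relative to the values the OCR service expects. The names `Correction` and `inverse_correction` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """Hypothetical container for the modification to apply to the image."""
    rotation_deg: float  # rotation to apply, in degrees, normalized to [0, 360)
    scale: float         # multiplicative resize factor to apply


def inverse_correction(detected_rotation_deg: float, detected_scale: float) -> Correction:
    """Return the modification that undoes a detected rotation and scale.

    Assumption: the detected values are relative to the standardized format,
    so rotating by the negated angle and resizing by the reciprocal factor
    brings the image back to that format.
    """
    if detected_scale <= 0:
        raise ValueError("detected scale must be positive")
    return Correction(
        rotation_deg=(-detected_rotation_deg) % 360.0,
        scale=1.0 / detected_scale,
    )
```

Under these assumptions, a document detected as rotated by 90 degrees and rendered at twice the expected size would be rotated by 270 degrees and resized by a factor of 0.5 before being sent to the OCR service.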
Opening claim text (preview).
What is claimed is:
1. A computer implemented method, comprising: receiving a document image; generating a grid of one or more cropped portions of the document image; analyzing the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregating the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modifying the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmitting the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combining the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.
2. The computer implemented method of claim 1, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.
3. The computer implemented method of claim 1, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.
4. The computer implemented method of claim 1, wherein the aggregating further comprises: calculating an average of a vector including the one or more image parameter values to determine the modification.
5. The computer implemented method of claim 1, wherein the modifying further comprises: adding white space to the document image to generate the standardized document image.
6. A system, comprising: a memory; and at least one processor coupled to the memory and configured to: receive a document image; generate a grid of one or more cropped portions of the document image; analyze the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregate the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modify the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmit the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combine the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.
7. The system of claim 6, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.
8. The system of claim 6, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.
9. The system of claim 6, wherein to aggregate the one or more image parameters, the at least one processor is further configured to: calculate an average of a vector including the one or more image parameter values to determine the modification.
10. The system of claim 6, wherein to modify the document image, the at least one processor is further configured to: add white space to the document image to generate the standardized document image.
11. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: receiving a document image; generating a grid of one or more cropped portions of the document image; analyzing the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregating the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modifying the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmitting the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combining the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.
12. The non-transitory computer-readable device of claim 11, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.
13. The non-transitory computer-readable device of claim 11, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.
14. The non-transitory computer-readable device of claim 11, wherein the aggregating further comprises: calculating an average of a vector including the one or more image parameter values to determine the modification.
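The grid, aggregation, and white-space steps recited in the claims can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the image is modelled as a list of rows of grayscale pixels, the `estimator` callable is a hypothetical stand-in for the trained machine learning algorithm, and the function names are invented for this example. The plain average follows the wording of the aggregation claims; for rotation values near the 0/360 degree wrap-around, a plain mean can be misleading and a circular mean may be preferable.

```python
from statistics import fmean
from typing import Callable, List

Image = List[List[int]]  # rows of grayscale pixels; 255 is treated as white


def grid_crops(image: Image, rows: int, cols: int) -> List[Image]:
    """Split an image into a rows x cols grid of non-overlapping crops."""
    height, width = len(image), len(image[0])
    crops = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            crops.append([row[left:right] for row in image[top:bottom]])
    return crops


def estimate_modification(
    image: Image,
    estimator: Callable[[Image], float],  # hypothetical per-crop model
    rows: int = 2,
    cols: int = 2,
) -> float:
    """Run the estimator on every crop and average the per-crop values."""
    values = [estimator(crop) for crop in grid_crops(image, rows, cols)]
    return fmean(values)


def pad_with_white(image: Image, target_height: int, target_width: int,
                   white: int = 255) -> Image:
    """Add white space on the right and bottom, never shrinking the image."""
    width = max(target_width, len(image[0]))
    padded = [row + [white] * (width - len(row)) for row in image]
    padded += [[white] * width for _ in range(target_height - len(padded))]
    return padded
```

In this sketch the same `estimate_modification` call would serve for either scale or rotation values, depending on what the supplied estimator returns for each crop.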
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
Orientation detection or correction, e.g. rotation of multiples of 90 degrees · CPC title
Normalisation of pattern dimensions · CPC title
using neural networks · CPC title
Learning methods · CPC title