Rotation and scaling for optical character recognition using end-to-end deep learning

US11302108B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11302108-B2
Application number: US-201916565614-A
Country: US
Kind code: B2
Filing date: Sep 10, 2019
Priority date: Sep 10, 2019
Publication date: Apr 12, 2022
Grant date: Apr 12, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are system, method, and computer program product embodiments for optical character recognition (OCR) pre-processing using machine learning. In an embodiment, a neural network may be trained to identify a standardized document rotation and scale expected by an OCR service performing character recognition. The neural network may then analyze a received document image to identify a corresponding rotation and scale of the document image relative to the expected standardized values. In response to this identification, the document image may be modified in the inverse to standardize the rotation and scale of the document image to match the format expected by the OCR service. In some embodiments, a neural network may perform the standardization as well as the character recognition using a shared computation graph.
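As a rough illustration of the "modified in the inverse" step described in the abstract, the sketch below computes the correction that would undo a detected rotation and scale. The function names `inverse_correction` and `affine_matrix` are hypothetical and do not come from the patent, and the detected values are assumed to be a rotation in degrees plus a uniform scale factor relative to the format the OCR service expects.

```python
import math


def inverse_correction(detected_rotation_deg, detected_scale):
    """Return the (rotation, scale) that would undo the detected values.

    Assumption: the detector reports how far the image is rotated (degrees)
    and how much it is scaled relative to the expected standard.
    """
    if detected_scale <= 0:
        raise ValueError("scale must be positive")
    # Rotate the opposite way, normalised to the range (-180, 180].
    rotation = -detected_rotation_deg % 360.0
    if rotation > 180.0:
        rotation -= 360.0
    # Scale by the reciprocal to return to the expected size.
    return rotation, 1.0 / detected_scale


def affine_matrix(rotation_deg, scale):
    """2x2 matrix for a rotation combined with a uniform scale."""
    theta = math.radians(rotation_deg)
    c = math.cos(theta) * scale
    s = math.sin(theta) * scale
    return [[c, -s], [s, c]]
```

Composing `affine_matrix` for the detected values with `affine_matrix` for the returned correction should give (approximately) the identity matrix, which is one way to check that the correction really is the inverse.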

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method, comprising: receiving a document image; generating a grid of one or more cropped portions of the document image; analyzing the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregating the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modifying the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmitting the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combining the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.

2. The computer implemented method of claim 1, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.

3. The computer implemented method of claim 1, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.

4. The computer implemented method of claim 1, wherein the aggregating further comprises: calculating an average of a vector including the one or more image parameter values to determine the modification.

5. The computer implemented method of claim 1, wherein the modifying further comprises: adding white space to the document image to generate the standardized document image.

6. A system, comprising: a memory; and at least one processor coupled to the memory and configured to: receive a document image; generate a grid of one or more cropped portions of the document image; analyze the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregate the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modify the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmit the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combine the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.

7. The system of claim 6, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.

8. The system of claim 6, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.

9. The system of claim 6, wherein to aggregate the one or more image parameters, the at least one processor is further configured to: calculate an average of a vector including the one or more image parameter values to determine the modification.

10. The system of claim 6, wherein to modify the document image, the at least one processor is further configured to: add white space to the document image to generate the standardized document image.

11. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: receiving a document image; generating a grid of one or more cropped portions of the document image; analyzing the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregating the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modifying the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmitting the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combining the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.

12. The non-transitory computer-readable device of claim 11, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.

13. The non-transitory computer-readable device of claim 11, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.

14. The non-transitory computer-readable device of claim 11, wherein the aggregating further comprises: calculating an average of a vector including the one or more image parameter values to determine the modification.
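The grid, aggregation, and white-space steps recited in the claims could be sketched roughly as follows. This is only an illustration under stated assumptions: images are modelled as plain lists of pixel rows, `predict_crop` is a hypothetical stand-in for the machine learning algorithm, and all function names are invented here rather than taken from the patent.

```python
from statistics import mean


def grid_crops(image, rows, cols):
    """Split a 2D image (list of pixel rows) into a rows x cols grid of crops."""
    height, width = len(image), len(image[0])
    crops = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            crops.append([row[left:right] for row in image[top:bottom]])
    return crops


def aggregate(values):
    """Average the per-crop parameter values into one document-level value."""
    return mean(values)


def estimate_modification(image, predict_crop, rows=2, cols=2):
    """Run the (hypothetical) per-crop predictor on each crop, then average."""
    values = [predict_crop(crop) for crop in grid_crops(image, rows, cols)]
    return aggregate(values)


def pad_with_white(image, target_h, target_w, white=255):
    """Add white space on the right and bottom to reach the target size."""
    padded = [row + [white] * (target_w - len(row)) for row in image]
    padded += [[white] * target_w for _ in range(target_h - len(image))]
    return padded
```

A plain arithmetic mean is used because the dependent claims mention calculating an average of a vector of parameter values. For rotation angles near the wrap-around point (for example 179 and -179 degrees) a simple mean can mislead, so a real system might need a circular average; the claims as shown here do not specify that detail.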

Assignees

Inventors

Classifications

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • Orientation detection or correction, e.g. rotation of multiples of 90 degrees · CPC title

  • Normalisation of pattern dimensions · CPC title

  • using neural networks · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11302108B2 cover?
Disclosed herein are system, method, and computer program product embodiments for optical character recognition (OCR) pre-processing using machine learning. In an embodiment, a neural network may be trained to identify a standardized document rotation and scale expected by an OCR service performing character recognition. The neural network may then analyze a received document image to identify …
Who is the assignee on this patent?
SAP SE
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 12, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).