Rotation and scaling for optical character recognition using end-to-end deep learning

US11302108B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11302108-B2
Application number	US-201916565614-A
Country	US
Kind code	B2
Filing date	Sep 10, 2019
Priority date	Sep 10, 2019
Publication date	Apr 12, 2022
Grant date	Apr 12, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are system, method, and computer program product embodiments for optical character recognition (OCR) pre-processing using machine learning. In an embodiment, a neural network may be trained to identify a standardized document rotation and scale expected by an OCR service performing character recognition. The neural network may then analyze a received document image to identify a corresponding rotation and scale of the document image relative to the expected standardized values. In response to this identification, the document image may be modified in the inverse to standardize the rotation and scale of the document image to match the format expected by the OCR service. In some embodiments, a neural network may perform the standardization as well as the character recognition using a shared computation graph.
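The "modified in the inverse" step in the abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions: rotation is taken in degrees, scale is a single uniform factor relative to the expected size, and the names `apply_transform` and `inverse_correction` are hypothetical and do not come from the patent.

```python
import math


def apply_transform(point, rotation_deg, scale):
    """Rotate a 2-D point about the origin, then scale it uniformly."""
    x, y = point
    theta = math.radians(rotation_deg)
    x_rot = x * math.cos(theta) - y * math.sin(theta)
    y_rot = x * math.sin(theta) + y * math.cos(theta)
    return (x_rot * scale, y_rot * scale)


def inverse_correction(rotation_deg, scale):
    """Return the (rotation, scale) pair that undoes an estimated deviation.

    Assumption: the estimate describes how the image differs from the
    orientation and size the OCR service expects.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return (-rotation_deg, 1.0 / scale)


# A point on a document that was rotated by 30 degrees and enlarged 2x.
original = (3.0, 4.0)
distorted = apply_transform(original, rotation_deg=30.0, scale=2.0)

# Applying the inverse correction brings the point back to its original place.
fix_rotation, fix_scale = inverse_correction(30.0, 2.0)
restored = apply_transform(distorted, fix_rotation, fix_scale)
```

Because rotation about the origin and uniform scaling commute, applying the negated angle and the reciprocal scale in either order recovers the original coordinates up to floating-point error.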

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method, comprising: receiving a document image; generating a grid of one or more cropped portions of the document image; analyzing the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregating the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modifying the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmitting the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combining the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.

2. The computer implemented method of claim 1, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.

3. The computer implemented method of claim 1, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.

4. The computer implemented method of claim 1, wherein the aggregating further comprises: calculating an average of a vector including the one or more image parameter values to determine the modification.

5. The computer implemented method of claim 1, wherein the modifying further comprises: adding white space to the document image to generate the standardized document image.

6. A system, comprising: a memory; and at least one processor coupled to the memory and configured to: receive a document image; generate a grid of one or more cropped portions of the document image; analyze the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregate the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modify the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmit the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combine the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.

7. The system of claim 6, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.

8. The system of claim 6, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.

9. The system of claim 6, wherein to aggregate the one or more image parameters, the at least one processor is further configured to: calculate an average of a vector including the one or more image parameter values to determine the modification.

10. The system of claim 6, wherein to modify the document image, the at least one processor is further configured to: add white space to the document image to generate the standardized document image.

11. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: receiving a document image; generating a grid of one or more cropped portions of the document image; analyzing the one or more cropped portions using a machine learning algorithm to determine a corresponding one or more image parameter values corresponding to each of the one or more cropped portions; aggregating the one or more image parameter values to determine a modification to standardize the document image according to the machine learning algorithm; modifying the document image according to the modification to generate a standardized document image, wherein the modifying comprises dividing the document image into multiple portions; transmitting the standardized document image to an optical character recognition (OCR) service, wherein the transmitting comprises transmitting the multiple portions to the OCR service for individual character recognition of the multiple portions, and wherein the machine learning algorithm and the OCR service use a common computational graph; and combining the multiple portions after the OCR service has performed an individual character recognition process on the multiple portions.

12. The non-transitory computer-readable device of claim 11, wherein the one or more image parameter values are scaling parameter values indicating a scale size of each crop portion relative to an expected scale size expected by the OCR service.

13. The non-transitory computer-readable device of claim 11, wherein the one or more image parameter values are rotation parameter values indicating a rotation of each crop portion relative to an expected orientation expected by the OCR service.

14. The non-transitory computer-readable device of claim 11, wherein the aggregating further comprises: calculating an average of a vector including the one or more image parameter values to determine the modification.
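
The grid, averaging, and white-space steps recited in the claims can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the helper names (`grid_boxes`, `aggregate_parameter`, `pad_with_white`) are hypothetical, the per-crop predictions are made-up stand-ins for the output of a trained model, and padding on the right and bottom is an assumption, since the claims do not say where white space is added.

```python
from statistics import mean


def grid_boxes(width, height, rows, cols):
    """Tile a width x height image with rows x cols crop boxes.

    Each box is (left, top, right, bottom) in pixel coordinates.
    """
    return [
        (c * width // cols, r * height // rows,
         (c + 1) * width // cols, (r + 1) * height // rows)
        for r in range(rows)
        for c in range(cols)
    ]


def aggregate_parameter(boxes, predict):
    """Average the per-crop parameter values returned by `predict`.

    `predict` is a hypothetical callable standing in for the machine
    learning algorithm; it maps one crop box to one number.
    """
    return mean(predict(box) for box in boxes)


def pad_with_white(pixels, target_width, target_height, white=255):
    """Pad a row-major grayscale image on the right and bottom with white."""
    padded = [row + [white] * (target_width - len(row)) for row in pixels]
    padded += [[white] * target_width
               for _ in range(target_height - len(pixels))]
    return padded


# Four crops of a 100 x 60 image, each with a made-up rotation estimate.
boxes = grid_boxes(100, 60, rows=2, cols=2)
fake_predictions = dict(zip(boxes, [9.0, 11.0, 10.0, 10.0]))

estimated_rotation = aggregate_parameter(boxes, fake_predictions.get)
correction = -estimated_rotation  # rotate the other way to standardize

# Pad a tiny 2 x 2 black image out to 3 x 3 with white pixels.
padded = pad_with_white([[0, 0], [0, 0]], target_width=3, target_height=3)
```

Averaging over several crops is one simple way to smooth out a noisy estimate from any single region; the same helper would work for scale values instead of rotation values.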

Assignees

SAP SE

Inventors

Classifications

G06V30/18057 · Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
G06V30/1463 · Orientation detection or correction, e.g. rotation of multiples of 90 degrees
G06V30/166 · Normalisation of pattern dimensions
G06V10/82 · using neural networks
G06N3/08 (Primary) · Learning methods

Patent family

Related publications grouped by family.

View patent family 74851268

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11302108B2 cover?: Disclosed herein are system, method, and computer program product embodiments for optical character recognition (OCR) pre-processing using machine learning. In an embodiment, a neural network may be trained to identify a standardized document rotation and scale expected by an OCR service performing character recognition. The neural network may then analyze a received document image to identify …
Who is the assignee on this patent?: SAP SE
What technology area does this patent fall under?: Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 12, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Systems and methods for image modification and image based content capture and extraction in neural networks

Recognition and population of form fields in an electronic document

Categorizer assisted capture of customer documents using a mobile device

Dynamically generating table of contents for printable or scanned content

Image-based character recognition

Method and system for character recognition
