What technology area does this patent fall under?

Primary CPC classification: G06N3/084 (Backpropagation, e.g. using gradient descent). Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 26, 2019 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Optical character recognition employing deep learning with machine generated training data

US10489682B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10489682-B1
Application number	US-201715851617-A
Country	US
Kind code	B1
Filing date	Dec 21, 2017
Priority date	Dec 21, 2017
Publication date	Nov 26, 2019
Grant date	Nov 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An optical character recognition system employs a deep learning system that is trained to process a plurality of images within a particular domain to identify images representing text within each image and to convert the images representing text to textually encoded data. The deep learning system is trained with training data generated from a corpus of real-life text segments that are generated by a plurality of OCR modules. Each of the OCR modules produces a real-life image/text tuple, and at least some of the OCR modules produce a confidence value corresponding to each real-life image/text tuple. Each OCR module is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain. Synthetically generated text segments are produced by programmatically converting text strings to a corresponding image where each text string and corresponding image form a synthetic image/text tuple.
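The synthetic half of the training data described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ImageTextTuple` record, the `render` function, and the 3x5 `GLYPHS` bitmap font are all hypothetical stand-ins, and a real pipeline would presumably rasterize text with actual fonts. The optional `confidence` field reflects that only some OCR modules report a confidence value for real-life tuples.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical 3x5 bitmap font covering a few characters (assumption for
# illustration only; real systems would render with actual font files).
GLYPHS = {
    "A": ["010", "101", "111", "101", "101"],
    "B": ["110", "101", "110", "101", "110"],
    "C": ["011", "100", "100", "100", "011"],
    " ": ["000", "000", "000", "000", "000"],
}
GLYPH_HEIGHT = 5


@dataclass
class ImageTextTuple:
    image: List[List[int]]              # rows of 0/1 pixels
    text: str                           # the text the image represents
    source: str                         # "synthetic" or "real-life"
    confidence: Optional[float] = None  # set only when an OCR module reports one


def render(text: str) -> List[List[int]]:
    """Rasterize text with the toy font, one blank column between glyphs."""
    rows: List[List[int]] = [[] for _ in range(GLYPH_HEIGHT)]
    for i, ch in enumerate(text):
        glyph = GLYPHS[ch]  # raises KeyError for characters outside the toy font
        for r in range(GLYPH_HEIGHT):
            if i > 0:
                rows[r].append(0)  # spacing column
            rows[r].extend(int(p) for p in glyph[r])
    return rows


def synthetic_tuple(text: str) -> ImageTextTuple:
    """Pair a text string with its programmatically generated image."""
    return ImageTextTuple(image=render(text), text=text, source="synthetic")


if __name__ == "__main__":
    sample = synthetic_tuple("AB")
    for row in sample.image:
        print("".join("#" if p else "." for p in row))
```

Because the label is known before the image is generated, a synthetic tuple needs no OCR pass and carries no confidence value, which is the main contrast with the real-life tuples.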

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for training a computerized deep learning system utilized by an optical character recognition system comprising the computer-implemented operations of: generating a plurality of synthetic text segments, by programmatically converting each of a plurality of text strings to a corresponding image, each text string and corresponding image forming a synthetic image/text tuple; generating a plurality of real-life text segments by processing from a corpus of document images, at least a subset of images from the corpus, with a plurality of OCR programs, each of the OCR programs processing each image from the subset to produce a real-life image/text tuple, and at least some of the OCR programs producing a confidence value corresponding to each real-life image/text tuple, and wherein each OCR program is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain; storing the synthetic image/text tuple and the real-life image/text tuple to data storage as training data in a format accessible by the computerized deep learning system for training; and training the computerized deep learning system with the training data.
2. The computer-implemented method of claim 1 further comprising: augmenting the synthetic image/text tuples and the real-life image/text tuples data by adding noise to image portions of the tuples.
3. The computer-implemented method of claim 2 wherein adding noise to image portions of the tuples comprises: randomly selecting image portions of the tuples and superimposing to the selected image portions, noise selected from the group consisting of random speckled noise, random lines, random binarization threshold, white on black text.
4. The computer-implemented method of claim 2 wherein adding noise to image portions of the tuples comprises: randomly selecting image portions of the tuples and superimposing patterned noise to the selected image portions.
5. The computer-implemented method of claim 1 further comprising processing the image portions of the tuples to format the image portions into a fixed normative input employed by the computerized deep learning system.
6. The computer-implemented method of claim 5 wherein processing the image portions of the tuples to format the image portions into a fixed normative input employed by the computerized deep learning system comprises: scaling the image portion of each of the tuples to fit in a field of view of the computerized deep learning system.
7. The computer-implemented method of claim 5 wherein processing the image portions of the tuples to format the image portions into a fixed normative input employed by the computerized deep learning system comprises: centering the image portion of each of the tuples within a field of view of the computerized deep learning system.
8. The computer-implemented method of claim 1 further comprising: processing, for storage as training data, output of the OCR programs by employing statistical metrics to identify the highest quality tuples generated by the OCR programs.
9. The computer-implemented method of claim 8 wherein employing the statistical metrics comprises: selecting, between confidence metrics of equal value generated by two or more OCR programs, a confidence metric generated from a deep-learning based OCR program over confidence metrics generated from OCR programs not based on computerized deep learning; selecting segments in order of OCR confidence as indicated by confidence metric generated by an OCR program; and selecting segments for which the same text is generated by the OCR programs, and if the same text is not generated by the OCR programs then selecting segments having the least edit distance.
10. The computer-implemented method of claim 9 further comprising: identifying a subset of the real-life image/text tuples for labeling by humans, the subset characterized by a range of confidence values and differing outputs among the OCR programs for given segments.
11. The computer-implemented method of claim 1 further comprising modifying a font of the image portion of at least a subset of the synthetic image/text tuples.
12. The computer-implemented method of claim 1 wherein generating a plurality of synthetic text segments comprises randomly selecting sets of consecutive words from a text corpus comprising a set of fully-formed English language sentences.
13. The computer-implemented method of claim 1 wherein generating a plurality of synthetic text segments comprises randomly selecting sets of consecutive words from a text corpus characterized by common text elements in the identified domain.
14. The computer-implemented method of claim 12 further comprising modifying the selected sets of consecutive words to reflect biases of character types that occur in the identified domain.
15. The computer-implemented method of claim 13 further comprising modifying the selected sets of consecutive words to reflect biases of character types that occur in the identified domain.
16. The computer-implemented method of claim 14 further comprising generating the image portion of the synthetic image/text tuple in accordance with a randomly chosen font and font size.
17. The computer-implemented method of claim 15 further comprising generating the image portion of the synthetic image/text tuple in accordance with a randomly chosen font and font size.
18. A computerized optical character recognition system comprising: a computerized deep learning system trained to process a plurality of encoded images within a particular domain to identify images representing text within each encoded image and converting the encoded images representing text to textually encoded data; data storage for storing the encoded images representing text and textually encoded data; wherein the computerized deep learning system is trained with training data generated from a corpus of, real-life text segments generated by processing from a corpus of encoded document images, at least a subset of encoded images from the corpus, with a plurality of OCR modules, each of the OCR modules processing each encoded image from the corpus to produce a real-life image/text tuple, and at least some of the OCR modules producing a confidence value corresponding to each real-life image/text tuple, and wherein each OCR module is characterized by an conversion accuracy substantially below a desired accuracy for an identified domain; and synthetically generated text segments, generated by programmatically converting each of a plurality of text strings to a corresponding encoded image, each text string and corresponding encoded image forming a synthetic image/text tuple.
19. The computerized optical character recognition system of claim 18 wherein the real-life image/text tuples are processed to fit within a field of view of the computerized deep learning system, and wherein the synthetic image/text tuples are processed to reflect textual characteristics of the identified domain.
20. A computerized system for training a computerized deep learning system utilized by an optical character recognition system comprising: a processor configured to execute instructions that when executed cause the processor to: generate a plurality of synthetic text segments, by programmatically converting each of a plurality of text strings to a corresponding image, each text string and corresponding image forming a synthetic image/text tuple; and generate a plurality of
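The selection heuristics recited in claim 9 can be illustrated with a short sketch. This is one possible reading of the claim language, not the patentee's implementation: the `OcrResult` record, the use of the largest pairwise Levenshtein distance as the disagreement measure, and the way the three criteria are combined into a single sort key are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OcrResult:
    text: str
    confidence: float
    deep_learning: bool  # True if produced by a deep-learning based OCR program


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def disagreement(results: List[OcrResult]) -> int:
    """Largest pairwise edit distance among outputs; 0 means all programs agree."""
    texts = [r.text for r in results]
    return max((edit_distance(x, y)
                for i, x in enumerate(texts) for y in texts[i + 1:]),
               default=0)


def best_result(results: List[OcrResult]) -> OcrResult:
    """Highest confidence wins; equal confidences favor the deep-learning program."""
    return max(results, key=lambda r: (r.confidence, r.deep_learning))


def rank_segments(segments: Dict[str, List[OcrResult]]) -> List[str]:
    """Order segment ids: least disagreement first, then highest confidence."""
    return sorted(
        segments,
        key=lambda sid: (disagreement(segments[sid]),
                         -best_result(segments[sid]).confidence),
    )


if __name__ == "__main__":
    segments = {
        "s1": [OcrResult("Invoice", 0.90, False), OcrResult("Invoice", 0.90, True)],
        "s2": [OcrResult("Total", 0.95, False), OcrResult("Tota1", 0.80, True)],
        "s3": [OcrResult("Date", 0.70, False), OcrResult("Date", 0.60, True)],
    }
    print(rank_segments(segments))
```

In this reading, segments on which every OCR program agrees rank ahead of segments with differing outputs, and the segments near the bottom of the ranking are natural candidates for the human labeling described in claim 10.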

Assignees

Automation Anywhere Inc

Inventors

Classifications

G06V30/19133
Interactive pattern learning with a human teacher · CPC title
G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title
G06V30/162
Quantising the image signal · CPC title
G06V30/10
Character recognition · CPC title
G06V30/19147
Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Patent family

Related publications grouped by family.

View patent family 68617928

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10489682B1 cover?: An optical character recognition system employs a deep learning system that is trained to process a plurality of images within a particular domain to identify images representing text within each image and to convert the images representing text to textually encoded data. The deep learning system is trained with training data generated from a corpus of real-life text segments that are generated…
Who is the assignee on this patent?: Automation Anywhere Inc