Online training data generation for optical character recognition

US11295155B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11295155-B2
Application number  US-202016843811-A
Country             US
Kind code           B2
Filing date         Apr 8, 2020
Priority date       Apr 8, 2020
Publication date    Apr 5, 2022
Grant date          Apr 5, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system to generate training data for a deep learning model in memory instead of loading pre-generated data from disk storage. A corpus may be stored as lines of text. The lines of text can be manipulated in the memory of a central processing unit (CPU) of a computing system, using asynchronous multi-processing, in parallel with a training process being conducted on the system's graphics processing unit (GPU). With such an approach, for a given line of text, it is possible to take advantage of different fonts and different types of image augmentation without having to put the images in disk storage for subsequent retrieval. Consequently, the same line of text can be used to generate different training images for use in different epochs, providing more variability in training data (no training sample is trained on more than once). A single training corpus may yield many different training data sets. In one aspect, the model being trained is a deep learning model, which may be one of several different types of neural networks. The training enables the deep learning model to perform OCR on line images.
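
The producer/consumer arrangement described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: a background thread and a bounded queue stand in for the asynchronous worker processes, the "training" loop only collects samples, and `CORPUS`, `FONTS`, `render`, `producer`, and `train` are hypothetical names. The point it shows is that each epoch re-renders the same text lines with fresh random choices, so nothing is written to or read back from disk.

```python
import queue
import random
import threading

CORPUS = ["the quick brown fox", "jumps over", "the lazy dog"]  # hypothetical corpus lines
FONTS = ["serif", "sans", "mono"]                               # hypothetical font names


def render(line, font, rng):
    """Stand-in for rasterizing a line: returns a record instead of pixels."""
    return {"text": line, "font": font, "jitter": rng.random()}


def producer(out_q, epochs, seed):
    """Generate samples in memory; every epoch draws new fonts and jitter."""
    rng = random.Random(seed)
    for epoch in range(epochs):
        lines = CORPUS[:]
        rng.shuffle(lines)
        for line in lines:
            out_q.put((epoch, render(line, rng.choice(FONTS), rng)))
    out_q.put(None)  # sentinel: no more data


def train(epochs=2, seed=0):
    """Consume samples while the producer keeps generating the next ones."""
    q = queue.Queue(maxsize=4)  # bounded, so generation cannot run far ahead
    worker = threading.Thread(target=producer, args=(q, epochs, seed), daemon=True)
    worker.start()
    seen = []
    while True:
        item = q.get()
        if item is None:
            break
        seen.append(item)  # a real loop would run a GPU training step here
    worker.join()
    return seen
```

In a real pipeline the producer would more likely be a pool of CPU processes and the consumer a GPU training step; the bounded queue is one simple way to keep generation and training running side by side.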

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
    a. receiving a sequence of text lines from a corpus, each of the text lines having no more than a predetermined length;
    b. grouping the text lines into mini-batches;
    c. randomizing the mini-batches;
    d. selecting a mini-batch;
    e. converting the selected mini-batch into ground truth labels;
    f. receiving a randomly selected font from a set of fonts;
    g. rendering text lines in the selected mini-batch using the randomly selected font to produce an image of the text lines;
    h. processing the image to produce an image mini-batch; and
    i. outputting the ground truth labels and image mini-batch to a training model as training data.

2. A method according to claim 1, further comprising repeating a.-i. for the entire contents of the corpus.

3. A method according to claim 1, further comprising repeating a.-i. more than once for the entire contents of the corpus until the training model achieves a predetermined level of accuracy.

4. A method according to claim 1, wherein the receiving a sequence of text lines comprises receiving a sentence, segmenting that sentence into the text lines, and sorting the text lines according to length.

5. A method according to claim 1, further comprising augmenting the text in the selected mini-batch to produce an augmented mini-batch, and performing the converting and rendering on the augmented mini-batch.

6. A method according to claim 5, wherein augmenting the text comprises text processing selected from the group consisting of changing a case of one or more words in the text, converting one or more characters in the text between full-width and half-width, placing spaces between letters of one or more words in the text, placing additional spaces between one or more words in the text, and inserting or removing diacritic marks.

7. A method according to claim 1, wherein processing the image comprises cropping and resizing the image and augmenting the cropped and resized image.

8. A method according to claim 7, wherein augmenting the cropped and resized image comprises performing a process selected from the group consisting of rotation, random affine transformation, random perspective transformation, random elastic transformation, random morphological operation, random Gaussian blurring, and random intensity inversion.

9. A method according to claim 1, wherein the processing comprises augmenting the image mini-batch by introducing noise selected from the group consisting of Gaussian noise, impulse noise, Poisson noise, speckle noise, fixed pattern noise, random noise, and banding noise.

10. A method according to claim 1, wherein the processing comprises adding padding pixels so that the image mini-batch has a uniform length.

11. A system comprising:
    a graphics processing unit (GPU), the GPU programmed to implement a training model;
    GPU memory connected to the GPU;
    a central processing unit (CPU), the CPU programmed to provide training data to train the training model to perform optical character recognition (OCR);
    CPU memory connected to the CPU; and
    non-volatile storage connected to the CPU and the GPU, the non-volatile storage storing a corpus of text;
    the system including a program which causes the system to perform the following:
    a. grouping a sequence of text lines from a corpus, each of the text lines having no more than a predetermined length, into mini-batches;
    b. randomizing the mini-batches;
    c. selecting a mini-batch;
    d. converting the selected mini-batch into ground truth labels;
    e. receiving a randomly selected font from a set of fonts;
    f. rendering text lines in the selected mini-batch using the randomly selected font to produce an image of the text lines;
    g. processing the image to produce an image mini-batch; and
    h. outputting the ground truth labels and image mini-batch to a training model as training data.

12. A system according to claim 11, wherein the program causes the system to perform a.-h. for the entire contents of the corpus.

13. A system according to claim 11, wherein the program causes the system to perform a.-h. more than once for the entire contents of the corpus until the training model achieves a predetermined level of accuracy.

14. A system according to claim 11, wherein the receiving a sequence of text lines comprises receiving a sentence, segmenting that sentence into the text lines, and sorting the text lines according to length.

15. A system according to claim 11, wherein the program causes the system to perform the following: augmenting the text in the selected mini-batch by performing text processing selected from the group consisting of changing a case of one or more words in the text, converting one or more characters in the text between full-width and half-width, placing spaces between letters of one or more words in the text, placing additional spaces between one or more words in the text, and inserting or removing diacritic marks, to produce an augmented mini-batch; and performing the converting and rendering on the augmented mini-batch.

16. A system according to claim 11, wherein processing the image comprises cropping the image, resizing the image and adding padding pixels so that the image mini-batch has a uniform length, and augmenting the cropped and resized image.

17. A system according to claim 16, wherein augmenting the cropped and resized image comprises performing a process selected from the group consisting of rotation, random affine transformation, random perspective transformation, random elastic transformation, random morphological operation, random Gaussian blurring, and random intensity inversion.

18. A system according to claim 11, wherein processing the image comprises augmenting the image mini-batch by introducing noise selected from the group consisting of Gaussian noise, impulse noise, Poisson noise, speckle noise, fixed pattern noise, random noise, and banding noise.

19. A system according to claim 11, wherein the training model is a deep learning model.

20. A system according to claim 19, wherein the deep learning model is a neural network.
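
Steps a. through i. of claim 1, together with the length sorting of claim 4 and the padding of claim 10, can be sketched roughly as below. This is an illustrative reading of the claim language only, under stated assumptions: the "renderer" emits a single row of toy pixels instead of a rasterized image, and `FONTS`, `FONT_WIDTHS`, `CHARSET`, and every function name are hypothetical.

```python
import random

FONTS = ["serif", "sans", "mono"]                  # hypothetical font names
FONT_WIDTHS = {"serif": 3, "sans": 2, "mono": 4}   # hypothetical pixels per glyph
CHARSET = sorted(set("abcdefghijklmnopqrstuvwxyz "))        # assumed label alphabet
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(CHARSET)}    # 0 is reserved for padding


def group_into_minibatches(lines, batch_size, max_len):
    """Steps a-b: keep lines within the length limit, sort by length, chunk."""
    kept = sorted((ln for ln in lines if len(ln) <= max_len), key=len)
    return [kept[i:i + batch_size] for i in range(0, len(kept), batch_size)]


def to_labels(batch):
    """Step e: map each character to an integer ground-truth label."""
    return [[CHAR_TO_ID[ch] for ch in line] for line in batch]


def render_line(line, font):
    """Step g (toy): one row of 'pixels'; the font only changes glyph width."""
    width = FONT_WIDTHS[font]
    return [0 if ch == " " else 1 for ch in line for _ in range(width)]


def pad_to_uniform(images, pad_value=0):
    """Step h (claim 10): right-pad so every image has the same length."""
    target = max(len(img) for img in images)
    return [img + [pad_value] * (target - len(img)) for img in images]


def generate(lines, batch_size=2, max_len=20, seed=0):
    """Yield (labels, image mini-batch, font) tuples for one pass over the corpus."""
    rng = random.Random(seed)
    batches = group_into_minibatches(lines, batch_size, max_len)
    rng.shuffle(batches)                                        # step c
    for batch in batches:                                       # step d
        labels = to_labels(batch)                               # step e
        font = rng.choice(FONTS)                                # step f
        images = [render_line(line, font) for line in batch]    # step g
        yield labels, pad_to_uniform(images), font              # steps h-i
```

Sorting by length before chunking keeps lines of similar length in the same mini-batch, which limits how much padding each batch needs. Calling `generate` again with a different seed gives a different batch order and different fonts for the same text, which is how one corpus can yield different training data in each epoch.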

Assignees

Konica Minolta Business Solutions USA Inc

Inventors

Classifications

  • G06F18/214 · Primary

    Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • using recognition of characters or words · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Training; Learning · CPC title

  • Accessing generic data, e.g. fonts · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11295155B2 cover?
A method and system to generate training data for a deep learning model in memory instead of loading pre-generated data from disk storage. A corpus may be stored as lines of text. The lines of text can be manipulated in the memory of a central processing unit (CPU) of a computing system, using asynchronous multi-processing, in parallel with a training process being conducted on the system's gra…
Who is the assignee on this patent?
Konica Minolta Business Solutions USA Inc
What technology area does this patent fall under?
Primary CPC classification G06F18/214. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 5, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).