Automating creation of accurate OCR training data using specialized UI application

US10289905B2 · US · B2

Patent metadata
Publication number: US-10289905-B2
Application number: US-201816112190-A
Country: US
Kind code: B2
Filing date: Aug 24, 2018
Priority date: Oct 5, 2016
Publication date: May 14, 2019
Grant date: May 14, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems of the present disclosure generate accurate training data for optical character recognition (OCR). Systems disclosed herein generate images of a text passage as displayed piecemeal in a user interface (UI) element rendered in a selected font type and size, determine accurate dimensions and locations of bounding boxes for each character pictured in the images, stitch together a training image by concatenating the images, and associate the training image, the bounding box dimensions and locations, and the text passage together in a collection of training data. The collection of training data also includes a computer-readable master copy of the text passage with newline characters inserted therein.
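The stitching step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`stitch_vertically`, `layout_boxes`, `build_training_record`), the nested-list bitmap representation, the top-to-bottom stacking of pieces, and the fixed per-character width (a monospace simplification) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]          # rows of pixels; 1 = ink, 0 = background
Box = Tuple[int, int, int, int]   # (x, y, width, height) in stitched-image pixels


def stitch_vertically(images: List[Bitmap]) -> Bitmap:
    """Stack piece images top to bottom, padding narrower ones with background."""
    width = max(len(img[0]) for img in images)
    stitched: Bitmap = []
    for img in images:
        for row in img:
            stitched.append(row + [0] * (width - len(row)))
    return stitched


def layout_boxes(pieces: List[str], char_width: int, line_height: int) -> List[Box]:
    """Locate one box per character, assuming every character is char_width wide
    and every piece occupies one row of height line_height (both hypothetical)."""
    boxes: List[Box] = []
    for line_no, piece in enumerate(pieces):
        for col in range(len(piece)):
            boxes.append((col * char_width, line_no * line_height, char_width, line_height))
    return boxes


def build_training_record(
    pieces: List[str], images: List[Bitmap], char_width: int, line_height: int
) -> Dict[str, object]:
    """Associate the stitched image, the boxes, and a master copy of the text."""
    return {
        "image": stitch_vertically(images),
        "boxes": layout_boxes(pieces, char_width, line_height),
        # Master copy: one newline inserted between consecutive pieces.
        "master_copy": "\n".join(pieces),
    }
```

In this sketch each displayed piece of the passage becomes one line of the master copy, so the newline positions in the text line up with the row boundaries in the stitched image.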

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating training data for optical character recognition (OCR), the method comprising: receiving a training data passage comprising a plurality of characters; for each respective font type of a plurality of font types: for each respective font size of a plurality of font sizes: for each respective character in the plurality of characters: displaying the respective character alone in a character UI element in the respective font type and in the respective font size; capturing an image of the respective character as displayed in the character UI element; determining dimensions of the image of the respective character; and storing the dimensions of the image in a data structure, wherein the dimensions stored in the data structure are associated with the respective character.

2. The method of claim 1, further comprising: for each respective character in the training data passage: determining a set of offsets for the respective character based on the image of the respective character, wherein each offset of the set of offsets indicates a distance between an edge of the image of the respective character and the respective character; and storing the set of offsets in the data structure, wherein the set of offsets stored in the data structure are associated with the respective character.

3. The method of claim 2, further comprising: for each respective character in the training data passage: storing the respective font type and the respective font size in the data structure, wherein the respective font type and the respective font size stored in the data structure are associated with the respective character.

4. The method of claim 3, further comprising: receiving a request for a width of a bounding box associated with a selected character of the plurality of characters; and providing the width of the bounding box based on dimensions of an image stored in the data structure associated with the selected character.

5. The method of claim 4, wherein the width of the bounding box is measured in pixels.

6. The method of claim 3, further comprising: receiving a request for a set of offsets associated with a selected character of the plurality of characters; and providing the set of offsets stored in the data structure associated with the selected character.

7. The method of claim 1, further comprising: receiving a user selection of the plurality of font types; and receiving a user selection of the plurality of font types.

8. A system for generating training data for optical character recognition (OCR), comprising: one or more processors; and memory storing computer-executable instructions that, when executed on the one or more processors of the system, perform an operation for generating training data for optical character recognition (OCR), the operation comprising: receiving a training data passage comprising a plurality of characters; for each respective font type of a plurality of font types: for each respective font size of a plurality of font sizes: for each respective character in the plurality of characters: displaying the respective character alone in a character UI element in the respective font type and in the respective font size; capturing an image of the respective character as displayed in the character UI element; determining dimensions of the image of the respective character; and storing the dimensions of the image in a data structure, wherein the dimensions stored in the data structure are associated with the respective character.

9. The system of claim 8, wherein the operation further comprises: for each respective character in the training data passage: determining a set of offsets for the respective character based on the image of the respective character, wherein each offset of the set of offsets indicates a distance between an edge of the image of the respective character and the respective character; and storing the set of offsets in the data structure, wherein the set of offsets stored in the data structure are associated with the respective character.

10. The system of claim 9, wherein the operation further comprises: for each respective character in the training data passage: storing the respective font type and the respective font size in the data structure, wherein the respective font type and the respective font size stored in the data structure are associated with the respective character.

11. The system of claim 10, wherein the operation further comprises: receiving a request for a width of a bounding box associated with a selected character of the plurality of characters; and providing the width of the bounding box based on dimensions of an image stored in the data structure associated with the selected character.

12. The system of claim 11, wherein the width of the bounding box is measured in pixels.

13. The system of claim 10, wherein the operation further comprises: receiving a request for a set of offsets associated with a selected character of the plurality of characters; and providing the set of offsets stored in the data structure associated with the selected character.

14. The system of claim 8, wherein the operation further comprises: receiving a user selection of the plurality of font types; and receiving a user selection of the plurality of font types.

15. A non-transitory computer-readable storage medium comprising instructions for performing a method of generating training data for optical character recognition (OCR), the method comprising: receiving a training data passage comprising a plurality of characters; for each respective font type of a plurality of font types: for each respective font size of a plurality of font sizes: for each respective character in the plurality of characters: displaying the respective character alone in a character UI element in the respective font type and in the respective font size; capturing an image of the respective character as displayed in the character UI element; determining dimensions of the image of the respective character; and storing the dimensions of the image in a data structure, wherein the dimensions stored in the data structure are associated with the respective character.

16. The non-transitory computer-readable storage medium of claim 15, wherein the method further comprises: for each respective character in the training data passage: determining a set of offsets for the respective character based on the image of the respective character, wherein each offset of the set of offsets indicates a distance between an edge of the image of the respective character and the respective character; and storing the set of offsets in the data structure, wherein the set of offsets stored in the data structure are associated with the respective character.

17. The non-transitory computer-readable storage medium of claim 16, wherein the method further comprises: for each respective character in the training data passage: storing the respective font type and the respective font size in the data structure, wherein the respective font type and the respective font size stored in the data structure are associated with the respective character.

18. The non-transitory computer-readable storage medium of claim 17, wherein the method further comprises: receiving a request for a width of a bounding box associated with a selected character of the plurality of characters; and providing the width of the bounding box based on dimensions of an image stored in the data s
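The nested loop recited in the independent claims, together with the offset and width lookups of the dependent claims, might look roughly like the Python sketch below. Everything named here is hypothetical: `GlyphRecord`, `measure_offsets`, `build_glyph_table`, `bounding_box_width`, and especially `toy_capture`, which merely stands in for displaying a character in a UI element and capturing its image. Treating the bounding box width as the image width minus the left and right offsets is likewise an assumption, since the claims only say the width is "based on" the stored dimensions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Bitmap = List[List[int]]            # rows of pixels; 1 = ink, 0 = background
Key = Tuple[str, str, int]          # (character, font type, font size)
Offsets = Tuple[int, int, int, int] # (left, top, right, bottom)


@dataclass(frozen=True)
class GlyphRecord:
    width: int        # image width in pixels
    height: int       # image height in pixels
    offsets: Offsets  # gaps between each image edge and the character's ink


def measure_offsets(image: Bitmap) -> Offsets:
    """Distance in pixels from each image edge to the nearest inked pixel."""
    height, width = len(image), len(image[0])
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in image)]
    if not rows:  # blank glyph such as a space: no ink to measure against
        return (0, 0, 0, 0)
    return (cols[0], rows[0], width - 1 - cols[-1], height - 1 - rows[-1])


def build_glyph_table(
    passage: str,
    font_types: Iterable[str],
    font_sizes: Iterable[int],
    capture: Callable[[str, str, int], Bitmap],
) -> Dict[Key, GlyphRecord]:
    """For each font type, font size, and distinct character, capture and measure."""
    sizes = list(font_sizes)
    table: Dict[Key, GlyphRecord] = {}
    for font in font_types:
        for size in sizes:
            for ch in dict.fromkeys(passage):  # distinct characters, in order
                image = capture(ch, font, size)
                table[(ch, font, size)] = GlyphRecord(
                    len(image[0]), len(image), measure_offsets(image)
                )
    return table


def bounding_box_width(table: Dict[Key, GlyphRecord], ch: str, font: str, size: int) -> int:
    """Width in pixels, assumed here to be image width minus left/right offsets."""
    rec = table[(ch, font, size)]
    left, _, right, _ = rec.offsets
    return rec.width - left - right


def toy_capture(ch: str, font: str, size: int) -> Bitmap:
    """Hypothetical stand-in for the UI capture: a size x size image whose ink
    fills everything except a 1-pixel margin; a space yields a blank image."""
    if ch == " ":
        return [[0] * size for _ in range(size)]
    return [
        [1 if 0 < x < size - 1 and 0 < y < size - 1 else 0 for x in range(size)]
        for y in range(size)
    ]
```

Keying the table on character, font type, and font size together keeps the stored dimensions and offsets associated with the character in the sense of claims 1 to 3, and makes the lookups of claims 4 and 6 a single dictionary access.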

Assignees

Intuit Inc

Inventors

Classifications

  • Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries, e.g. user dictionaries · CPC title

  • Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10289905B2 cover?
Systems of the present disclosure generate accurate training data for optical character recognition (OCR). Systems disclosed herein generate images of a text passage as displayed piecemeal in a user interface (UI) element rendered in a selected font type and size, determine accurate dimensions and locations of bounding boxes for each character pictured in the images, stitch together a training…
Who is the assignee on this patent?
Intuit Inc
What technology area does this patent fall under?
Primary CPC classification G06V30/1914. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 14, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).