Utilizing machine-learning based object detection to improve optical character recognition

US12288406B2 · US · B2

Patent metadata
Publication number: US-12288406-B2
Application number: US-202117490610-A
Country: US
Kind code: B2
Filing date: Sep 30, 2021
Priority date: Sep 30, 2021
Publication date: Apr 29, 2025
Grant date: Apr 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately enhancing optical character recognition with a machine learning approach for determining words from reverse text, vertical text, and atypically-sized text. For example, the disclosed systems segment a digital image into text regions and non-text regions utilizing an object detection machine learning model. Within the text regions, the disclosed systems can determine reverse text glyphs, vertical text glyphs, and/or atypically-sized text glyphs utilizing an edge based adaptive binarization model. Additionally, the disclosed systems can utilize respective modification techniques to manipulate reverse text glyphs, vertical text glyphs, and/or atypically-sized glyphs for analysis by an optical character recognition model. The disclosed systems can further utilize an optical character recognition model to determine words from the modified versions of the reverse text glyphs, the vertical text glyphs, and/or the atypically-sized text glyphs.
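Claims 1 and 2 make the reverse-text handling concrete: a reverse text region has text lighter than a lightness threshold on a background darker than a darkness threshold, and glyphs are produced by swapping the two colors, converting glyphs to black and the background to white, and boxing individual glyphs. The Python sketch here illustrates those steps under stated assumptions only. Grayscale images are nested lists of 0 to 255 values, the threshold values (200 and 60) are hypothetical, "majority class is background" is an assumed heuristic, a single global mean threshold stands in for the patent's edge based adaptive binarization model (whose internals this page does not describe), and each 4-connected black component is treated as one glyph. All function names are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import List, Tuple

Gray = List[List[int]]  # hypothetical: 8-bit grayscale, 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def is_reverse_text_region(region: Gray, lightness_threshold: int = 200,
                           darkness_threshold: int = 60) -> bool:
    """True if light text sits on a dark background (threshold values are illustrative)."""
    pixels = [p for row in region for p in row]
    split = mean(pixels)
    light = [p for p in pixels if p > split]
    dark = [p for p in pixels if p <= split]
    if not light or not dark:
        return False
    # Assumption: the majority class is background, the minority class is text.
    return (len(dark) > len(light)
            and mean(light) > lightness_threshold
            and mean(dark) < darkness_threshold)


def invert_and_binarize(region: Gray) -> Gray:
    """Swap glyph/background colors, then make glyphs black and background white."""
    inverted = [[255 - p for p in row] for row in region]
    # A global mean threshold stands in for the edge based adaptive binarization model.
    split = mean(p for row in inverted for p in row)
    return [[0 if p < split else 255 for p in row] for row in inverted]


def glyph_boxes(binary: Gray) -> List[Box]:
    """One bounding box per 4-connected group of black pixels (assumed: one glyph each)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 0 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] == 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


# Usage: two light glyphs on a dark background.
region = [
    [20] * 7,
    [20, 240, 240, 20, 240, 20, 20],
    [20, 240, 240, 20, 240, 20, 20],
    [20, 20, 20, 20, 240, 20, 20],
    [20] * 7,
]
if is_reverse_text_region(region):
    print(glyph_boxes(invert_and_binarize(region)))  # [(1, 1, 2, 2), (4, 1, 4, 3)]
```

Once inverted, the glyphs look like ordinary dark-on-light text, which is the form a typical OCR model expects.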

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause a computing device to: segment a digital image into a plurality of text regions and one or more non-text regions utilizing an object detection machine learning model; generate a plurality of inverted text glyphs from a reverse text region of the plurality of text regions utilizing an edge based adaptive binarization model, wherein the reverse text region depicts text characters in a text color lighter than a lightness threshold overlaid on background pixels in a background color darker than a darkness threshold; merge inverted text glyphs from the plurality of inverted text glyphs into an inverted text glyph group associated with the reverse text region; determine, utilizing the edge based adaptive binarization model, a vertical text region within the digital image by dilating the digital image in a horizontal direction and comparing dimensions of dilated bounding boxes of the plurality of text regions based on dilating the digital image, wherein the vertical text region comprises a bounding box of text characters oriented vertically with respect to the digital image; generate, from the vertical text region, a rotated text digital image comprising multiple rotated versions of the vertical text region; determine one or more words from the rotated text digital image and the inverted text glyph group utilizing an optical character recognition model; and generate, for display on a client device, a searchable digital image from the one or more words.

2. The non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the plurality of inverted text glyphs by utilizing the edge based adaptive binarization model to: invert a color scheme associated with the reverse text region by swapping a background color and a glyph color within the reverse text region; convert, from the inverted color scheme, the glyph color to black and the background color to white for the reverse text region; and generate, from the converted glyph color and the converted background color, bounding boxes for individual glyphs to indicate the plurality of inverted text glyphs.

3. The non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to merge the inverted text glyphs into the inverted text glyph group by: determining bounding boxes for individual inverted text glyphs within the reverse text region; determining that two or more of the bounding boxes for the individual inverted text glyphs correspond to reverse text glyphs along a common line within the reverse text region; combining the two or more of the bounding boxes that are along the common line into a combined bounding box; and merging, into the inverted text glyph group, the combined bounding box with one or more additional combined bounding boxes corresponding to different lines within the reverse text region.

4. The non-transitory computer readable medium of claim 3, further comprising instructions that, when executed by the at least one processor, cause the computing device to: combine the two or more [of] the bounding boxes in response to determining that the two or more of the bounding boxes are within the common line within the reverse text region and within a threshold horizontal distance of each other; and merge the combined bounding box with the one or more additional combined bounding boxes in response to determining that the combined bounding box and the one or more additional combined bounding boxes are on different lines within the reverse text region and within a threshold vertical distance of each other.

5. The non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to determine the vertical text region by determining one or more of: a first vertically orientated bounding box including vertically arranged glyphs; or a second vertically orientated bounding box including horizontally arranged glyphs.

6. The non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to: identify the reverse text region within the digital image from among the plurality of text regions; and determine that the digital image belongs to a reverse text category of digital images in response to identifying the reverse text region.

7. The non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the rotated text digital image by: generating a first rotated version of the vertical text region by rotating the vertical text region by a first magnitude; generating a second rotated version of the vertical text region by rotating the vertical text region by a second magnitude; and including the first rotated version and the second rotated version in the rotated text digital image.

8. A system comprising: one or more memory devices; and one or more processors configured to cause the system to: segment a digital image into a plurality of text regions and one or more non-text regions utilizing an object detection machine learning model; generate a plurality of inverted text glyphs from a reverse text region of the plurality of text regions utilizing an edge based adaptive binarization model, wherein the reverse text region depicts text characters in a text color lighter than a lightness threshold overlaid on background pixels in a background color darker than a darkness threshold; merge inverted text glyphs from the plurality of inverted text glyphs into an inverted text glyph group associated with the reverse text region; determine, utilizing the edge based adaptive binarization model, a vertical text region within the digital image by dilating the digital image in a horizontal direction and comparing dimensions of dilated bounding boxes of the plurality of text regions based on dilating the digital image, wherein the vertical text region comprises a bounding box of text characters oriented vertically with respect to the digital image; generate, from the vertical text region, a rotated text digital image comprising multiple rotated versions of the vertical text region; determine one or more words from the rotated text digital image and the inverted text glyph group utilizing an optical character recognition model; and generate, for display on a client device, a searchable digital image from the one or more words.

9. The system of claim 8, wherein the one or more processors are further configured to cause the system to generate the plurality of inverted text glyphs by utilizing the edge based adaptive binarization model to: invert a color scheme associated with the reverse text region by swapping a background color and a glyph color within the reverse text region; convert, from the inverted color scheme, the glyph color to black and the background color to white for the reverse text region; and generate, from the converted glyph color and the converted background color, bounding boxes for individual glyphs to indicate the plurality of inverted text glyphs.

10. The system of claim 8, wherein the one or more processors are further configured to cause the system to merge the inverted text glyphs into the inverted text glyph group by: determining bounding boxes for individual inverted text glyphs wi…
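Claims 3 and 4 merge inverted glyph boxes in two passes: boxes on a common line that lie within a threshold horizontal distance are combined into line boxes, and line boxes on different lines that lie within a threshold vertical distance are merged into one glyph group. A minimal Python sketch, assuming boxes are (x0, y0, x1, y1) tuples, that "common line" means overlapping vertical extents, and adding one rule the claims do not state (stacked lines must also overlap horizontally, so separate columns are not fused). Function names and threshold values are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def same_line(a: Box, b: Box) -> bool:
    # Assumption: two boxes share a line when their vertical extents overlap.
    return a[1] <= b[3] and b[1] <= a[3]


def combine_along_lines(glyphs: List[Box], max_h_gap: int) -> List[Box]:
    """Pass 1: join same-line boxes that are within max_h_gap pixels horizontally."""
    lines: List[Box] = []
    for box in sorted(glyphs):  # left to right by x0
        for i, line in enumerate(lines):
            if same_line(line, box) and box[0] - line[2] <= max_h_gap:
                lines[i] = union(line, box)
                break
        else:
            lines.append(box)
    return lines


def merge_lines(lines: List[Box], max_v_gap: int) -> List[List[Box]]:
    """Pass 2: join boxes on different lines that are within max_v_gap pixels vertically."""
    groups: List[List[Box]] = []
    for line in sorted(lines, key=lambda b: (b[1], b[0])):  # top to bottom
        for group in groups:
            last = group[-1]
            stacked = not same_line(last, line) and 0 <= line[1] - last[3] <= max_v_gap
            # Extra assumption (not in the claims): stacked lines must overlap horizontally.
            if stacked and line[0] <= last[2] and last[0] <= line[2]:
                group.append(line)
                break
        else:
            groups.append([line])
    return groups


def group_box(group: List[Box]) -> Box:
    return reduce(union, group)


# Usage: two words on the first line (one far to the right) and one word below.
glyphs = [(0, 0, 8, 9), (11, 0, 19, 9), (40, 0, 48, 9), (0, 14, 8, 23), (10, 14, 18, 23)]
lines = combine_along_lines(glyphs, max_h_gap=5)
groups = merge_lines(lines, max_v_gap=6)
print([group_box(g) for g in groups])  # [(0, 0, 19, 23), (40, 0, 48, 9)]
```

The far-right box stays separate because its 21-pixel horizontal gap exceeds `max_h_gap`; tightening or loosening the two thresholds is what controls how aggressively glyphs are grouped.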
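Claim 1 finds vertical text by dilating the image horizontally and comparing the dimensions of the dilated bounding boxes, and claim 7 then builds a rotated text digital image from two rotated copies of the region. The exact comparison rule is not given on this page, so this Python sketch assumes one: horizontal dilation fuses a horizontal text line into one wide box, while vertically stacked glyphs stay separate narrow boxes, so a region is flagged as vertical when its ink is taller than a hypothetical `ratio` times its widest dilated box. The rotation magnitudes (+90 and -90 degrees), the white gap between copies, and all names are likewise assumptions.

```python
from collections import deque
from typing import List, Tuple

Binary = List[List[int]]  # hypothetical: 0 = ink (black), 255 = background (white)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def dilate_horizontal(binary: Binary, k: int) -> Binary:
    """Spread every ink pixel k pixels left and right (a 1 x (2k+1) dilation)."""
    h, w = len(binary), len(binary[0])
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 0:
                for nx in range(max(0, x - k), min(w, x + k + 1)):
                    out[y][nx] = 0
    return out


def component_boxes(binary: Binary) -> List[Box]:
    """Bounding boxes of 4-connected ink components."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 0 or seen[y][x]:
                continue
            seen[y][x] = True
            queue, (x0, y0, x1, y1) = deque([(x, y)]), (x, y, x, y)
            while queue:
                cx, cy = queue.popleft()
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] == 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def is_vertical_text(region: Binary, k: int = 1, ratio: float = 1.5) -> bool:
    """Assumed rule: ink height exceeds `ratio` times the widest dilated box."""
    boxes = component_boxes(dilate_horizontal(region, k))
    if not boxes:
        return False
    widest = max(x1 - x0 + 1 for x0, _, x1, _ in boxes)
    ink_height = max(b[3] for b in boxes) - min(b[1] for b in boxes) + 1
    return ink_height > ratio * widest


def rotate_cw(img: Binary) -> Binary:  # +90 degrees
    return [list(row) for row in zip(*img[::-1])]


def rotate_ccw(img: Binary) -> Binary:  # -90 degrees
    return [list(row) for row in zip(*img)][::-1]


def rotated_text_image(region: Binary, gap: int = 2) -> Binary:
    """Stack both rotated copies, separated by `gap` white rows."""
    first, second = rotate_cw(region), rotate_ccw(region)  # same width by construction
    return first + [[255] * len(first[0]) for _ in range(gap)] + second


# Usage: three 2x2 glyphs stacked in a column vs. four glyphs side by side.
column = [[255, 0, 0, 255], [255, 0, 0, 255], [255] * 4,
          [255, 0, 0, 255], [255, 0, 0, 255], [255] * 4,
          [255, 0, 0, 255], [255, 0, 0, 255]]
line = [[0, 0, 255, 0, 0, 255, 0, 0, 255, 0, 0] for _ in range(2)]
print(is_vertical_text(column), is_vertical_text(line))  # True False
```

Producing both rotations, as claim 7 describes, sidesteps having to know whether the vertical text reads top-to-bottom or bottom-to-top; the OCR model sees both orientations.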

Assignees

Adobe Inc

Inventors

Classifications

  • Classification of content, e.g. text, photographs or tables · CPC title

  • G06V30/162 (primary) · Quantising the image signal · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12288406B2 cover?
The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately enhancing optical character recognition with a machine learning approach for determining words from reverse text, vertical text, and atypically-sized text. For example, the disclosed systems segment a digital image into text regions and non-text regions utilizing an object detection mac…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06V30/162. Mapped technology areas include Physics.
When was this patent published?
Published on April 29, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).