Systems and methods for separating ligature characters in digitized document images

US11710331B2 · US · B2

Patent metadata
Publication number: US-11710331-B2
Application number: US-202016953099-A
Country: US
Kind code: B2
Filing date: Nov 19, 2020
Priority date: Mar 19, 2019
Publication date: Jul 25, 2023
Grant date: Jul 25, 2023

Abstract

Official abstract text for this publication.

Embodiments disclosed herein provide for systems and methods of separating characters associated with ligatures in digitized documents. The systems and methods provide for a ligature detection engine configured to identify the ligatures, and a ligature processing engine configured to identify and remove the glyphs attaching the separate characters forming the ligature.
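The detection step can be illustrated with a minimal sketch. Claim 1 on this page describes detection as comparing a contour's width in pixels against the widest single character of a proportional font. The `Contour` type, the `detect_ligatures` function, and the `max_char_width` parameter below are hypothetical names chosen for illustration, not taken from the patent, and the threshold value in the usage lines is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contour:
    """Bounding box of one connected component, in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def detect_ligatures(contours, max_char_width):
    """Return the contours that are wider than the widest single character.

    max_char_width is an assumed, caller-supplied threshold: the pixel width
    of the widest character expected in the document's proportional font.
    """
    return [c for c in contours if c.width > max_char_width]


# Usage: a 12 px wide contour passes as a single character,
# while a 31 px wide contour is flagged as a likely ligature.
contours = [Contour(0, 0, 12, 20), Contour(14, 0, 31, 20)]
suspects = detect_ligatures(contours, max_char_width=18)
```

How the threshold is obtained (for example from a known font, or estimated from the page) is not specified in the text shown here, so the sketch leaves it to the caller.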

First claim

Opening claim text (preview).

The invention claimed is:

1. A character separation system, the system comprising one or more processors configured to:
  detect one or more ligatures based on a comparison of a width of one or more contours in pixels exceeds a maximum wide character in a proportional font;
  determine which of a plurality of characters in a digitized document image are associated with the one or more ligatures;
  generate a contour around each of the one or more ligatures, wherein the contour includes a pixelated version of the ligature, wherein pixels associated with glyphs of the ligature are darkened;
  determine, for each scanned column of the contour including one or more darkened pixels, a height of a respective glyph associated with the one or more darkened pixels by scanning the column, wherein:
    the height of the respective glyph is determined based on:
      a first distance from a top of the contour to a topmost darkened pixel in the column determined by a first scan from the top of the contour to the topmost darkened pixel in the column, and
      a second distance from a bottom of the contour to a bottommost darkened pixel in the column determined by a second scan from the bottom of the contour to the bottommost darkened pixel in the column, and
    the scanning determines an imaginary vertical line so as to separate one or more characters composing the ligature, the imaginary vertical line based on a transition that defines a change in slope between columns comprising an end to a decreasing slope of pixels to a beginning of an increasing slope;
  identify a pinch point for the ligature based on the imaginary vertical line;
  remove the glyph associated with the pinch point; and
  separate the one or more characters associated with the ligature.

2. The character separation system of claim 1, wherein the one or more processors are further configured to verify accuracy of the separated one or more characters.

3. The character separation system of claim 1, wherein the contour includes (i) a height of the ligature and (ii) a width of the ligature.

4. The character separation system of claim 1, wherein the one or more processors are configured to generate separate contours by slicing the contour in a vertical direction at the identified pinch point.

5. The character separation system of claim 1, wherein a plurality of imaginary vertical lines are configured to separate three or more characters composing the ligature.

6. The character separation system of claim 1, wherein the one or more processors are configured to store the height of the glyph in a first scanned column based on a determination indicative of a change in height of the glyph from the first scanned column to a height of another glyph in a second scanned column.

7. The character separation system of claim 1, wherein the one or more processors are configured to prevent storage of the height of the glyph based on a determination indicative of zero change in a height of the glyph from a first scanned column to a height of another glyph in a second scanned column.

8. The character separation system of claim 1, wherein a first separated contour is associated with a first character composing the ligature and a second separated contour is associated with a second character composing the ligature.

9. A method for character separation, the method comprising:
  detecting, by one or more processors, one or more ligatures based on a comparison of a width of one or more contours in pixels exceeds a maximum wide character in a proportional font;
  determining, by the one or more processors, which of a plurality of characters in a digitized document image are associated with the one or more ligatures;
  generating, by the one or more processors, a contour around each of the one or more ligatures, wherein the contour includes a pixelated version of the ligature, wherein pixels associated with glyphs of the ligature are darkened; and
  determining, by the one or more processors, for each scanned column of the contour including one or more darkened pixels, a height of a respective glyph associated with the one or more darkened pixels by scanning the column, wherein:
    the height of the respective glyph is determined based on:
      a first distance from a top of the contour to a topmost darkened pixel in the column determined by a first scan from the top of the contour to the topmost darkened pixel in the column, and
      a second distance from a bottom of the contour to a bottommost darkened pixel in the column determined by a second scan from the bottom of the contour to the bottommost darkened pixel in the column, and
    the scanning determines an imaginary vertical line so as to separate one or more characters composing the ligature, the imaginary vertical line based on a transition that defines a change in slope between columns comprising an end to a decreasing slope of pixels to a beginning of an increasing slope;
  identifying, by the one or more processors, a pinch point for the ligature based on the imaginary vertical line;
  removing, by the one or more processors, the glyph associated with the pinch point; and
  separating, by the one or more processors, the one or more characters associated with the ligature.

10. The method of claim 9, further comprising verifying, by the one or more processors, accuracy of the separated one or more characters.

11. The method of claim 9, further comprising detecting, by the one or more processors, the ligature based on evaluation of a dimension of the contour with respect to a threshold.

12. The method of claim 9, further comprising generating, by the one or more processors, separate contours by slicing the contour in a vertical direction at the identified pinch point.

13. The method of claim 9, wherein the contour includes (i) a height of the ligature and (ii) a width of the ligature.

14. The method of claim 9, wherein a plurality of imaginary vertical lines are configured to separate three or more characters composing the ligature.

15. The method of claim 9, further comprising: storing, by the one or more processors, the height of the glyph in a first scanned column based on a determination indicative of a change in height of the glyph from the first scanned column to a height of another glyph in a second scanned column.

16. The method of claim 12, wherein a first separated contour is associated with a first character composing the ligature and a second separated contour is associated with a second character composing the ligature.

17. A computer readable non-transitory medium comprising computer-executable instructions that are executed on a processor and comprising the steps of:
  detect one or more ligatures based on a comparison of a width of one or more contours in pixels exceeds a maximum wide character in a proportional font;
  determine which of a plurality of characters in a digitized document image are associated with the one or more ligatures;
  generate a contour around each of the one or more ligatures, wherein the contour includes a pixelated version of the ligature, wherein pixels associated with glyphs of the ligature are darkened;
  determine, for each scanned column of the contour including one or more darkened pixels, a height of a respective glyph associated with the one or more darkened pixels by scanning the column, wherein:
    the height of the respective glyph is determined based on:
      a first distance from a top of the contour to a topmost darkened pixel in the column determined by a first scan from the top of the contour to the topmost darkened pixel in the column, and
      a second distance from a
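The column scan and pinch-point search recited in the independent claims can be sketched roughly as follows. This is an illustrative reading of the claim language, not the patented implementation: the bitmap representation (a list of rows, with 1 for a darkened pixel), the function names, and the choice to drop the pinch column when slicing are all assumptions made for this example.

```python
def column_heights(bitmap):
    """Map each inked column index to its glyph height in pixels.

    bitmap is a list of equal-length rows; 1 marks a darkened pixel.
    Height is the contour height minus the gap found by scanning down
    from the top and the gap found by scanning up from the bottom.
    """
    rows = len(bitmap)
    heights = {}
    for col in range(len(bitmap[0])):
        column = [bitmap[r][col] for r in range(rows)]
        if 1 not in column:
            continue  # no darkened pixels in this column
        top_gap = column.index(1)           # first scan: top to topmost dark pixel
        bottom_gap = column[::-1].index(1)  # second scan: bottom to bottommost dark pixel
        heights[col] = rows - top_gap - bottom_gap
    return heights


def find_pinch_columns(heights):
    """Return columns where a decreasing height run ends and an increasing one begins."""
    # Store a height only when it differs from the last stored one
    # (one possible reading of dependent claims 6 and 7).
    changes = []
    for col in sorted(heights):
        if not changes or heights[col] != changes[-1][1]:
            changes.append((col, heights[col]))
    pinches = []
    for (_, prev), (col, cur), (_, nxt) in zip(changes, changes[1:], changes[2:]):
        if prev > cur < nxt:
            pinches.append(col)
    return pinches


def split_at(bitmap, col):
    """Slice vertically at col, discarding that column (the connecting stroke)."""
    left = [row[:col] for row in bitmap]
    right = [row[col + 1:] for row in bitmap]
    return left, right


# Usage: two blobs joined by a one-pixel stroke in column 3.
LIGATURE = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
]
pinch_columns = find_pinch_columns(column_heights(LIGATURE))
left_part, right_part = split_at(LIGATURE, pinch_columns[0])
```

A ligature joining three or more characters would yield several pinch columns, one per imaginary vertical line; the sketch returns them all but slices only at the first. Real scans are noisy, so a practical version would likely need smoothing or a minimum-depth test before accepting a local minimum as a pinch point.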

Assignees

Capital One Services LLC

Inventors

Classifications

  • G06V30/414 (Primary)

    Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

  • using recognition of characters or words

  • Removing patterns interfering with the pattern to be recognised, such as ruled lines or underlines

  • using character size, text spacings or pitch estimation

  • Character recognition

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Capital One Services LLC
What technology area does this patent fall under?
Primary CPC classification G06V30/414. Mapped technology areas include Physics.
When was this patent published?
Published Jul 25, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).