Systems and methods for separating ligature characters in digitized document images

US10878271B2 · US · B2

Patent metadata
Publication number: US-10878271-B2
Application number: US-201916357878-A
Country: US
Kind code: B2
Filing date: Mar 19, 2019
Priority date: Mar 19, 2019
Publication date: Dec 29, 2020
Grant date: Dec 29, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments disclosed herein provide for systems and methods of separating characters associated with ligatures in digitized documents. The systems and methods provide for a ligature detection engine configured to identify the ligatures, and a ligature processing engine configured to identify and remove the glyphs attaching the separate characters forming the ligature.

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for separating characters associated with ligatures in digitized document images, the system comprising one or more processors configured to: receive at least one digitized document image including a plurality of characters; determine which of the plurality of characters in the at least one digitized document image are associated with ligatures; generate a contour around each of the ligatures, wherein the contour includes a pixelated version of the ligature, wherein pixels associated with glyphs of the ligature are darkened; scan each column of the contour; determine, for each scanned column including at least one darkened pixel, a height of a respective glyph associated with the at least one darkened pixel by scanning from top and bottom of the column of the contour, wherein the scanning determines an imaginary vertical line so as to separate one or more characters from the plurality of characters, the imaginary vertical line based on a transition from a first point belonging to one or more darkened pixels of the column to a second point belonging to one or more darkened pixels of a different column, the transition defining a change from a first slope to a second slope between the column and different column; identify a pinch point for the ligature based on the imaginary vertical line and a comparison between a plurality of adjacent scanned columns including at least one darkened pixel; remove the glyph associated with the pinch point; and separate the one or more characters associated with the ligature based on the removed glyph and the imaginary vertical line.

2. The system of claim 1, wherein the one or more processors are further configured to: (i) receive the separated characters and (ii) verify the accuracy of the separated characters.

3. The system of claim 1, wherein the columns are scanned from at least one selected from the group of: (i) from left to right and (ii) right to left.

4. The system of claim 1, wherein the contour includes (i) a height of the ligature and (ii) a width of the ligature.

5. The system of claim 1, wherein the height of the respective glyph is determined based on (i) a first distance from the top of the contour to a topmost darkened pixel in the column and (ii) a second distance from the bottom of the contour to a bottommost darkened pixel in the column.

6. The system of claim 5, wherein (i) the first distance is determined based on a first scan from the top of the contour to the topmost darkened pixel in the column and (ii) the second distance is determined based on a second scan from the bottom of the contour to the bottommost darkened pixel in the column.

7. The system of claim 1, wherein the pinch point is identified upon determining (i) a decrease in height of the respective glyph from a first column to a second column and (ii) an increase in height of the respective glyph from a second column to a third column.

8. The system of claim 7, wherein the one or more processors are further configured to segment the contour vertically at the pinch point and separate the contour into separate contours.

9. The system of claim 1, wherein the one or more processors are further configured to store a height of a glyph in a previously scanned column upon determining a change in height of the glyph from the previously scanned column to another glyph in a currently scanned column.

10. The system of claim 1, wherein the one or more processors are further configured to convert the at least one digitized document image into monochrome.

11. A method for separating characters associated with ligatures in digitized document images, the method comprising: receiving, with a processor, at least one digitized document image including a plurality of characters; determining, with the processor, which of the plurality of characters in the at least one digitized document image are associated with ligatures; generating, with the processor, a contour around each of the ligatures, wherein the contour includes a pixelated version of the ligature, wherein pixels associated with glyphs of the ligature are darkened; and scanning, with the processor, each column of the contour; determining, with the processor, for each scanned column including at least one darkened pixel, a height of a respective glyph associated with the at least one darkened pixel by scanning from top and bottom of the column of the contour, wherein the scanning determines an imaginary vertical line so as to separate one or more characters from the plurality of characters, the imaginary vertical line based on a transition from a first point belonging to one or more darkened pixels of the column to a second point belonging to one or more darkened pixels of a different column, the transition defining a change from a first slope to a second slope between the column and different column; identifying, with the processor, a pinch point for the ligature based on the imaginary vertical line and a comparison between a plurality of adjacent scanned columns including at least one darkened pixel; removing, with the processor, the glyph associated with the pinch point; and separating, with the processor, the one or more characters associated with the ligature based on the removed glyph and the imaginary vertical line.

12. The method of claim 11, wherein the columns are scanned from at least one selected from the group of: (i) from left to right and (ii) right to left.

13. The method of claim 11, wherein the contour includes (i) a height of the ligature and (ii) a width of the ligature.

14. The method of claim 11, wherein the height of the respective glyph is determined based on (i) a first distance from the top of the contour to a topmost darkened pixel in the column and (ii) a second distance from the bottom of the contour to a bottommost darkened pixel in the column.

15. The method of claim 14, wherein (i) the first distance is determined based on a first scan from the top of the contour to the topmost darkened pixel in the column and (ii) the second distance is determined based on a second scan from the bottom of the contour to the bottommost darkened pixel in the column.

16. The method of claim 11, wherein the pinch point is identified upon determining (i) a decrease in height of the respective glyph from a first column to a second column and (ii) an increase in height of the respective glyph from a second column to a third column.

17. The method of claim 16, further comprising: segmenting, with the processor, the contour vertically at the pinch point and separating the contour into separate contours.

18. The method of claim 11, further comprising: storing, with the processor, a height of a glyph associated with a previously scanned column in a memory database upon determining a change in height of the glyph from the previously scanned column to another glyph in a currently scanned column.

19. A system for separating characters associated with ligatures in digitized document images, the system comprising: a processor, wherein the processor is configured to: receive a contour including a ligature, wherein the ligature is pixelated, wherein pixels associated with the ligature are darkened; scan each column of the contour; determine, for each scanned column including at least one darkened pixel, a height of a respective glyph associated with the at least one darkened pixel by scanning from top and bottom of the column of the contour, wherein the scanning determines an imaginary vertical line so as to separa
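
Illustrative sketch (not part of the patent text). The column scan, height measurement, and pinch-point test recited in claims 1, 5, 7, and 8 might look roughly like the Python below. Everything here is an assumption made for illustration: the contour is modeled as a list of rows of 0/1 values with 1 meaning a darkened pixel, the function names `column_heights`, `pinch_candidates`, and `split_at_pinch` are hypothetical, and the strict three-column comparison is only one simple reading of the "decrease then increase" wording. The slope-based "imaginary vertical line" language of claim 1 is not modeled.

```python
from typing import List, Tuple

# Assumed encoding: a contour is a list of rows; 1 = darkened pixel, 0 = background.
Contour = List[List[int]]


def column_heights(contour: Contour) -> List[int]:
    """Glyph height per column, measured by scanning from the top and the bottom.

    Height = contour height - (gap above topmost dark pixel)
                            - (gap below bottommost dark pixel).
    Columns with no darkened pixel get height 0.
    """
    n_rows = len(contour)
    n_cols = len(contour[0]) if n_rows else 0
    heights = []
    for x in range(n_cols):
        column = [contour[y][x] for y in range(n_rows)]
        if 1 not in column:
            heights.append(0)
            continue
        top_gap = column.index(1)           # scan downward from the top
        bottom_gap = column[::-1].index(1)  # scan upward from the bottom
        heights.append(n_rows - top_gap - bottom_gap)
    return heights


def pinch_candidates(heights: List[int]) -> List[int]:
    """Columns where height decreases from the previous column and then increases."""
    return [
        x
        for x in range(1, len(heights) - 1)
        if heights[x] > 0 and heights[x - 1] > heights[x] < heights[x + 1]
    ]


def split_at_pinch(contour: Contour, x: int) -> Tuple[Contour, Contour]:
    """Drop the connecting column x and return the left and right sub-contours."""
    left = [row[:x] for row in contour]
    right = [row[x + 1:] for row in contour]
    return left, right


# Toy ligature: two solid 3-column blocks joined by a one-pixel stroke in column 3.
DEMO: Contour = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]

heights = column_heights(DEMO)
candidates = pinch_candidates(heights)
left, right = split_at_pinch(DEMO, candidates[0])
print(heights, candidates)
```

On real scans, many columns inside a single character would also satisfy the three-column test, so a practical version would need additional criteria for choosing among candidates; this sketch does not attempt that.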

Assignees

Capital One Services LLC

Inventors

Classifications

  • Removing patterns interfering with the pattern to be recognised, such as ruled lines or underlines · CPC title

  • G06V30/414 · Primary

    Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title

  • using character size, text spacings or pitch estimation · CPC title

  • using recognition of characters or words · CPC title

  • Character recognition · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10878271B2 cover?
Embodiments disclosed herein provide for systems and methods of separating characters associated with ligatures in digitized documents. The systems and methods provide for a ligature detection engine configured to identify the ligatures, and a ligature processing engine configured to identify and remove the glyphs attaching the separate characters forming the ligature.
Who is the assignee on this patent?
Capital One Services LLC
What technology area does this patent fall under?
Primary CPC classification G06V30/414. Mapped technology areas include Physics.
When was this patent published?
It was published on Dec 29, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).