What technology area does this patent fall under?

Primary CPC classification G06V30/414. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 6, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Line item detection in borderless tabular structured data

US12056948B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12056948-B2
Application number	US-202117379154-A
Country	US
Kind code	B2
Filing date	Jul 19, 2021
Priority date	Jul 19, 2021
Publication date	Aug 6, 2024
Grant date	Aug 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an approach, a processor identifies a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table. A processor classifies the plurality of text separators into a number of target clusters comprised in a target group based on property information related to the plurality of text separators, the number of target clusters corresponding to a number of separator types. A processor provides indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.
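
The first step in the abstract, identifying text separators as the non-text regions between consecutive text lines, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `TextLine` and `Separator` types, the `find_separators` function, and the use of vertical gap height as the only property are hypothetical and are not taken from the patent, which does not specify data structures on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    """Hypothetical text line: vertical extent in page coordinates plus its text."""
    top: float
    bottom: float
    text: str


@dataclass
class Separator:
    """Hypothetical separator: the non-text region between two consecutive lines."""
    index: int   # sits between ordered lines[index] and lines[index + 1]
    top: float
    bottom: float

    @property
    def height(self) -> float:
        # Gap height is one possible "property" to cluster on (an assumption).
        return self.bottom - self.top


def find_separators(lines: List[TextLine]) -> List[Separator]:
    """Return one separator per pair of consecutive text lines, top to bottom."""
    ordered = sorted(lines, key=lambda line: line.top)
    separators = []
    for i in range(len(ordered) - 1):
        upper, lower = ordered[i], ordered[i + 1]
        # Clamp so that overlapping lines yield a zero-height separator.
        bottom = max(upper.bottom, lower.top)
        separators.append(Separator(index=i, top=upper.bottom, bottom=bottom))
    return separators


lines = [TextLine(0, 10, "Item A"), TextLine(12, 22, "  detail"), TextLine(30, 40, "Item B")]
gaps = [s.height for s in find_separators(lines)]
```

In this toy input the second gap is larger than the first, which is the kind of difference a later clustering step could use to tell separator types apart.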

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: identifying, by one or more processors, a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table; classifying, by one or more processors, each text separator of the plurality of text separators into one of a plurality of target clusters, each target cluster corresponding to property information of a separator type; selecting, by one or more processors, a target group that includes the plurality of target clusters based on: determining, by one or more processors, whether each text separator of the plurality of separators separates text lines that meet a similarity threshold; and determining, by one or more processors, an accuracy level for the target group, the accuracy level based on a distribution of the text separators within the plurality of target clusters and the corresponding similarity threshold designation for each respective text separator within each respective target cluster; and providing, by one or more processors, indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.

2. The computer-implemented method of claim 1, wherein the property information comprises information about a selection of visual properties of the non-text regions defined by the plurality of text separators from the group consisting of: lines, bold lines, and dashed lines.

3. The computer-implemented method of claim 1, wherein the target group is selected from a plurality of candidate groups comprising different numbers of candidate clusters, and selecting the target group further comprises: for each of a plurality of candidate groups comprising different numbers of candidate clusters: classifying, by one or more processors, each text separator of the plurality of text separators into a certain number of candidate clusters comprised in the candidate group; and determining, by one or more processors, an overall accuracy level for the candidate group based on respective distributions of the similarities for text separators classified in the certain number of candidate clusters; and selecting, by one or more processors, the target group from the plurality of candidate groups based on the overall accuracy levels determined for the plurality of candidate groups.

4. The computer-implemented method of claim 3, wherein determining the overall accuracy level for the candidate group comprises: for each of the certain number of candidate clusters comprised in the candidate group, determining, by one or more processors, a distribution of the similarities for the text separators classified into the candidate cluster by: determining, by one or more processors, a first count of text separators that are classified into the candidate cluster and have the similarities above a predetermined threshold; and determining, by one or more processors, a second count of text separators that are classified into the candidate cluster and have the similarities below the predetermined threshold; for each of the certain number of candidate clusters, determining, by one or more processors, a cluster accuracy level for the candidate cluster based on the first count and the second count; and calculating, by one or more processors, the overall accuracy level for the candidate group by aggregating the cluster accuracy levels determined for the certain number of candidate clusters.

5. The computer-implemented method of claim 4, wherein determining the cluster accuracy level for the candidate cluster based on the first count and the second count comprises: calculating, by one or more processors, a ratio of a higher one of the first and second counts to a sum of the first and second counts; and determining, by one or more processors, the cluster accuracy level based on the ratio.

6. The computer-implemented method of claim 3, wherein selecting the target group from the plurality of candidate groups comprises: sorting, by one or more processors, the overall accuracy levels for the plurality of candidate groups; and selecting, by one or more processors, a candidate group with a highest overall accuracy level from the plurality of candidate groups, as the target group.

7. The computer-implemented method of claim 6, wherein selecting a candidate group with a highest overall accuracy level further comprises selecting, by one or more processor, a candidate group comprising a lowest number of candidate clusters.

8. The computer-implemented method of claim 1, further comprising: comparing, by one or more processor, the property information of the text separators classified into respective clusters of the plurality of target clusters with reference property information; and assigning, by one or more processor, the plurality of target clusters to be corresponding to each separator type, respectively, based on a result of the comparing.

9. The computer-implemented method of claim 1, wherein providing the indication information comprises: assigning, by one or more processor, a first separator type to at least one of the plurality of text separators classified in a first cluster of the plurality of target clusters, the first target cluster aligned with the first separator type; assigning, by one or more processor, a second separator type to at least one of the plurality of text separators classified in a second cluster of the plurality of target clusters, the second target cluster aligned with the second separator type; and providing, by one or more processor, the indication information to at least indicate the first and second separator types assigned to the text separators classified in the first and second target clusters.

10. A computer program product comprising: one or more computer readable storage devices, and program instructions collectively stored on the one or more computer readable storage devices, the program instructions comprising: program instructions to identify a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table; program instructions to classify each text separator of the plurality of text separators into one of a plurality of target clusters, each target cluster corresponding to property information of a separator type; program instructions to select a target group that includes the plurality of target clusters based on: determining whether each text separator of the plurality of separators separates text lines that meet a similarity threshold; and determining an accuracy level for the target group, the accuracy level based on a distribution of the text separators within the plurality of target clusters and the corresponding similarity threshold designation for each respective text separator within each respective target cluster; and program instructions to provide indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.

11. The computer program product of claim 10, wherein the property information comprises information about a selection of visual properties of the non-text regions defined by the plurality of text separators from the group consisting of: lines, bold lines, and dashed lines.

12. The computer program product of claim 10, wherein the target group is selected from a plurality of candidate groups comprising different numbers of candidate clusters, and selecting the target gro
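
The group-selection procedure in claims 3 to 7 can be illustrated with a short Python sketch. It is one possible reading under stated assumptions, not the patented implementation: the function names are hypothetical, the clustering step (`cluster_1d`, which splits sorted values at the largest gaps) is a stand-in because the claims do not name a clustering algorithm, the property is a single number per separator, the aggregation is a plain mean, and a similarity exactly equal to the threshold is counted as "below".

```python
from typing import Dict, List, Sequence, Tuple


def cluster_1d(values: Sequence[float], k: int) -> List[int]:
    """Stand-in clustering (assumption): split sorted values at the k-1 largest gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = sorted(
        range(1, len(order)),
        key=lambda j: values[order[j]] - values[order[j - 1]],
        reverse=True,
    )
    cuts = set(positions[: k - 1])
    labels = [0] * len(values)
    label = 0
    for pos, i in enumerate(order):
        if pos in cuts:
            label += 1
        labels[i] = label
    return labels


def cluster_accuracy(flags: Sequence[bool]) -> float:
    """Claim 5: ratio of the higher count to the sum of both counts."""
    first = sum(1 for f in flags if f)   # similarities above the threshold
    second = len(flags) - first          # similarities not above the threshold
    return max(first, second) / (first + second)


def group_accuracy(labels: Sequence[int], similar: Sequence[bool]) -> float:
    """Claim 4: aggregate per-cluster accuracy (mean is an assumption)."""
    clusters: Dict[int, List[bool]] = {}
    for label, flag in zip(labels, similar):
        clusters.setdefault(label, []).append(flag)
    scores = [cluster_accuracy(flags) for flags in clusters.values()]
    return sum(scores) / len(scores)


def select_target_group(
    properties: Sequence[float],
    similarities: Sequence[float],
    threshold: float,
    candidate_ks: Sequence[int],
) -> Tuple[int, float, List[int]]:
    """Claims 3, 6, 7: highest overall accuracy, ties go to the fewest clusters."""
    similar = [s > threshold for s in similarities]
    best = None
    for k in sorted(candidate_ks):       # ascending, so ties keep the smaller k
        labels = cluster_1d(properties, k)
        accuracy = group_accuracy(labels, similar)
        if best is None or accuracy > best[1]:
            best = (k, accuracy, labels)
    return best


heights = [2, 2, 2, 8, 2, 2, 8, 2]
sims = [0.9, 0.9, 0.8, 0.1, 0.9, 0.85, 0.2, 0.9]
k, accuracy, labels = select_target_group(heights, sims, threshold=0.5, candidate_ks=[1, 2, 3])
```

With this toy input, one cluster mixes both kinds of separator, while two and three clusters both separate them cleanly, so the tie-break in claim 7 picks two clusters. The sketch assumes a non-empty list of separators.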

Assignees

Inventors

Classifications

G06F18/23
Clustering techniques · CPC title
G06V30/10
Character recognition · CPC title
G06V30/413
Classification of content, e.g. text, photographs or tables · CPC title
G06V30/158
using character size, text spacings or pitch estimation · CPC title
G06V30/412
Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title

Patent family

Related publications grouped by family.

View patent family 84891064

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12056948B2 cover?: In an approach, a processor identifies a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table. A processor classifies the plurality of text separators into a number of target clusters comprised in a target group based on property information related to the pl…
Who is the assignee on this patent?: IBM

Related patents

System And Method For Extracting Structured Information From Implicit Tables

Table Recognition in Portable Document Format Documents

Automated extraction of unstructured tables and semantic information from arbitrary documents

Detecting the bounds of borderless tables in fixed-format structured documents using machine learning

Detecting the bounds of borderless tables in fixed-format structured documents using machine learning

Borderless table detection engine
