Table Header Detection Using Global Machine Learning Features from Orthogonal Rows and Columns
US-2020097759-A1 · Mar 26, 2020 · US
US12597281B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12597281-B2 |
| Application number | US-202217947737-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 19, 2022 |
| Priority date | Sep 19, 2022 |
| Publication date | Apr 7, 2026 |
| Grant date | Apr 7, 2026 |
Official abstract text for this publication.
In various examples, a table recognition machine learning model receives an image of a table; generates, using a first encoder of the table recognition machine learning model, an image feature vector including features extracted from the image of the table; generates, using a first decoder of the table recognition machine learning model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table; generates, using a second decoder of the table recognition machine learning model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table; and determines, using a third decoder of the table recognition machine learning model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
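The data flow in the abstract (one encoder feeding three decoders) can be sketched as below. This is a minimal illustration, not the patented model: the names `recognize_table`, `Cell`, and all `toy_*` functions are hypothetical, and the toy components return fixed or rule-based values where the patent describes learned networks. The only point is which outputs feed which stage.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; assumed layout


@dataclass
class Cell:
    row: int
    col: int
    box: Box
    text: str


def recognize_table(image, encoder, grid_decoder, ocr_decoder, structure_decoder):
    """Wire the four stages together in the order the abstract describes."""
    features = encoder(image)                 # image feature vector
    row_ys, col_xs = grid_decoder(features)   # first decoder: row/column coordinates
    boxes, texts = ocr_decoder(features)      # second decoder: boxes + semantic content
    # Third decoder: consumes the features and both earlier decoder outputs.
    return structure_decoder(features, row_ys, col_xs, boxes, texts)


# --- Toy stand-ins (assumptions for illustration, not learned models) ---

def toy_encoder(image):
    # Flatten pixels into a "feature vector".
    return [float(p) for row in image for p in row]


def toy_grid_decoder(features):
    # Fixed grid: y-coordinates of row tops, x-coordinates of column left edges.
    return [0, 10], [0, 20]


def toy_ocr_decoder(features):
    boxes = [(1, 1, 18, 9), (21, 1, 38, 9), (1, 11, 18, 19), (21, 11, 38, 19)]
    texts = ["Name", "Qty", "Bolt", "4"]
    return boxes, texts


def toy_structure_decoder(features, row_ys, col_xs, boxes, texts):
    # Assign each box to the grid cell whose top-left boundary precedes it.
    cells = []
    for box, text in zip(boxes, texts):
        x0, y0, _, _ = box
        row = bisect_right(row_ys, y0) - 1
        col = bisect_right(col_xs, x0) - 1
        cells.append(Cell(row, col, box, text))
    return cells


image = [[0] * 40 for _ in range(20)]  # blank 40x20 placeholder image
cells = recognize_table(
    image, toy_encoder, toy_grid_decoder, toy_ocr_decoder, toy_structure_decoder
)
for c in cells:
    print(c.row, c.col, c.text)
```

Passing the stages in as callables keeps the sketch independent of any particular network architecture; the claims mention a transformer network and an OCR decoder, neither of which is modeled here.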
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving an image of a table; generating, using a first encoder of a table recognition machine learning model, an image feature vector including features extracted from the image of the table; generating, using a first decoder of the table recognition machine learning model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table; generating, using a second decoder of the table recognition machine learning model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table, wherein the second decoder includes an optical character recognition (OCR) decoder; and determining, using a third decoder of the table recognition machine learning model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
2. The method of claim 1, wherein the table structure further comprises a label associated with a cell of the table.
3. The method of claim 1, wherein the table recognition machine learning model includes a transformer network.
4. The method of claim 1, wherein the first decoder includes a self-attention layer.
5. The method of claim 1, wherein the method further comprises performing a task based on the table structure, the task including at least one of: table question answering, table fact verification, table formatting, table manipulation, and table captioning.
6. The method of claim 1, wherein determining the table structure further comprises determining a relationship between the cells of the table by at least performing a pair-wise comparison of features associated with the cells of the table.
7. The method of claim 1, wherein the method further comprises fine tuning the table recognition machine learning model by at least performing a bipartite matching between predictions generated by the table recognition machine learning model and a ground truth associated with the table.
8. The method of claim 1, wherein the method further comprises causing the table recognition machine learning model to execute a self-supervised pre-training task based on the table.
9. The method of claim 8, wherein the self-supervised pre-training task further comprises causing the table recognition machine learning model to predict semantic information based on a modified image of the table generated by at least masking a set of pixels of the image corresponding to a cell of the table.
10. A non-transitory computer-readable medium storing executable instructions embodied thereon, which, when executed by a processing device, cause the processing device to perform operations comprising: determining, using a table recognition machine learning model, a first set of features based on an input image depicting a table; determining, based on the first set of features, a first set of predictions representing rows and columns associated with the table; determining, based on the first set of features, semantic information included in the table and a set of bounding boxes corresponding to the semantic information; determining, using the table recognition machine learning model, a set of labels associated with cells of the table based on the first set of features, the first set of predictions, the semantic information, and the set of bounding boxes; and causing the table recognition machine learning model to execute a self-supervised pre-training task based on the table by at least causing the table recognition machine learning model to predict at least a portion of the semantic information based on a modified image of the table generated by at least masking a set of pixels of the image corresponding to a cell of the table.
11. The medium of claim 10, wherein determining the first set of features is performed by an encoder of the table recognition machine learning model.
12. The medium of claim 10, wherein determining the set of labels is performed by a decoder of the table recognition machine learning model.
13. The medium of claim 10, wherein the self-supervised pre-training task further comprises causing the table recognition machine learning model to predict a modification to a row or a column of the table.
14. The medium of claim 13, wherein the modification includes at least one of: merging rows of the table, adding rows to the table, merging columns of the table, adding columns to the table, merging spans of the table, and adding spans to the table.
15. The medium of claim 10, wherein the self-supervised pre-training task further comprises causing the table recognition machine learning model to predict a rotated angle associated with the table based on the input image.
16. A system comprising: a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising: pre-training a table recognition machine learning model by at least causing the table recognition machine learning model to perform a set of self-supervision tasks to generate a pre-trained table recognition machine learning model, wherein the set of self-supervision tasks includes at least: predicting semantic values corresponding to a cell of a table based on a set of masked pixels included in an image of the table corresponding to the cell; predicting a set of columns and a set of rows associated with the table based on modifications to rows and columns of the table; and identifying a rotated angle associated with a table based on a rotated image of the table; training the pre-trained table recognition machine learning model by at least fine tuning the pre-trained table recognition machine learning model to generate a trained table recognition machine learning model; and using the trained table recognition machine learning model to perform a task.
17. The system of claim 16, wherein the table recognition machine learning model includes a transformer network.
18. The system of claim 16, wherein fine tuning the pre-trained table recognition machine learning model further comprises performing a bipartite matching between predictions generated by the pre-trained table recognition machine learning model and a ground truth associated with the table.
19. The system of claim 16, wherein fine tuning the pre-trained table recognition machine learning model further comprises performing binary classification based on a result of predicting the set of columns and the set of rows and a ground truth associated with the table.
20. The system of claim 16, wherein the task includes at least one of: table question answering, table fact verification, table formatting, table manipulation, and table captioning.
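Two of the self-supervision tasks recited in claims 9, 15, and 16 (masking the pixels of a cell, and predicting a rotated angle) depend on how the training sample is constructed. The sketch below shows one plausible way to build such samples. It rests on several assumptions that the claims do not state: images are nested lists of grayscale values, cell boxes are `(x0, y0, x1, y1)` with exclusive upper bounds, and rotation is restricted to multiples of 90 degrees. The function names are hypothetical, and the row/column modification task is not covered.

```python
import random
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixels, row-major (assumed representation)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive (assumed)


def mask_cell(image: Image, box: Box, fill: int = 0) -> Image:
    """Return a copy of the image with the pixels inside `box` overwritten.

    The model would then be asked to predict the hidden cell content;
    the original content serves as the training target.
    """
    x0, y0, x1, y1 = box
    masked = [row[:] for row in image]  # copy so the original stays intact
    for y in range(y0, y1):
        for x in range(x0, x1):
            masked[y][x] = fill
    return masked


def rotate90(image: Image, quarter_turns: int) -> Image:
    """Rotate the image clockwise by quarter_turns * 90 degrees."""
    out = [row[:] for row in image]
    for _ in range(quarter_turns % 4):
        # Reverse the row order, then transpose: one clockwise quarter turn.
        out = [list(col) for col in zip(*out[::-1])]
    return out


def make_rotation_sample(image: Image, rng: random.Random) -> Tuple[Image, int]:
    """Pick a random quarter-turn rotation; return (rotated image, angle label)."""
    k = rng.randrange(4)
    return rotate90(image, k), k * 90


# Usage: build one masked sample and one rotation sample from a tiny image.
table_image = [[9] * 4 for _ in range(4)]
masked_image = mask_cell(table_image, (1, 1, 3, 3))
rotated_image, angle_label = make_rotation_sample(table_image, random.Random(0))
```

In both cases the label comes from the transformation itself (the hidden pixels or the chosen angle), which is what makes the tasks self-supervised: no human annotation of the table is needed.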
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title
using context analysis, e.g. lexical, syntactic or semantic context · CPC title
Orientation detection or correction, e.g. rotation of multiples of 90 degrees · CPC title
Character recognition · CPC title
using neural networks · CPC title