Image and semantic based table recognition

US12597281B2 · US · B2

Patent metadata
Publication number: US-12597281-B2
Application number: US-202217947737-A
Country: US
Kind code: B2
Filing date: Sep 19, 2022
Priority date: Sep 19, 2022
Publication date: Apr 7, 2026
Grant date: Apr 7, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various examples, a table recognition model receives an image of a table and generates, using a first encoder of the table recognition machine learning model, an image feature vector including features extracted from the image of the table; generates, using a first decoder of the table recognition machine learning model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table; generates, using a second decoder of the table recognition machine learning model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table; and then determines, using a third decoder of the table recognition machine learning model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
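The abstract's final step fuses row/column coordinates with cell bounding boxes and their recognized text. The patent assigns that fusion to a learned third decoder; the Python sketch below swaps in a simple geometric heuristic only to show what the inputs and the resulting structure might look like. It assumes the first decoder's coordinates can be read as sorted separator positions (`row_seps`, `col_seps`) and that each second-decoder output is a box plus OCR text; `Cell` and `assemble_grid` are hypothetical names.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical stand-in for one second-decoder output:
    # a bounding box (x0, y0, x1, y1) in pixels plus its recognized text.
    box: tuple[float, float, float, float]
    text: str


def assemble_grid(row_seps: list[float], col_seps: list[float], cells: list[Cell]) -> list[list[str]]:
    """Place each cell's text into a row/column grid.

    row_seps and col_seps are sorted separator positions (y values between
    rows, x values between columns), standing in for the first decoder's
    coordinates. A cell goes to the grid slot that contains its box center.
    This heuristic is an illustration, not the patent's learned decoder.
    """
    n_rows, n_cols = len(row_seps) + 1, len(col_seps) + 1
    grid = [[""] * n_cols for _ in range(n_rows)]
    for cell in cells:
        x0, y0, x1, y1 = cell.box
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        # bisect_right counts the separators at or before the center.
        row = bisect_right(row_seps, cy)
        col = bisect_right(col_seps, cx)
        # A later cell overwrites an earlier one that lands in the same slot.
        grid[row][col] = cell.text
    return grid
```

For example, with one row separator at y = 20 and one column separator at x = 50, boxes centered in the four quadrants produce a 2 x 2 grid such as `[["Name", "Age"], ["Ada", "36"]]`. A heuristic like this cannot represent cells that span several rows or columns.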

Claims

Full claim text (claims 1 to 20). Claims 1, 10, and 16 are independent; the remaining claims depend on them.

What is claimed is:

1. A method comprising: receiving an image of a table; generating, using a first encoder of a table recognition machine learning model, an image feature vector including features extracted from the image of the table; generating, using a first decoder of the table recognition machine learning model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table; generating, using a second decoder of the table recognition machine learning model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table, wherein the second decoder includes an optical character recognition (OCR) decoder; and determining, using a third decoder of the table recognition machine learning model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.

2. The method of claim 1, wherein the table structure further comprises a label associated with a cell of the table.

3. The method of claim 1, wherein the table recognition machine learning model includes a transformer network.

4. The method of claim 1, wherein the first decoder includes a self-attention layer.

5. The method of claim 1, wherein the method further comprises performing a task based on the table structure, the task including at least one of: table question answering, table fact verification, table formatting, table manipulation, and table captioning.

6. The method of claim 1, wherein determining the table structure further comprises determining a relationship between the cells of the table by at least performing a pair-wise comparison of features associated with the cells of the table.

7. The method of claim 1, wherein the method further comprises fine tuning the table recognition machine learning model by at least performing a bipartite matching between predictions generated by the table recognition machine learning model and a ground truth associated with the table.

8. The method of claim 1, wherein the method further comprises causing the table recognition machine learning model to execute a self-supervised pre-training task based on the table.

9. The method of claim 8, wherein the self-supervised pre-training task further comprises causing the table recognition machine learning model to predict semantic information based on a modified image of the table generated by at least masking a set of pixels of the image corresponding to a cell of the table.

10. A non-transitory computer-readable medium storing executable instructions embodied thereon, which, when executed by a processing device, cause the processing device to perform operations comprising: determining, using a table recognition machine learning model, a first set of features based on an input image depicting a table; determining, based on the first set of features, a first set of predictions representing rows and columns associated with the table; determining, based on the first set of features, semantic information included in the table and a set of bounding boxes corresponding to the semantic information; determining, using the table recognition machine learning model, a set of labels associated with cells of the table based on the first set of features, the first set of predictions, the semantic information, and the set of bounding boxes; and causing the table recognition machine learning model to execute a self-supervised pre-training task based on the table by at least causing the table recognition machine learning model to predict at least a portion of the semantic information based on a modified image of the table generated by at least masking a set of pixels of the image corresponding to a cell of the table.

11. The medium of claim 10, wherein determining the first set of features is performed by an encoder of the table recognition machine learning model.

12. The medium of claim 10, wherein determining the set of labels is performed by a decoder of the table recognition machine learning model.

13. The medium of claim 10, wherein the self-supervised pre-training task further comprises causing the table recognition machine learning model to predict a modification to a row or a column of the table.

14. The medium of claim 13, wherein the modification includes at least one of: merging rows of the table, adding rows to the table, merging columns of the table, adding columns to the table, merging spans of the table, and adding spans to the table.

15. The medium of claim 10, wherein the self-supervised pre-training task further comprises causing the table recognition machine learning model to predict a rotated angle associated with the table based on the input image.

16. A system comprising: a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising: pre-training a table recognition machine learning model by at least causing the table recognition machine learning model to perform a set of self-supervision tasks to generate a pre-trained table recognition machine learning model, wherein the set of self-supervision tasks includes at least: predicting semantic values corresponding to a cell of a table based on a set of masked pixels included in an image of the table corresponding to the cell; predicting a set of columns and a set of rows associated with the table based on modifications to rows and columns of the table; and identifying a rotated angle associated with a table based on a rotated image of the table; training the pre-trained table recognition machine learning model by at least fine tuning the pre-trained table recognition machine learning model to generate a trained table recognition machine learning model; and using the trained table recognition machine learning model to perform a task.

17. The system of claim 16, wherein the table recognition machine learning model includes a transformer network.

18. The system of claim 16, wherein fine tuning the pre-trained table recognition machine learning model further comprises performing a bipartite matching between predictions generated by the pre-trained table recognition machine learning model and a ground truth associated with the table.

19. The system of claim 16, wherein fine tuning the pre-trained table recognition machine learning model further comprises performing binary classification based on a result of predicting the set of columns and the set of rows and a ground truth associated with the table.

20. The system of claim 16, wherein the task includes at least one of: table question answering, table fact verification, table formatting, table manipulation, and table captioning.
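Claim 6 determines relationships between cells through a pair-wise comparison of their features. A minimal Python sketch of such a comparison, assuming each cell is already represented by a feature vector: every pair is scored with the sigmoid of a dot product and kept if the score exceeds a threshold. The scoring rule, the threshold, and the name `pairwise_relations` are illustrative assumptions; the claim does not say how the comparison is computed.

```python
import math


def pairwise_relations(features, threshold=0.5):
    """Return index pairs (i, j) with i < j whose cells are predicted related.

    features is a list of equal-length feature vectors, one per cell. Each
    pair is scored by the sigmoid of the dot product of its two vectors
    (an assumed scorer), and pairs scoring above threshold are kept.
    """
    related = set()
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            score = 1.0 / (1.0 + math.exp(-dot))
            if score > threshold:
                related.add((i, j))
    return related
```

With features `[[2, 0], [3, 0], [-1, 2]]`, only cells 0 and 1 point in a similar direction, so the function returns `{(0, 1)}`. A trained model would learn the comparison instead of using a fixed dot product.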
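Claims 7 and 18 fine-tune the model by bipartite matching between predictions and ground truth, so that each ground-truth cell is paired with exactly one prediction before a loss is computed. The Python sketch below assumes predictions and ground truth are `(x0, y0, x1, y1)` boxes and that the matching cost is the L1 distance between boxes; a real cost would likely also include class or text terms. It brute-forces every assignment, which is only practical for a few boxes; `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time. `l1_cost` and `bipartite_match` are hypothetical names.

```python
from itertools import permutations


def l1_cost(box_a, box_b):
    # Sum of absolute coordinate differences between two (x0, y0, x1, y1) boxes.
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def bipartite_match(pred_boxes, gt_boxes):
    """Return (pred_index, gt_index) pairs with the minimum total L1 cost.

    Assumes len(pred_boxes) >= len(gt_boxes). In DETR-style set prediction,
    predictions left unmatched are trained toward a "no object" class.
    Enumerating permutations is exponential; use it only for tiny inputs.
    """
    best_cost, best_pairs = float("inf"), []
    # perm[g] is the prediction index assigned to ground-truth box g.
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(l1_cost(pred_boxes[p], gt_boxes[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = sorted((p, g) for g, p in enumerate(perm))
    return best_pairs
```

For predictions `[(0, 0, 10, 10), (50, 50, 60, 60), (100, 0, 110, 10)]` and ground truth `[(49, 50, 60, 61), (1, 0, 10, 10)]`, the minimum-cost assignment is `[(0, 1), (1, 0)]` with total cost 3, leaving prediction 2 unmatched.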
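Claims 9, 10, and 16 describe a masked-cell pre-training task: the pixels covering one cell are masked, and the model must predict that cell's content from the modified image. The Python sketch below builds one such training pair. It assumes a grayscale image stored as a list of pixel rows, cell annotations given as `(box, text)` tuples with exclusive right and bottom edges, and masking by filling with a constant value; `mask_cell` and `make_masked_sample` are hypothetical helpers.

```python
def mask_cell(image, box, fill=0):
    """Return a copy of image with every pixel inside box set to fill.

    image is a list of pixel rows (grayscale); box is (x0, y0, x1, y1) in
    pixel indices, with x1 and y1 exclusive. The input image is not modified.
    """
    x0, y0, x1, y1 = box
    masked = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            masked[y][x] = fill
    return masked


def make_masked_sample(image, cells, index):
    """Build one pre-training pair: (image with cell masked, cell text to predict).

    cells is a hypothetical annotation list of (box, text) tuples.
    """
    box, text = cells[index]
    return mask_cell(image, box), text
```

The masked copy is the model input and the hidden text is the target, so the model can only succeed by using context from the surrounding rows, columns, and headers.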
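Claims 13, 14, and 16 add a pre-training task in which the model predicts how rows or columns were modified. The Python sketch below generates one labeled example on a logical cell grid (a list of rows of cell text) rather than on pixels: rendering the modified grid back into an image is omitted, the span operations from claim 14 are left out, and the operation names in `OPS` are hypothetical labels. Column operations reuse the row operations on the transposed grid.

```python
import random


def merge_rows(table, i):
    # Join row i and row i + 1 cell by cell, keeping only non-empty text.
    merged = [" ".join(filter(None, pair)) for pair in zip(table[i], table[i + 1])]
    return table[:i] + [merged] + table[i + 2:]


def add_row(table, i):
    # Insert an empty row before position i.
    return table[:i] + [[""] * len(table[0])] + table[i:]


def transpose(table):
    return [list(col) for col in zip(*table)]


def modify(table, op, i):
    """Apply one modification; column operations run row operations on the transpose."""
    if op == "merge_rows":
        return merge_rows(table, i)
    if op == "add_rows":
        return add_row(table, i)
    if op == "merge_columns":
        return transpose(merge_rows(transpose(table), i))
    if op == "add_columns":
        return transpose(add_row(transpose(table), i))
    raise ValueError(f"unknown operation: {op}")


OPS = ["merge_rows", "add_rows", "merge_columns", "add_columns"]


def make_modification_sample(table, rng):
    """Return (modified table, operation label); assumes at least 2 rows and 2 columns."""
    op = rng.choice(OPS)
    size = len(table) if op.endswith("rows") else len(table[0])
    i = rng.randrange(size - 1)  # keeps i + 1 valid for merges
    return modify(table, op, i), op
```

For the 3 x 2 grid `[["a", "b"], ["c", "d"], ["e", "f"]]`, `modify(t, "merge_rows", 0)` gives `[["a c", "b d"], ["e", "f"]]`, and the pre-training target is the operation label.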
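Claim 19 fine-tunes with binary classification on the predicted rows and columns. One plausible reading, assumed here rather than taken from the patent, is a per-position classifier that scores each pixel row (or column) as separator or not, trained with binary cross-entropy against the ground-truth separators. A minimal Python sketch of that loss, with `binary_cross_entropy` as a hypothetical name:

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets.

    probs[k] is a hypothetical predicted probability that pixel row (or column)
    k lies on a row (or column) separator; targets[k] is 1 if it does, else 0.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)
```

Predicting 0.5 everywhere costs `log 2` (about 0.693) per position regardless of the ground truth, while confident correct predictions drive the loss toward zero.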

Assignees

Adobe Inc

Inventors

Classifications (CPC titles)

  • Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

  • using context analysis, e.g. lexical, syntactic or semantic context

  • Orientation detection or correction, e.g. rotation of multiples of 90 degrees

  • Character recognition

  • using neural networks
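The orientation-detection classification above matches the rotated-angle pre-training task in claims 15 and 16. The claims do not say which angles are used; the Python sketch below assumes the four multiples of 90 degrees named in that classification title, treats angle prediction as four-way classification, and represents the image as a list of pixel rows. `ANGLES`, `rotate90`, `rotate`, and `make_rotation_sample` are hypothetical names.

```python
import random

ANGLES = [0, 90, 180, 270]  # assumed angle classes (multiples of 90 degrees)


def rotate90(image):
    # Rotate a list-of-rows image 90 degrees clockwise.
    return [list(row) for row in zip(*image[::-1])]


def rotate(image, angle):
    # Apply one clockwise quarter turn per 90 degrees; angle 0 returns the input.
    for _ in range(angle // 90):
        image = rotate90(image)
    return image


def make_rotation_sample(image, rng):
    """Return (rotated image, angle class index) for rotation pre-training."""
    label = rng.randrange(len(ANGLES))
    return rotate(image, ANGLES[label]), label
```

For example, `rotate([[1, 2], [3, 4]], 90)` gives `[[3, 1], [4, 2]]`; the model sees the rotated image and is trained to recover the class index.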

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12597281B2 cover?
In various examples, a table recognition model receives an image of a table and generates, using a first encoder of the table recognition machine learning model, an image feature vector including features extracted from the image of the table; generates, using a first decoder of the table recognition machine learning model and the image feature vector, a set of coordinates within the image repr…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06V30/412. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 7, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).