Who is the assignee on this patent?

Hong Kong Applied Science & Tech Research Inst Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V10/774. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 3, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for document analysis through layer separation by machine learning with cross-layer reasoning

US12567276B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12567276-B2
Application number	US-202318459460-A
Country	US
Kind code	B2
Filing date	Sep 1, 2023
Priority date	Sep 1, 2023
Publication date	Mar 3, 2026
Grant date	Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for processing electronic documents, comprising: receiving an electronic document; recognizing one or more content components in the electronic document; identifying a content type for each of the recognized content components; creating, by a layer separator, one or more logical layers from the recognized content components such that each of the logical layer contains only the content components of the same content type; and invoking a content-type specific content handler for each of the logical layers created. The layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network and trained to classify the content types of the content components. The modified U-Net CNN is improved over traditional U-Net CNN with transformers at each layer to achieve high recovery rate.
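The flow in the abstract (recognize components, group them into one logical layer per content type, then call a type-specific handler per layer) can be pictured with a minimal sketch. It is illustrative only and not taken from the patent: the `Component` class, the `separate_layers` and `process` functions, and the placeholder handlers are hypothetical names, and the content type is assumed to be already assigned, whereas the patent has an ML model classify it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Component:
    """A recognized content component (hypothetical structure)."""
    content_type: str                 # e.g. "printed_text", "barcode"
    bbox: Tuple[int, int, int, int]   # (x0, y0, x1, y1) on the page


def separate_layers(components: List[Component]) -> Dict[str, List[Component]]:
    """Group components so that each logical layer holds a single content type."""
    layers: Dict[str, List[Component]] = defaultdict(list)
    for comp in components:
        layers[comp.content_type].append(comp)
    return dict(layers)


def process(components: List[Component],
            handlers: Dict[str, Callable[[List[Component]], str]]) -> Dict[str, str]:
    """Invoke the content-type specific handler for each layer created."""
    results = {}
    for content_type, layer in separate_layers(components).items():
        handler = handlers.get(content_type)
        if handler is None:
            continue  # skipping unhandled types is a choice made for this sketch
        results[content_type] = handler(layer)
    return results


# Placeholder handlers; real ones would run OCR, barcode decoding, and so on.
HANDLERS = {
    "printed_text": lambda layer: f"ocr on {len(layer)} region(s)",
    "barcode": lambda layer: f"decode {len(layer)} barcode(s)",
}

page = [
    Component("printed_text", (10, 10, 200, 40)),
    Component("barcode", (10, 300, 120, 340)),
    Component("printed_text", (10, 60, 200, 90)),
]
results = process(page, HANDLERS)
```

Keeping the handlers in a dictionary keyed by content type means a new content type only needs a new entry, which mirrors the per-layer handler idea in the abstract without implying how the patent implements it.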

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing electronic documents, comprising: receiving an electronic document; recognizing one or more content components in the electronic document; identifying a content type for each of the recognized content components; and creating, by a layer separator, one or more logical layers from the recognized content components such that each of the logical layer contains only the content components of the same content type; wherein the layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network and trained to classify the content types of the content components; wherein the modified U-Net convolutional neural network comprises four layers of encoders-decoders; wherein each encoder is configured to down-sample a feature map through convolution, activation, and pooling operations such that, during contraction, spatial information is reduced while feature information is increased, wherein each decoder is configured to up-sample a feature map through up-convolution and activation operations such that, during expansion, spatial information is increased while feature information is reduced; and wherein the ML model is trained using training data comprising a plurality of pairs of a generated document image and correspondingly labelled logical layers of content components of various content types that compose the generated document image, such that encoder and decoder operations using the four layers of encoders-decoders are learned by the ML model through the training, and that the feature map transformations performed during contraction and expansion correspond to labeled logical layers used in the training.

2. The method of claim 1, therein the content types comprise printed text content type, handwritten text content type, chop stamp content type, structured content type, barcode content type, and complex content type.

3. The method of claim 1, further comprising: extracting, by a printed text content handler, a region of interest (ROI) containing printed text for each of the content components of printed content type for further processing and disregard empty background space in the logical layer of printed text content type; depending on a language model chosen for printed text content handler, segmenting the ROI into one or more of sentences and characters; feeding the ROI as-is or the one or more of sentences and characters to an Optical Character Recognition (OCR) engine for performing a text recognition for the content component of printed text content type; extracting one or more attributes and a location on the electronic document page of the content component of printed text content type, wherein the attributes comprise at least typeface, font size, and color of the text, and author identification.

4. The method of claim 1, further comprising: extracting, by a handwritten text content handler, a ROI containing handwritten text or signature for each of the content components of handwritten text content type for further recognition as handwritten text or signature, and disregarding empty background space in the logical layer of handwritten text content type; if the ROI contains handwritten text, feeding the ROI as-is to an OCR engine for performing a text recognition for the content component of handwritten text content type; and if the ROI contains signature, feeding the ROI as-is to a signature verification engine for performing a signature verification comprising comparing the signature to records of authentic signatures stored in a signature database.

5. The method of claim 1, further comprising: localizing, by a chop stamp content handler, an outline shape of a chop stamp content component in the logical layer of chop stamp content type; cropping an image region of the outline shape of the chop stamp content component to obtain an chop stamp image; performing a text recognition of the chop stamp image to extract a text from the chop stamp image; and comparing and verifying the chop stamp image and the extracted text with records of chop stamp images stored in a chop stamp database.

6. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a structured content handler, one or more structured content components in the logical layer of structured content type using structure and shape analysis; and detecting a sub-type of each of the extracted structured content components, wherein the sub-types comprising a table, a list, an underlining, a highlighting, a box, and an artifact of a non-arbitrary shape.

7. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a barcode content handler, one or more barcode content components in the logical layer of barcode content type; and decoding each of the extracted barcode content components into machine readable data.

8. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a complex content handler, one or more complex content components in the logical layer of complex content type; detecting a sub-type of each of the extracted complex content components; and invoking a context-sensitive content handling sub-module for each of the extracted complex content component according to its complex content sub-type.

9. The method of claim 1, further comprising: performing, by a multi-layer cross-referencing handler, context-sensitive cross referencing of two or more content components extracted from logical layers of different content types, comprising: analyzing locations, content types, sub-types, and attributes of the content components to be cross-referenced to determine the relationship between the content components; and determining a context significance from the determined relationship.

10. An apparatus for processing electronic documents, comprising: a layer separator configured to: receive an electronic document; recognize one or more content components in the electronic document; identify a content type for each of the recognized content components; and create one or more logical layers from the recognized content components such that each of the logical layer contains only the content components of the same content type; wherein the layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network and trained to classify the content types of the content components; wherein the modified U-Net convolutional neural network comprises four layers of encoders-decoders; wherein each encoder is configured to down-sample a feature map through convolution, activation, and pooling operations such that, during contraction, spatial information is reduced while feature information is increased, wherein each decoder is configured to up-sample a feature map through up-convolution and activation operations such that, during expansion, spatial information is increased while feature information is reduced; and wherein the ML model is trained using training data comprising a plurality of pairs of a generated document image and correspondingly labelled logical layers of content components of various content types that compose the generated document image, such that encoder and decoder operations using the four layers of encoders-decoders are learned by the ML model through the training, and that the feature map transformations performed during contraction and expansion correspond to labeled logical layers used in the training.

11. The apparatus of claim 10, therein the content types comprise printed text content type, handwritten text content type, ch
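Claims 1 and 10 describe four encoder levels that reduce spatial information while increasing feature information, and four decoder levels that do the reverse. The shape trace below is a rough sketch of that contraction and expansion, not the patented network. It assumes the conventions of the classic U-Net (2x2 pooling halves height and width, channel count doubles per level), plus a hypothetical 256x256 input with 64 feature channels; none of these numbers appear in the claims. Skip connections and the transformers mentioned in the abstract are not modeled.

```python
def encoder_shapes(height, width, channels, levels=4):
    """Trace (height, width, channels) through the contracting path."""
    shapes = [(height, width, channels)]
    for _ in range(levels):
        height, width = height // 2, width // 2  # pooling halves spatial size (assumed 2x2)
        channels *= 2                            # feature channels double (assumed factor)
        shapes.append((height, width, channels))
    return shapes


def decoder_shapes(height, width, channels, levels=4):
    """Trace (height, width, channels) through the expanding path."""
    shapes = [(height, width, channels)]
    for _ in range(levels):
        height, width = height * 2, width * 2    # up-convolution doubles spatial size (assumed stride 2)
        channels //= 2                           # feature channels halve (assumed factor)
        shapes.append((height, width, channels))
    return shapes


down = encoder_shapes(256, 256, 64)   # hypothetical input feature map
up = decoder_shapes(*down[-1])        # start expansion from the bottleneck

for shape in down + up[1:]:
    print(shape)
```

Under these assumptions the bottleneck is 16x16 with 1024 channels, and the expanding path returns to the 256x256 resolution of the input, which is what allows a per-pixel content-type label for each logical layer.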

Assignees

Hong Kong Applied Science & Tech Research Inst Co Ltd

Inventors

Classifications

Code	CPC title
G06V30/22	characterised by the type of writing
G06V10/774 (Primary)	Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V2201/09	Recognition of logos
G06V30/413	Classification of content, e.g. text, photographs or tables
G06V10/82	using neural networks

Patent family

Related publications grouped by family.

View patent family 94773238

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12567276B2 cover?: A method for processing electronic documents, comprising: receiving an electronic document; recognizing one or more content components in the electronic document; identifying a content type for each of the recognized content components; creating, by a layer separator, one or more logical layers from the recognized content components such that each of the logical layer contains only the content …

Related patents

Multiple channels of rasterized content for page decomposition using machine learning

Automated classification and interpretation of life science documents

Two-dimensional document processing

Automatic Hierarchical Classification and Metadata Identification of Document Using Machine Learning and Fuzzy Matching

System and method for OCR output verification
