Method and apparatus for document analysis through layer separation by machine learning with cross-layer reasoning

US12567276B2 · US · B2

Patent metadata
Publication number: US-12567276-B2
Application number: US-202318459460-A
Country: US
Kind code: B2
Filing date: Sep 1, 2023
Priority date: Sep 1, 2023
Publication date: Mar 3, 2026
Grant date: Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for processing electronic documents, comprising: receiving an electronic document; recognizing one or more content components in the electronic document; identifying a content type for each of the recognized content components; creating, by a layer separator, one or more logical layers from the recognized content components such that each of the logical layer contains only the content components of the same content type; and invoking a content-type specific content handler for each of the logical layers created. The layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network and trained to classify the content types of the content components. The modified U-Net CNN is improved over traditional U-Net CNN with transformers at each layer to achieve high recovery rate.
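
The separate-then-dispatch flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each content component has already been given a content type (in the patent that classification is done by the ML-based layer separator), and the names `Component`, `separate_layers`, `HANDLERS`, and the two toy handlers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Component:
    """A recognized content component (hypothetical structure)."""
    content_type: str                 # e.g. "printed_text", "barcode"
    bbox: Tuple[int, int, int, int]   # (x, y, width, height) on the page
    payload: str                      # stand-in for the component's pixel data


def separate_layers(components: List[Component]) -> Dict[str, List[Component]]:
    """Group components so that each logical layer holds a single content type."""
    layers: Dict[str, List[Component]] = defaultdict(list)
    for component in components:
        layers[component.content_type].append(component)
    return dict(layers)


# Toy handlers: real ones would run OCR, barcode decoding, and so on.
def handle_printed_text(layer: List[Component]) -> List[str]:
    return [f"OCR:{c.payload}" for c in layer]


def handle_barcode(layer: List[Component]) -> List[str]:
    return [f"DECODE:{c.payload}" for c in layer]


HANDLERS: Dict[str, Callable[[List[Component]], List[str]]] = {
    "printed_text": handle_printed_text,
    "barcode": handle_barcode,
}


def process(components: List[Component]) -> Dict[str, List[str]]:
    """Invoke the content-type specific handler for each logical layer."""
    results: Dict[str, List[str]] = {}
    for content_type, layer in separate_layers(components).items():
        handler = HANDLERS.get(content_type)
        if handler is None:
            continue  # no handler registered for this content type
        results[content_type] = handler(layer)
    return results
```

Keeping the handlers in a lookup table keyed by content type means a new content type only needs a new table entry; the separation step itself does not change.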

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing electronic documents, comprising: receiving an electronic document; recognizing one or more content components in the electronic document; identifying a content type for each of the recognized content components; and creating, by a layer separator, one or more logical layers from the recognized content components such that each of the logical layer contains only the content components of the same content type; wherein the layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network and trained to classify the content types of the content components; wherein the modified U-Net convolutional neural network comprises four layers of encoders-decoders; wherein each encoder is configured to down-sample a feature map through convolution, activation, and pooling operations such that, during contraction, spatial information is reduced while feature information is increased, wherein each decoder is configured to up-sample a feature map through up-convolution and activation operations such that, during expansion, spatial information is increased while feature information is reduced; and wherein the ML model is trained using training data comprising a plurality of pairs of a generated document image and correspondingly labelled logical layers of content components of various content types that compose the generated document image, such that encoder and decoder operations using the four layers of encoders-decoders are learned by the ML model through the training, and that the feature map transformations performed during contraction and expansion correspond to labeled logical layers used in the training.

2. The method of claim 1, therein the content types comprise printed text content type, handwritten text content type, chop stamp content type, structured content type, barcode content type, and complex content type.

3. The method of claim 1, further comprising: extracting, by a printed text content handler, a region of interest (ROI) containing printed text for each of the content components of printed content type for further processing and disregard empty background space in the logical layer of printed text content type; depending on a language model chosen for printed text content handler, segmenting the ROI into one or more of sentences and characters; feeding the ROI as-is or the one or more of sentences and characters to an Optical Character Recognition (OCR) engine for performing a text recognition for the content component of printed text content type; extracting one or more attributes and a location on the electronic document page of the content component of printed text content type, wherein the attributes comprise at least typeface, font size, and color of the text, and author identification.

4. The method of claim 1, further comprising: extracting, by a handwritten text content handler, a ROI containing handwritten text or signature for each of the content components of handwritten text content type for further recognition as handwritten text or signature, and disregarding empty background space in the logical layer of handwritten text content type; if the ROI contains handwritten text, feeding the ROI as-is to an OCR engine for performing a text recognition for the content component of handwritten text content type; and if the ROI contains signature, feeding the ROI as-is to a signature verification engine for performing a signature verification comprising comparing the signature to records of authentic signatures stored in a signature database.

5. The method of claim 1, further comprising: localizing, by a chop stamp content handler, an outline shape of a chop stamp content component in the logical layer of chop stamp content type; cropping an image region of the outline shape of the chop stamp content component to obtain an chop stamp image; performing a text recognition of the chop stamp image to extract a text from the chop stamp image; and comparing and verifying the chop stamp image and the extracted text with records of chop stamp images stored in a chop stamp database.

6. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a structured content handler, one or more structured content components in the logical layer of structured content type using structure and shape analysis; and detecting a sub-type of each of the extracted structured content components, wherein the sub-types comprising a table, a list, an underlining, a highlighting, a box, and an artifact of a non-arbitrary shape.

7. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a barcode content handler, one or more barcode content components in the logical layer of barcode content type; and decoding each of the extracted barcode content components into machine readable data.

8. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a complex content handler, one or more complex content components in the logical layer of complex content type; detecting a sub-type of each of the extracted complex content components; and invoking a context-sensitive content handling sub-module for each of the extracted complex content component according to its complex content sub-type.

9. The method of claim 1, further comprising: performing, by a multi-layer cross-referencing handler, context-sensitive cross referencing of two or more content components extracted from logical layers of different content types, comprising: analyzing locations, content types, sub-types, and attributes of the content components to be cross-referenced to determine the relationship between the content components; and determining a context significance from the determined relationship.

10. An apparatus for processing electronic documents, comprising: a layer separator configured to: receive an electronic document; recognize one or more content components in the electronic document; identify a content type for each of the recognized content components; and create one or more logical layers from the recognized content components such that each of the logical layer contains only the content components of the same content type; wherein the layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network and trained to classify the content types of the content components; wherein the modified U-Net convolutional neural network comprises four layers of encoders-decoders; wherein each encoder is configured to down-sample a feature map through convolution, activation, and pooling operations such that, during contraction, spatial information is reduced while feature information is increased, wherein each decoder is configured to up-sample a feature map through up-convolution and activation operations such that, during expansion, spatial information is increased while feature information is reduced; and wherein the ML model is trained using training data comprising a plurality of pairs of a generated document image and correspondingly labelled logical layers of content components of various content types that compose the generated document image, such that encoder and decoder operations using the four layers of encoders-decoders are learned by the ML model through the training, and that the feature map transformations performed during contraction and expansion correspond to labeled logical layers used in the training.

11. The apparatus of claim 10, therein the content types comprise printed text content type, handwritten text content type, ch
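
Claim 1 describes four encoder-decoder levels in which contraction trades spatial resolution for feature depth and expansion reverses the trade. The shape bookkeeping can be sketched as below. This is an illustration under stated assumptions, not the claimed network: 2x2 pooling, channel doubling per level, and a base width of 64 channels follow the conventional U-Net design and are not specified in the text on this page; the function names are hypothetical; and the transformer blocks mentioned in the abstract are not modelled.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def encoder_shapes(input_shape: Shape, levels: int = 4,
                   base_channels: int = 64) -> List[Shape]:
    """Feature-map shape after each encoder level.

    Assumption: convolution + activation set the channel count, then 2x2
    pooling halves height and width; channels double at every level.
    """
    _, height, width = input_shape
    channels = base_channels
    shapes: List[Shape] = []
    for _ in range(levels):
        height, width = height // 2, width // 2   # spatial information reduced
        shapes.append((channels, height, width))
        channels *= 2                             # feature information increased
    return shapes


def decoder_shapes(bottom: Shape, levels: int = 4) -> List[Shape]:
    """Feature-map shape after each decoder level.

    Assumption: each up-convolution doubles height and width and halves
    the channel count.
    """
    channels, height, width = bottom
    shapes: List[Shape] = []
    for _ in range(levels):
        channels = max(channels // 2, 1)          # feature information reduced
        height, width = height * 2, width * 2     # spatial information increased
        shapes.append((channels, height, width))
    return shapes


if __name__ == "__main__":
    enc = encoder_shapes((1, 256, 256))
    dec = decoder_shapes(enc[-1])
    for level, shape in enumerate(enc, start=1):
        print(f"encoder level {level}: {shape}")
    for level, shape in enumerate(dec, start=1):
        print(f"decoder level {level}: {shape}")
```

For a 256 x 256 single-channel page image, the trace goes from 128 x 128 at 64 channels down to 16 x 16 at 512 channels, then back up to the input resolution, where a final classification layer (not shown) would assign each pixel to a logical layer.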

Assignees

Hong Kong Applied Science & Tech Research Inst Co Ltd

Inventors

Classifications

  • characterised by the type of writing

  • G06V10/774 (Primary): Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

  • Recognition of logos

  • Classification of content, e.g. text, photographs or tables

  • using neural networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12567276B2 cover?
A method for processing electronic documents, comprising: receiving an electronic document; recognizing one or more content components in the electronic document; identifying a content type for each of the recognized content components; creating, by a layer separator, one or more logical layers from the recognized content components such that each of the logical layer contains only the content …
Who is the assignee on this patent?
Hong Kong Applied Science & Tech Research Inst Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 3, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).