Multiple channels of rasterized content for page decomposition using machine learning
US-2021117666-A1 · Apr 22, 2021 · US
US12567276B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12567276-B2 |
| Application number | US-202318459460-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 1, 2023 |
| Priority date | Sep 1, 2023 |
| Publication date | Mar 3, 2026 |
| Grant date | Mar 3, 2026 |
Abstract
A method for processing electronic documents, comprising: receiving an electronic document; recognizing one or more content components in the electronic document; identifying a content type for each of the recognized content components; creating, by a layer separator, one or more logical layers from the recognized content components such that each of the logical layers contains only the content components of the same content type; and invoking a content-type specific content handler for each of the logical layers created. The layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network (CNN) and trained to classify the content types of the content components. The modified U-Net CNN improves on the traditional U-Net CNN by adding transformers at each layer to achieve a high recovery rate.
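The abstract's pipeline (label each content component with a type, group components into one logical layer per type, then hand each layer to a type-specific handler) can be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the ML layer separator is replaced by components that already carry a content-type label, and the names `ContentComponent`, `separate_layers`, `process_document`, and the snake_case type labels are all hypothetical. Types with no registered handler are skipped rather than rejected.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical labels mirroring the six content types listed in claim 2.
CONTENT_TYPES = ("printed_text", "handwritten_text", "chop_stamp",
                 "structured", "barcode", "complex")


@dataclass
class ContentComponent:
    content_type: str                   # assumed to be assigned upstream by the ML classifier
    bbox: Tuple[int, int, int, int]     # (x0, y0, x1, y1) on the page


Layer = List[ContentComponent]
Handler = Callable[[Layer], object]


def separate_layers(components: List[ContentComponent]) -> Dict[str, Layer]:
    """Group components so that each logical layer holds only one content type."""
    layers: Dict[str, Layer] = defaultdict(list)
    for component in components:
        if component.content_type not in CONTENT_TYPES:
            raise ValueError(f"unknown content type: {component.content_type}")
        layers[component.content_type].append(component)
    return dict(layers)


def process_document(components: List[ContentComponent],
                     handlers: Dict[str, Handler]) -> Dict[str, object]:
    """Invoke the content-type specific handler for each logical layer."""
    results: Dict[str, object] = {}
    for content_type, layer in separate_layers(components).items():
        handler = handlers.get(content_type)
        if handler is None:
            continue  # no handler registered for this type; skipped in this sketch
        results[content_type] = handler(layer)
    return results


if __name__ == "__main__":
    page = [
        ContentComponent("printed_text", (10, 10, 200, 30)),
        ContentComponent("handwritten_text", (10, 300, 180, 340)),
        ContentComponent("printed_text", (10, 40, 200, 60)),
    ]
    handlers = {
        "printed_text": lambda layer: f"OCR on {len(layer)} printed components",
        "handwritten_text": lambda layer: f"handwriting check on {len(layer)} components",
    }
    print(process_document(page, handlers))
```

Keeping the grouping step separate from dispatch mirrors the abstract's split between the layer separator and the per-type handlers: supporting another content type means registering another handler, not changing the separation logic.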
Claims (preview)
What is claimed is:

1. A method for processing electronic documents, comprising: receiving an electronic document; recognizing one or more content components in the electronic document; identifying a content type for each of the recognized content components; and creating, by a layer separator, one or more logical layers from the recognized content components such that each of the logical layers contains only the content components of the same content type; wherein the layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network and trained to classify the content types of the content components; wherein the modified U-Net convolutional neural network comprises four layers of encoders-decoders; wherein each encoder is configured to down-sample a feature map through convolution, activation, and pooling operations such that, during contraction, spatial information is reduced while feature information is increased; wherein each decoder is configured to up-sample a feature map through up-convolution and activation operations such that, during expansion, spatial information is increased while feature information is reduced; and wherein the ML model is trained using training data comprising a plurality of pairs of a generated document image and correspondingly labelled logical layers of content components of various content types that compose the generated document image, such that encoder and decoder operations using the four layers of encoders-decoders are learned by the ML model through the training, and that the feature map transformations performed during contraction and expansion correspond to labeled logical layers used in the training.

2. The method of claim 1, wherein the content types comprise printed text content type, handwritten text content type, chop stamp content type, structured content type, barcode content type, and complex content type.

3. The method of claim 1, further comprising: extracting, by a printed text content handler, a region of interest (ROI) containing printed text for each of the content components of printed content type for further processing, and disregarding empty background space in the logical layer of printed text content type; depending on a language model chosen for the printed text content handler, segmenting the ROI into one or more of sentences and characters; feeding the ROI as-is or the one or more of sentences and characters to an Optical Character Recognition (OCR) engine for performing a text recognition for the content component of printed text content type; and extracting one or more attributes and a location on the electronic document page of the content component of printed text content type, wherein the attributes comprise at least typeface, font size, and color of the text, and author identification.

4. The method of claim 1, further comprising: extracting, by a handwritten text content handler, an ROI containing handwritten text or a signature for each of the content components of handwritten text content type for further recognition as handwritten text or signature, and disregarding empty background space in the logical layer of handwritten text content type; if the ROI contains handwritten text, feeding the ROI as-is to an OCR engine for performing a text recognition for the content component of handwritten text content type; and if the ROI contains a signature, feeding the ROI as-is to a signature verification engine for performing a signature verification comprising comparing the signature to records of authentic signatures stored in a signature database.

5. The method of claim 1, further comprising: localizing, by a chop stamp content handler, an outline shape of a chop stamp content component in the logical layer of chop stamp content type; cropping an image region of the outline shape of the chop stamp content component to obtain a chop stamp image; performing a text recognition of the chop stamp image to extract a text from the chop stamp image; and comparing and verifying the chop stamp image and the extracted text with records of chop stamp images stored in a chop stamp database.

6. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a structured content handler, one or more structured content components in the logical layer of structured content type using structure and shape analysis; and detecting a sub-type of each of the extracted structured content components, wherein the sub-types comprise a table, a list, an underlining, a highlighting, a box, and an artifact of a non-arbitrary shape.

7. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a barcode content handler, one or more barcode content components in the logical layer of barcode content type; and decoding each of the extracted barcode content components into machine readable data.

8. The method of claim 1, further comprising: detecting, recognizing, and extracting, by a complex content handler, one or more complex content components in the logical layer of complex content type; detecting a sub-type of each of the extracted complex content components; and invoking a context-sensitive content handling sub-module for each of the extracted complex content components according to its complex content sub-type.

9. The method of claim 1, further comprising: performing, by a multi-layer cross-referencing handler, context-sensitive cross-referencing of two or more content components extracted from logical layers of different content types, comprising: analyzing locations, content types, sub-types, and attributes of the content components to be cross-referenced to determine the relationship between the content components; and determining a context significance from the determined relationship.

10. An apparatus for processing electronic documents, comprising: a layer separator configured to: receive an electronic document; recognize one or more content components in the electronic document; identify a content type for each of the recognized content components; and create one or more logical layers from the recognized content components such that each of the logical layers contains only the content components of the same content type; wherein the layer separator comprises a machine learning (ML) model based on a modified U-Net convolutional neural network and trained to classify the content types of the content components; wherein the modified U-Net convolutional neural network comprises four layers of encoders-decoders; wherein each encoder is configured to down-sample a feature map through convolution, activation, and pooling operations such that, during contraction, spatial information is reduced while feature information is increased; wherein each decoder is configured to up-sample a feature map through up-convolution and activation operations such that, during expansion, spatial information is increased while feature information is reduced; and wherein the ML model is trained using training data comprising a plurality of pairs of a generated document image and correspondingly labelled logical layers of content components of various content types that compose the generated document image, such that encoder and decoder operations using the four layers of encoders-decoders are learned by the ML model through the training, and that the feature map transformations performed during contraction and expansion correspond to labeled logical layers used in the training.

11. The apparatus of claim 10, wherein the content types comprise printed text content type, handwritten text content type, ch
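Claim 1 recites a four-level encoder-decoder in which contraction trades spatial resolution for feature channels and expansion does the reverse. The shape bookkeeping can be traced with the short sketch below. It assumes details the claim does not specify: a 64-channel, 256x256 feature map entering the first encoder, 'same'-padded convolutions, 2x2 pooling and stride-2 up-convolutions, and channels that double per encoder level and halve per decoder level, as in the original U-Net. Skip connections and the per-layer transformers mentioned in the abstract are not modeled, and every function name is hypothetical.

```python
from typing import List, Tuple

# (channels, height, width) of a feature map
Shape = Tuple[int, int, int]


def encoder_step(shape: Shape) -> Shape:
    """Contraction: conv + activation (assumed to double channels, 'same' padding),
    then 2x2 max pooling, which halves height and width."""
    channels, height, width = shape
    if height % 2 or width % 2:
        raise ValueError(f"cannot pool a {height}x{width} map by 2 without remainder")
    return (channels * 2, height // 2, width // 2)


def decoder_step(shape: Shape) -> Shape:
    """Expansion: stride-2 up-convolution (assumed to halve channels),
    which doubles height and width."""
    channels, height, width = shape
    return (channels // 2, height * 2, width * 2)


def output_head(shape: Shape, num_layers: int) -> Shape:
    """Assumed 1x1 convolution mapping features to one channel per logical layer."""
    _, height, width = shape
    return (num_layers, height, width)


def trace_unet(input_shape: Shape, levels: int = 4) -> List[Shape]:
    """Return the input shape followed by the shape after each encoder step,
    then after each decoder step."""
    shapes = [input_shape]
    shape = input_shape
    for _ in range(levels):
        shape = encoder_step(shape)
        shapes.append(shape)
    for _ in range(levels):
        shape = decoder_step(shape)
        shapes.append(shape)
    return shapes


if __name__ == "__main__":
    trace = trace_unet((64, 256, 256))
    for s in trace:
        print(s)
    # Six output channels, one per content type listed in claim 2 (assumed head).
    print("head:", output_head(trace[-1], num_layers=6))
```

Under these assumptions the bottleneck after four contractions is a 1024-channel 16x16 map, and the expansion path returns to 256x256, so a per-pixel head can emit one channel per logical layer. The input side must be divisible by 2^4 = 16 for the four pooling steps to divide evenly.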
characterised by the type of writing · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Recognition of logos · CPC title
Classification of content, e.g. text, photographs or tables · CPC title
using neural networks · CPC title