Layout-Aware Multimodal Pretraining for Multimodal Document Understanding
US-2023222285-A1 · Jul 13, 2023 · US
US12272168B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12272168-B2 |
| Application number | US-202218046831-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 14, 2022 |
| Priority date | Apr 13, 2022 |
| Publication date | Apr 8, 2025 |
| Grant date | Apr 8, 2025 |
Official abstract text for this publication.
Various embodiments of the present disclosure provide methods, apparatus, systems, computing devices, computing entities, and/or the like for processing document classification system outputs, wherein classification routine iterations are performed using masked document data objects comprising one or more masked text blocks. Text block importance scores for the text blocks are generated and compared to generate a predictive data output comprising the text blocks determined to be the most influential in classifying the document data objects with respect to one or more classification labels.
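The masking-based scoring summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `block_importance`, the `classify` callable, the default `iterations` and `mask_prob` values, the choice to mask a block by removing it, and the use of a simple mean difference as the importance score are all assumptions made for illustration.

```python
import random
from statistics import mean


def block_importance(blocks, classify, labels, iterations=200, mask_prob=0.3, seed=0):
    """Estimate per-label importance of each text block by random masking.

    `classify` is a hypothetical callable that takes a list of text blocks
    and returns a dict mapping each label to a probability.
    """
    rng = random.Random(seed)
    # Label probabilities for the full, unmasked document.
    unmasked = classify(blocks)
    # Per-block, per-label list of probabilities seen while the block was masked.
    masked_scores = {i: {lab: [] for lab in labels} for i in range(len(blocks))}

    for _ in range(iterations):
        # Randomly choose which blocks to mask in this iteration.
        masked = {i for i in range(len(blocks)) if rng.random() < mask_prob}
        if not masked:
            continue
        # Assumption: masking is modeled as dropping the block from the input.
        kept = [b for i, b in enumerate(blocks) if i not in masked]
        probs = classify(kept)
        for i in masked:
            for lab in labels:
                masked_scores[i][lab].append(probs[lab])

    # Importance = unmasked probability minus mean masked probability.
    # Blocks that were never masked get no score for that label.
    return {
        i: {lab: unmasked[lab] - mean(s) for lab, s in per_label.items() if s}
        for i, per_label in masked_scores.items()
    }


# Toy stand-in for a document classification model (illustrative only).
def toy_classifier(blocks):
    return {"billing": 0.9 if any("invoice" in b for b in blocks) else 0.1}


blocks = ["patient greeting", "invoice total due", "closing remarks"]
scores = block_importance(blocks, toy_classifier, ["billing"])
most_influential = max(scores, key=lambda i: scores[i]["billing"])
```

With the toy classifier, the block containing "invoice" receives the largest score, because the label probability drops every time that block is masked.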
Opening claim text (preview).
The invention claimed is:

1. A computer-implemented method for processing document classification system outputs, the computer-implemented method comprising: generating, using one or more processors, an unmasked label probability score, of one or more unmasked label probability scores, for each of one or more classification labels based at least in part on one or more document data objects; for each document data object of the one or more document data objects: segmenting, using the one or more processors, the document data object into a plurality of text blocks; performing, using the one or more processors and a document classification machine learning model, a classification of the document data object via one or more classification routine iterations, wherein each of the one or more classification routine iterations is configured to: (i) generate one or more masked text blocks by masking one or more text blocks of the plurality of text blocks, (ii) generate, using the document classification machine learning model, per-masked document classification of the document data object, based at least in part on the masking of the one or more masked text blocks, and (iii) generate one or more per-iteration masked label probability scores based at least in part on the one or more masked text blocks absent from the document data object, wherein each of the one or more per-iteration masked label probability scores correspond to a particular classification label of the one or more classification labels and is associated with one or more of the one or more masked text blocks; for each masked text block of the one or more masked text blocks: generating, using the one or more processors, one or more per-label text block importance scores based at least in part on a corresponding one of the one or more unmasked label probability scores and each of the one or more per-iteration masked label probability scores associated with the masked text block; generating, using the one or more processors, a predictive data output for the document data object based at least in part on the one or more per-label text block importance scores; and performing, using the one or more processors, one or more prediction-based actions based at least in part on the predictive data output for the one or more document data objects.

2. The computer-implemented method of claim 1, wherein generating a per-label text block importance score of the one or more per-label text block importance scores for the masked text block comprises: identifying, for a selected classification label of the one or more classification labels, each of the one or more per-iteration masked label probability scores associated with the masked text block; generating, a per-label masked label probability score associated with the masked text block with respect to the selected classification label based at least in part on each of the one or more per-iteration masked label probability score associated with the masked text block; and generating the per-label text block importance score for the selected classification label based at least in part on comparing the unmasked label probability score for the selected classification label with the per-label masked label probability score.

3. The computer-implemented method of claim 1, wherein the one or more classification routine iterations comprise a required number of classification routine iterations that is determined based at least in part on an expected text block masking count and a text block masking probability.

4. The computer-implemented method of claim 1, wherein masking the one or more text blocks of the plurality of text blocks comprises randomly selecting the one or more text blocks of the plurality of text blocks and masking the one or more text blocks.

5. The computer-implemented method of claim 1, wherein each of the plurality of text blocks comprises a sequence of words represented as a sequence of tokens.

6. The computer-implemented method of claim 1, wherein segmenting the document data object into the plurality of text blocks comprises selecting a text block size measure and segmenting the document data object based at least in part on the text block size measure.

7. The computer-implemented method of claim 1, wherein the one or more document data objects comprise a long document.

8. The computer-implemented method of claim 1, wherein the document classification machine learning model is a multi-label classification machine learning model.

9. The computer-implemented method of claim 1, wherein the predictive data output comprises one or more selected ones of the one or more text blocks based at least in part on the one or more per-label text block importance scores for each of the one or more selected ones of the one or more text blocks.

10. A system for processing document classification system outputs, the system comprising one or more processors and at least one memory storing processor executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: generating an unmasked label probability score, of one or more unmasked label probability scores, for each of one or more classification labels based at least in part on one or more document data objects; for each document data object of the one or more document data objects: segmenting the document data object into a plurality of text blocks; performing, using a document classification machine learning model, a classification of the document data object via one or more classification routine iterations, wherein each of the one or more classification routine iterations is configured to: (i) generate one or more masked text blocks by masking one or more text blocks of the plurality of text blocks, (ii) generate, using the document classification machine learning model, per-masked document classification of the document data object, based at least in part on the one or more masked text blocks, and (iii) generate one or more per-iteration masked label probability scores based at least in part on the one or more masked text blocks absent from the document data object, wherein each of the one or more per-iteration masked label probability scores correspond to a particular classification label of the one or more classification labels and is associated with one or more of the one or more masked text blocks; for each masked text block of the one or more masked text blocks: generating one or more per-label text block importance scores based at least in part on a corresponding one of the one or more unmasked label probability scores and each of the one or more per-iteration masked label probability scores associated with the masked text block; generating a predictive data output for the document data object based at least in part on the one or more per-label text block importance scores; and performing one or more prediction-based actions based at least in part on the predictive data output for the one or more document data objects.

11. The system of claim 10, wherein generating a per-label text block importance score of the one or more per-label text block importance scores for the masked text block comprises: identifying, for a selected classification label of the one or more classification labels, each of the one or more per-iteration masked label probability scores associated with the masked text block; generating, a per-label masked label probability score associated with the masked text block with respect to the selected classification label based at least in part on each of the one or more per-iteration masked label probability score associated with
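Claims 3 and 6 mention an iteration count derived from an expected masking count and a masking probability, and segmentation driven by a block size measure, without giving formulas in this preview. The sketch below shows one plausible reading, not the patent's own method: it assumes each block is masked independently with probability `mask_prob` per iteration, so a block is masked about `iterations * mask_prob` times, and it assumes fixed-size, non-overlapping token blocks. The function names are hypothetical.

```python
import math


def segment_into_blocks(tokens, block_size):
    """Split a token sequence into consecutive blocks of at most `block_size` tokens."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def required_iterations(expected_mask_count, mask_prob):
    """Smallest iteration count whose expected per-block masking count
    (iterations * mask_prob) reaches `expected_mask_count`.

    Assumption: independent per-block masking with probability `mask_prob`.
    """
    if not 0 < mask_prob <= 1:
        raise ValueError("mask_prob must be in (0, 1]")
    return math.ceil(expected_mask_count / mask_prob)


tokens = "the invoice total is due within thirty days".split()
blocks = segment_into_blocks(tokens, block_size=3)
n_iter = required_iterations(expected_mask_count=30, mask_prob=0.25)
```

Under these assumptions, a smaller masking probability isolates each block's effect more cleanly but requires proportionally more classification runs to mask every block the same expected number of times.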
Lexical analysis, e.g. tokenisation or collocates · CPC title
Classification of content, e.g. text, photographs or tables · CPC title
Classification techniques · CPC title
Analysis of document content (recognition of printed characters based on code marks G06V30/224) · CPC title
Semantic analysis · CPC title