Outsourcing Document-Transformation Tasks while Protecting Sensitive Information
US-2016063269-A1 · Mar 3, 2016 · US
US12573227B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12573227-B2 |
| Application number | US-202117160080-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 27, 2021 |
| Priority date | Oct 5, 2020 |
| Publication date | Mar 10, 2026 |
| Grant date | Mar 10, 2026 |
Official abstract text for this publication.
Improved techniques to access content from documents in an automated fashion. The improved techniques permit extraction of data from documents, namely, images of documents. The extraction processing can be hierarchical, such as being performed in multiple levels (i.e., multi-leveled). At an upper level, numerous different objects within a document can be detected along with positional data for the objects and can be categorized based on a type of object. Then, at lower levels, the different objects can be processed differently depending on the type of object. As a result, data extraction from the document can be performed with greater reliability and precision.
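The multi-level flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the upper-level detector has already produced typed blocks, and it replaces the trained models with trivial string parsing. All names here (`ObjectBlock`, `EXTRACTORS`, `extract_document`, the block-type strings, and the `text` field standing in for image content) are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ObjectBlock:
    """One detected object: its type, its position, and its content."""
    block_type: str                   # hypothetical labels: "key_value", "key_info", "table"
    bbox: Tuple[int, int, int, int]   # assumed (x0, y0, x1, y1) pixel coordinates
    text: str                         # stand-in for the block's image region


def extract_key_value(block: ObjectBlock) -> dict:
    # Split "Key: Value" on the first colon; a real system might use a trained model.
    key, _, value = block.text.partition(":")
    return {"key": key.strip(), "value": value.strip()}


def extract_key_info(block: ObjectBlock) -> dict:
    # Keep the free text as-is for this sketch.
    return {"text": block.text.strip()}


def extract_table(block: ObjectBlock) -> dict:
    # Treat each line as a row and "|" as the column separator.
    rows = [line.split("|") for line in block.text.splitlines()]
    return {"rows": [[cell.strip() for cell in row] for row in rows]}


# Lower level: one extractor per block type.
EXTRACTORS: Dict[str, Callable[[ObjectBlock], dict]] = {
    "key_value": extract_key_value,
    "key_info": extract_key_info,
    "table": extract_table,
}


def extract_document(blocks: List[ObjectBlock]) -> str:
    """Route each block to its extractor, then aggregate into one JSON string."""
    results = []
    for block in blocks:
        extractor = EXTRACTORS.get(block.block_type)
        if extractor is None:
            continue  # block types without an extractor are skipped in this sketch
        results.append({
            "type": block.block_type,
            "bbox": list(block.bbox),
            "data": extractor(block),
        })
    return json.dumps({"blocks": results}, indent=2)


if __name__ == "__main__":
    sample = [
        ObjectBlock("key_value", (0, 0, 200, 20), "Invoice No: 42"),
        ObjectBlock("table", (0, 30, 200, 90), "Item | Qty\nPen | 2"),
    ]
    print(extract_document(sample))
```

Keeping the routing in a dictionary means a new block type needs only one new extractor function and one new entry, while the aggregation step stays unchanged.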
Opening claim text (preview).
What is claimed is:
1. A document extraction system for extracting content from documents, comprising: an object detection component that receives a document image for a document to be processed, determines a plurality of object blocks of the document, and outputs the plurality of object blocks, each of the object blocks having a block type that denotes a type of object block; a plurality of data extraction components, each of the data extraction components being associated with extraction of data from object blocks with different block types, the different block types including at least a key-value block, a key information block, and a table block; an object classifier operatively connected to the object detection component, the object classifier configured to direct different ones of the object blocks of the document to different ones of the data extraction components based on at least the block type corresponding to the different ones of the object blocks; and an aggregator operatively connected to the data extraction components, the aggregator configured to receive extracted data from the data extraction components, combine the received extracted data from the data extraction components into a single structured data file containing the resulting data extraction for the document, wherein at least one of the data extraction components uses a Natural Language Processing Model trained for data extraction of data from object blocks of a particular block type.
2. A document extraction system as recited in claim 1, wherein at least one of the different block types includes a graphic block.
3. A document extraction system as recited in claim 2, wherein at least one of the data extraction components uses a first Natural Language Processing Model trained for data extraction of data from object blocks of a first particular block type, and wherein at least one of the data extraction components uses a second Natural Language Processing Model trained for data extraction of data from object blocks of a second particular block type.
4. A document extraction system as recited in claim 1, wherein at least one of the data extraction components uses a first Natural Language Processing Model trained for data extraction of data from object blocks of a first particular block type, and wherein at least one of the data extraction components uses a second Natural Language Processing Model trained for data extraction of data from object blocks of a second particular block type.
5. A computer-implemented method for extracting content from documents, the method comprising: receiving a document to be processed, the document being received as a digital image; determining object blocks in the document, each of the detected object blocks being denoted by an object type; determining data extraction processing to be perform for a given one of each of the detected object blocks based on the object type associated therewith; performing the determined data extraction on the detected object blocks as determined based on the object type, wherein the detected object blocks that have different object types are processed differently, the different block types including at least two or more of a key-value block, a key information block, and a table block, the performing of the determined data extraction separately produces extracted data from the detected object blocks; and combining the data extraction separately produced from the detected object blocks into a data extraction file for the document, the data extraction file being a single structured data file containing the resulting data extraction from the detected object blocks in the document, wherein the determined data extraction performed for at least one on the object types uses a machine learned model, and wherein the determined data extraction performed for at least one of the object types uses a first machine learned model, and wherein the determined data extraction performed for at least another one of the object types uses a second machine learned model.
6. A computer-implemented method as recited in claim 5, wherein the first machined learned model and the second machined learned model are both based on an NLP model.
7. A computer-implemented method as recited in claim 5, wherein the determined data extraction performed for at least one on the object types uses artificial intelligence.
8. A computing system for robotic process automation, comprising: a document extraction sub-system for extracting content from documents, comprising: an object detection component that receives a document image for a document to be processed, detects a plurality of object blocks within the document, and outputs the plurality of object blocks, each of the object blocks having a block type that denotes a type of object block; a plurality of data extraction components, each of the data extraction components being associated with extraction of data from object blocks with different block types, the different block types including a key-value block, a key information block, a table block, or a graphic block; an object classifier operatively connected to the object detection component, the object classifier configured to direct different ones of the object blocks of the document to different ones of the data extraction components based on at least the block type corresponding to the different ones of the object blocks; and an aggregator operatively connected to the data extraction components, the aggregator configured to receive extracted data from the different ones of the data extraction components, combine the received extracted data from the different ones of the data extraction components into a resulting data extraction for the document, and output a single structured data file containing the resulting data extraction for the document.
9. A computing system as recited in claim 8, wherein the object blocks detected from the document include at least one table block and at least one key-value block.
10. A computing system as recited in claim 8, wherein the object blocks detected from the document include at least a table block, a key-value block, and a key information block.
11. A computing system as recited in claim 8, wherein each of the object blocks detected from the document includes the object type, a bounding box for the object block, and a position reference of the object block on the document.
12. A non-transitory computer readable medium including computer program code tangibly stored therein for extracting content from documents, the computer readable medium comprising: computer program code for receiving a document to be processed, the document being received as a digital image; computer program code for determining object blocks in the document, each of the detected object blocks being denoted by an object type, wherein a plurality of the detected object blocks are other than text blocks, the determined data extraction is separately performed to obtained extracted data; computer program code for determining data extraction processing to be perform for a given one of each of the detected object blocks based on the object type associated therewith; computer program code for performing the determined data extraction on the detected object blocks as determined based on the object type, wherein for each of more than one of the detected object blocks, the determined data extraction is separately performed to obtain extracted data, and wherein the detected object blocks that have different object types are processed differently; and computer program code for aggregating the extracted data for each of t
Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition (scanning, transmission or reproduction of documents or the like H04N1/00) · CPC title
Handling natural language data (speech analysis or synthesis, speech recognition G10L) · CPC title
Classification techniques · CPC title
Combinations of networks · CPC title
using recognition of characters or words · CPC title