Querying semantic data from unstructured documents
US-2022092328-A1 · Mar 24, 2022 · US
US11514489B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11514489-B2 |
| Application number | US-202117142865-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 6, 2021 |
| Priority date | Jan 6, 2021 |
| Publication date | Nov 29, 2022 |
| Grant date | Nov 29, 2022 |
Official abstract text for this publication.
Disclosed herein are various embodiments for targeted document information extraction. An embodiment operates by receiving a document associated with a particular customer of a plurality of customers. It is determined whether to use a global processor or template processor to analyze the document based on whether one or more customer templates are associated with the particular customer. Which of the one or more templates associated with the particular customer correspond to the document is identified. The document is compared to the identified template associated with the customer. Information is extracted from the document based on the identified template and the identified plurality of variations. The extracted information for the document is output.
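The routing decision described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Template` class, the `route` function, the use of Jaccard similarity over sets of feature strings, and the `0.6` threshold. The patent only states that features extracted from the document are compared with template features to choose between the global processor and a template processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical customer template described by a set of layout features."""
    name: str
    customer: str
    features: frozenset


def jaccard(a, b):
    """Jaccard similarity of two feature sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route(doc_features, customer, templates, threshold=0.6):
    """Return ("template", best_match) or ("global", None).

    Only templates registered for this customer are considered. If none
    is similar enough, the document falls back to the global processor.
    """
    best, best_score = None, 0.0
    for t in templates:
        if t.customer != customer:
            continue
        score = jaccard(doc_features, t.features)
        if score > best_score:
            best, best_score = t, score
    if best is not None and best_score >= threshold:
        return "template", best
    return "global", None


templates = [
    Template("acme_invoice", "acme",
             frozenset({"logo:acme", "hdr:invoice", "tbl:4col"})),
    Template("acme_order", "acme",
             frozenset({"logo:acme", "hdr:order", "tbl:3col"})),
]
doc = frozenset({"logo:acme", "hdr:invoice", "tbl:4col", "footer:terms"})

decision, matched = route(doc, "acme", templates)
```

In this sketch a customer with no registered templates, or a document that resembles none of them, is always sent to the global processor, which mirrors the fallback role the abstract gives it.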
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving a document associated with a particular customer of a plurality of customers, wherein the document comprises a plurality of fields and corresponding data in an image format; determining whether to use a global processor or a template processor to analyze the document based on comparing one or more features extracted from the document to one or more features of each of a plurality of templates, wherein each of the plurality of templates corresponds to the template processor, wherein the global processor is configured to analyze any document across the plurality of customers, and wherein the template processor is configured to analyze a narrower range of documents than the global processor using one of the plurality of templates; routing the document to the template processor based on the comparing; comparing, by the template processor, the document to the identified template associated with the customer, wherein the template processor is configured to identify a plurality of variations on the identified template; extracting, by the template processor, information from the document based on the identified template and the identified plurality of variations, wherein the extracted information comprises the plurality of fields and corresponding data in a textual format; and outputting, by the template processor, the extracted information for the document.
2. The method of claim 1, wherein the document comprises a scanned document of an invoice or order form.
3. The method of claim 1, wherein the receiving comprises: generating a character grid representation of the document comprising a location and corresponding character index information for the location.
4. The method of claim 1, further comprising: receiving, by the template processor prior to the receiving the document, the identified template comprising a base document and an annotated version of the base document.
5. The method of claim 1, wherein each of the plurality of customers have access to their own template processor to analyze known customer-specific documents.
6. The method of claim 5, wherein each of the plurality of customers also have access to the global processor to analyze either the known customer-specific documents and any unknown documents.
7. The method of claim 1, wherein a first one of the one or more templates of the particular customer corresponds to a first template processor, wherein a second one of the one or more templates of the particular customer corresponds to a second template processor, and wherein either the first template processor or the second template processor is used to analyze the document.
8. A system, comprising: a memory; and at least one processor coupled to the memory and configured to perform instructions that cause the at least one processor to perform operations comprising: receiving a document associated with a particular customer of a plurality of customers, wherein the document comprises a plurality of fields and corresponding data in an image format; determining whether to use a global processor or a template processor to analyze the document based on comparing one or more features extracted from the document to one or more features of each of a plurality of templates, wherein each of the plurality of templates corresponds to the template processor, wherein the global processor is configured to analyze any document across the plurality of customers, and wherein the template processor is configured to analyze a narrower range of documents than the global processor using one of the plurality of templates; routing the document to the template processor based on the comparing; comparing, by the template processor, the document to the identified template associated with the customer, wherein the template processor is configured to identify a plurality of variations on the identified template; extracting, by the template processor, information from the document based on the identified template and the identified plurality of variations, wherein the extracted information comprises the plurality of fields and corresponding data in a textual format; and outputting, by the template processor, the extracted information for the document.
9. The system of claim 8, wherein the document comprises a scanned document of an invoice or order form.
10. The system of claim 8, wherein the receiving comprises: generating a character grid representation of the document comprising a location and corresponding character index information for the location.
11. The system of claim 8, wherein the operations further comprise: receiving, by the template processor prior to the receiving the document, the identified template comprising a base document and an annotated version of the base document.
12. The system of claim 8, wherein each of the plurality of customers have access to their own template processor to analyze known customer-specific documents.
13. The system of claim 12, wherein each of the plurality of customers also have access to the global processor to analyze either the known customer-specific documents and any unknown documents.
14. The system of claim 8, wherein a first one of the one or more templates of the particular customer corresponds to a first template processor, wherein a second one of the one or more templates of the particular customer corresponds to a second template processor, and wherein either the first template processor or the second template processor is used to analyze the document.
15. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: receiving a document associated with a particular customer of a plurality of customers, wherein the document comprises a plurality of fields and corresponding data in an image format; determining whether to use a global processor or a template processor to analyze the document based on comparing one or more features extracted from the document to one or more features of each of a plurality of templates, wherein each of the plurality of templates corresponds to the template processor, wherein the global processor is configured to analyze any document across the plurality of customers, and wherein the template processor is configured to analyze a narrower range of documents than the global processor using one of the plurality of templates; routing the document to the template processor based on the comparing; comparing, by the template processor, the document to the identified template associated with the customer, wherein the template processor is configured to identify a plurality of variations on the identified template; extracting, by the template processor, information from the document based on the identified template and the identified plurality of variations, wherein the extracted information comprises the plurality of fields and corresponding data in a textual format; and outputting, by the template processor, the extracted information for the document.
16. The non-transitory computer-readable device of claim 15, wherein the document comprises a scanned document of an invoice or order form.
17. The non-transitory computer-readable device of claim 15, wherein the receiving comprises: generating a character grid representation of the document comprising a location and corresponding character index information for the location.
18. The non-transitory computer-readab
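Claims 3, 10, and 17 recite a character grid representation that pairs each location with character index information. One way to picture such a structure is the sketch below. It rests on several assumptions that the claims do not state: that an OCR step has already produced characters with integer pixel bounding boxes, that index `0` marks empty background, and that indices are assigned in order of first appearance. The name `build_char_grid` is hypothetical.

```python
def build_char_grid(chars, width, height):
    """Build a 2D grid in which each cell holds a character index.

    chars: iterable of (character, x0, y0, x1, y1) with integer pixel
           coordinates; x1 and y1 are exclusive (assumed OCR output format).
    Returns (grid, vocab): grid[y][x] is 0 for background or the index
    of the character covering that location; vocab maps character -> index.
    """
    vocab = {}
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in chars:
        # Assign the next free index the first time a character is seen.
        idx = vocab.setdefault(ch, len(vocab) + 1)
        # Fill the character's bounding box, clipped to the page bounds.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = idx
    return grid, vocab


# A tiny 4x3 "page" with two characters on the first row and one below.
ocr_chars = [("A", 0, 0, 2, 2), ("B", 2, 0, 4, 2), ("A", 0, 2, 2, 3)]
grid, vocab = build_char_grid(ocr_chars, width=4, height=3)
```

A grid of this kind keeps the two-dimensional layout of the page alongside the text, which is what allows fields to be located by position as well as by content.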
Billing or invoicing · CPC title
replenishment orders; recurring orders · CPC title
Templates · CPC title
Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title
Annotation, e.g. comment data or footnotes · CPC title