Document entity extraction using machine-learned models

US12536376B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12536376-B2
Application number  US-202318453236-A
Country             US
Kind code           B2
Filing date         Aug 21, 2023
Priority date       Aug 21, 2023
Publication date    Jan 27, 2026
Grant date          Jan 27, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for performing document entity extraction are described herein. The method can include receiving an inference document and a target schema. The method can also include generating one or more document inputs from the inference document and one or more schema inputs from the target schema. The method can further include, for each combination of the document input and schema input, obtaining one or more extraction inputs by generating a respective extraction input based on the combination, providing the respective extraction input to the machine-learned model, and receiving a respective output of the machine-learned model based on the respective extraction. The method can also include validating the extracted entity data based on reference spatial locations and inference spatial locations and outputting the validated extracted entity data.
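
The loop described in the abstract (split the document and the schema, then query the model once per combination) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Token`, `chunk_document`, `split_schema`, `build_prompt`, and `extract` are hypothetical, the model is assumed to be any callable that takes a prompt string, and fixed-size chunking stands in for whatever input-dimension logic a real system would use.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    """One piece of document data plus its reference spatial location."""
    text: str
    box: Tuple[int, int, int, int]  # assumed bounding box: (x0, y0, x1, y1)


def chunk_document(tokens: List[Token], max_tokens: int) -> List[List[Token]]:
    """Split the document into inputs that fit an assumed model input size."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def split_schema(schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Treat each top-level entity (with its subentities) as one schema input."""
    return [{name: sub} for name, sub in schema.items()]


def build_prompt(doc_input: List[Token], schema_input: Dict[str, Any]) -> str:
    """Serialize one (document input, schema input) pair, keeping locations."""
    lines = [f"{t.text} @ {t.box}" for t in doc_input]
    return f"Schema: {schema_input!r}\nDocument:\n" + "\n".join(lines)


def extract(tokens: List[Token], schema: Dict[str, Any],
            model: Callable[[str], Any], max_tokens: int) -> List[Any]:
    """Call the model once for every document-input/schema-input combination."""
    doc_inputs = chunk_document(tokens, max_tokens)
    schema_inputs = split_schema(schema)
    return [model(build_prompt(d, s)) for d, s in product(doc_inputs, schema_inputs)]
```

With five tokens, a chunk size of two, and a schema with two top-level entities, this sketch issues six model calls (three document inputs times two schema inputs). Validation of the returned entity data is a separate step and is not shown here.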

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for performing document entity extraction, the method comprising: receiving, by a computing system comprising a processor, an inference document and a target schema, wherein the inference document comprises document data and one or more reference location tags respectively indicating one or more reference spatial locations of the document data within a rendering of the inference document; generating, by the computing system and based on an input dimension of a machine-learned model, one or more document inputs from the inference document and one or more schema inputs from the target schema; for each respective combination of the one or more document inputs and the one or more schema inputs, obtaining one or more extraction outputs by: generating, by the computing system, a respective prompt including one or more reference spatial locations based on the respective combination; providing, by the computing system, the respective prompt to the machine-learned model; and receiving, by the computing system, a respective output of the machine-learned model based on the respective prompt, wherein the respective output comprises entity data extracted according to the target schema and one or more inference location tags corresponding to one or more inference spatial locations of the entity data within the rendering of the inference document; validating, by the computing system, the extracted entity data based on the reference spatial locations and the inference spatial locations; and outputting, by the computing system, the validated extracted entity data.
2. The computer-implemented method of claim 1, wherein the inference document is based on an output of an optical character recognition system, and wherein the document data includes data representing optically-recognized characters in the rendering of the inference document.
3. The computer-implemented method of claim 2, the method comprising: receiving, by the computing system, an image input, wherein the image input is used to validate the output of the optical character recognition system.
4. The computer-implemented method of claim 1, wherein the inference document is an image representation of an electronic document.
5. The computer-implemented method of claim 1, wherein the one or more reference location tags respectively indicating one or more reference spatial locations of the document data within the rendering of the inference document are indicative of one or more bounding boxes containing a portion of the document data.
6. The computer-implemented method of claim 1, wherein validating the extracted entity data based on the reference spatial locations and the inference spatial locations comprises: performing, by the computing system, normalized string matching between the extracted entity data and document data at the reference spatial locations in the rendering of the inference document as indicated by the one or more inference location tags corresponding to one or more inference spatial locations of the entity data within the rendering of the inference document; determining, by the computing system, if the extracted entity data matches the document data; and in response to determining that the extracted entity data matches the document data, validating, by the computing system, the extracted entity data.
7. The computer-implemented method of claim 6, the method comprising: in response to determining that the extracted entity data does not match the document data: discarding, by the computing system, the extracted entity data.
8. The computer-implemented method of claim 1, the method comprising: dividing, by the computing system, the target schema into a plurality of independent branches, each branch of the plurality of independent branches representing a data entity and subentities of the data entity, wherein each independent branch of the plurality of independent branches is a schema input of the target schema.
9. The computer-implemented method of claim 1, wherein the prompt for the respective extraction input includes one or more extraction instructions.
10. The computer-implemented method of claim 9, wherein the one or more extraction instructions include a description of a spatial location.
11. The computer-implemented method of claim 1, the method comprising: retrieving, by the computing system, at least one document from a document corpus; and adding, by the computing system, at least a portion of the at least one document to the prompt for at least one document input and schema input combination.
12. The computer-implemented method of claim 11, wherein the prompt includes an extraction representation of one or more data entities extracted from the portion of the at least one document.
13. The computer-implemented method of claim 12, wherein determining the representative value comprises determining a majority output from the plurality of outputs.
14. The computer-implemented method of claim 13, wherein a confidence score is generated based on the majority output and the plurality of outputs.
15. The computer-implemented method of claim 12, wherein the representative value is determined based at least in part on one or more received scores from the model.
16. A computing system for performing document entity extraction, the computing system comprising: one or more processors; and a non-transitory, computer-readable medium comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform operations, the operations comprising: receiving an inference document and a target schema, wherein the inference document comprises document data and one or more reference location tags respectively indicating one or more reference spatial locations of the document data within a rendering of the inference document; generating, based on an input dimension of a machine-learned model, one or more document inputs from the inference document and one or more schema inputs from the target schema; for each respective combination of the one or more document inputs and the one or more schema inputs, obtaining one or more extraction outputs by: generating a respective prompt including one or more reference spatial locations based on the respective combination; providing the respective prompt to the machine-learned model; and receiving a respective output of the machine-learned model based on the respective prompt, wherein the respective output comprises entity data extracted according to the target schema and one or more inference location tags corresponding to one or more inference spatial locations of the entity data within the rendering of the inference document; validating the extracted entity data based on the reference spatial locations and the inference spatial locations; and outputting the validated extracted entity data.
17. The computing system of claim 16, wherein validating the extracted entity data based on the reference spatial locations and the inference spatial locations comprises: performing normalized string matching between the extracted entity data and document data at the reference spatial locations in the rendering of the inference document as indicated by the one or more inference location tags corresponding to one or more inference spatial locations of the entity data within the rendering of the inference document; determining if the extracted entity data matches the document data; and in response to determinin
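
The validation step in claims 6 and 7 (normalized string matching, keeping matches and discarding mismatches) could look something like the sketch below. It is only an illustration under stated assumptions: the claims do not say what "normalized" means, so the rule used here (Unicode NFKC, case folding, punctuation removal, whitespace collapsing) is a guess, and `normalize`, `validate`, and the tag-to-text dictionary are hypothetical names and structures.

```python
import re
import unicodedata
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Assumed normalization: NFKC, case-fold, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def validate(entities: List[Tuple[str, str]],
             reference: Dict[str, str]) -> List[Tuple[str, str]]:
    """Keep entities whose text matches the document text at the cited location.

    entities:  (value, location_tag) pairs, as a model might return them
    reference: location_tag -> document text found at that reference location
    """
    kept = []
    for value, tag in entities:
        doc_text = reference.get(tag)
        # Discard entities that cite an unknown location or whose text differs.
        if doc_text is not None and normalize(value) == normalize(doc_text):
            kept.append((value, tag))
    return kept
```

An entity that cites a location tag absent from the document, or whose value differs from the text at that location after normalization, is dropped. This is one way to screen out values the model invented rather than read from the page.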

Assignees

Inventors

Classifications

  • Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • G06F40/295 · Primary

    Named entity recognition · CPC title

  • Character recognition · CPC title

  • Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12536376B2 cover?
Systems and methods for performing document entity extraction are described herein. The method can include receiving an inference document and a target schema. The method can also include generating one or more document inputs from the inference document and one or more schema inputs from the target schema. The method can further include, for each combination of the document input and schema in…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/295. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 27, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).