Systems and methods for identifying form fields

US10482174B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10482174-B1
Application number  US-201816163537-A
Country             US
Kind code           B1
Filing date         Oct 17, 2018
Priority date       Oct 17, 2018
Publication date    Nov 19, 2019
Grant date          Nov 19, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems and methods for generating synthetic documents. In one implementation, a system for generating synthetic data from a plurality of documents may include at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor cause the system to: receive a plurality of documents, individual documents of the plurality of documents having a same document type; generate a distribution of values for a corresponding pixel in the individual documents of plurality of documents; determine, based on the distributions, one or more common features of the plurality of documents; determine, based on the comparison, one or more input fields; generate a template including the one or more common features and the one or more input fields; and input synthetic data into the one or more input fields of the template thereby generating a plurality of synthetic documents.
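
The per-pixel step in the abstract can be illustrated with a minimal sketch. It assumes the documents are already aligned, equally sized grayscale images stored as nested lists, and that a pixel counts as part of an input field when its standard deviation across documents exceeds a threshold. The function names `pixel_statistics` and `split_pixels`, the threshold value, and the sample data are hypothetical and do not come from the patent.

```python
from statistics import mean, pstdev


def pixel_statistics(documents):
    """Return per-pixel mean and standard deviation grids.

    `documents` is a list of equally sized grayscale images, each a list of
    rows of pixel values (assumption: already aligned to each other).
    """
    height, width = len(documents[0]), len(documents[0][0])
    means = [[0.0] * width for _ in range(height)]
    stds = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Distribution of values for the pixel at the same location.
            values = [doc[r][c] for doc in documents]
            means[r][c] = mean(values)
            stds[r][c] = pstdev(values)
    return means, stds


def split_pixels(stds, threshold):
    """True marks a high-variance pixel (candidate input field);
    False marks a low-variance pixel (candidate common feature)."""
    return [[s > threshold for s in row] for row in stds]


# Three tiny 2x3 "forms": only the bottom-right pixel differs between them.
docs = [
    [[0, 255, 255], [0, 255, 10]],
    [[0, 255, 255], [0, 255, 240]],
    [[0, 255, 255], [0, 255, 120]],
]
means, stds = pixel_statistics(docs)
field_mask = split_pixels(stds, threshold=20.0)  # hypothetical threshold
```

Pixels that are identical in every document have zero spread and fall on the common-feature side, while the one pixel that changes from document to document is flagged as a candidate input field.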

First claim

Opening claim text (preview).

What is claimed is:

1. A system for generating a synthetic document from a plurality of documents comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor cause the system to perform operations comprising: receiving a plurality of documents, individual documents of the plurality of documents having a same document type; generating a distribution of values for a pixel at a corresponding location in the individual documents of plurality of documents; determining, based on the distribution, one or more common features of the plurality of documents; determining, based on a comparison of a pixel at the corresponding location in an individual document to the distribution, one or more input fields; generating a template including the one or more common features and the one or more input fields; and inputting synthetic data into the one or more input fields of the template thereby generating at least one of a plurality of synthetic documents.

2. The system of claim 1, wherein the operations further comprise at least one of aligning or rotating one or more of the plurality of documents.

3. The system of claim 1, wherein the operations further comprise generating an expected background based on the one or more common features.

4. The system of claim 1, wherein the document type comprises at least one of a loan application, an account application, a public document, an identification card, or a passport.

5. The system of claim 1, wherein the operations further comprise performing a background subtraction operation on one or more documents of the plurality of documents.

6. The system of claim 1, wherein a corresponding pixel comprises a pixel having the same location in the plurality of documents.

7. The system of claim 1, wherein determining one or more common features comprises identifying a set of pixels, the pixels being associated with a distribution having less than a threshold standard deviation.

8. The system of claim 1, wherein determining one or more input fields comprises identifying a set of pixels, the pixels being associated with a distribution having greater than a threshold standard deviation.

9. The system of claim 8, wherein an input field comprises an outer boundary of a set of adjacent pixels.

10. The system of claim 1, wherein determining one or more input fields comprises performing pattern recognition.

11. The system of claim 10, wherein pattern recognition comprises identifying at least one of a check box, a line, a box, or a prompt.

12. A computer-implemented method for generating synthetic documents, comprising: receiving, by a processor, a plurality of documents, individual documents of the plurality of documents having a same document type; generating a distribution of values for a pixel at a corresponding location in individual documents of the plurality of documents; determining, based on the distribution, one or more common features of the plurality of documents; determining, based on a comparison of a pixel at the corresponding location in an individual document to the distribution, one or more input fields; generating a template including the one or more common features and the one or more input fields; and inputting synthetic data into the one or more input fields of the template thereby generating at least one of a plurality of synthetic documents.

13. The method of claim 12, further comprising generating an expected background based on the one or more common features.

14. The method of claim 13, wherein the expected background comprises a plurality of pixels, a pixel corresponding to a pixel position in the plurality of documents.

15. The method of claim 14, wherein each of the plurality of pixels comprises a mean of the distribution associated with the corresponding pixel.

16. The method of claim 12, further comprising determining metadata associated with one or more documents of the plurality of documents.

17. The method of claim 16, wherein determining metadata comprises determining whether one or more input fields contains input data.

18. The method of claim 17, wherein determining metadata comprises determining whether the input data of an input field comprises handwritten information.

19. The method of claim 12, further comprising classifying the one or more input fields.

20. A non-transitory memory storing instructions that, when executed by at least one processor, cause a system to perform operations comprising: receiving a plurality of documents, individual documents of the plurality of documents having a same document type; generating a distribution of values for a pixel at a corresponding location in the individual documents of the plurality of documents; determining, based on the distribution, one or more common features of the plurality of documents; determining, based on a comparison of a pixel at the corresponding location in an individual document to the distribution, one or more input fields; generating a template including the one or more common features and the one or more input fields; and inputting synthetic data into the one or more input fields of the template thereby generating at least one of a plurality of synthetic documents.
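
Several dependent claims describe steps that are easier to follow in code: an input field as the outer boundary of a set of adjacent pixels, a background subtraction operation, and inputting synthetic data into a template. The sketch below is one possible reading, not the patented implementation. It assumes 4-connectivity for "adjacent", a rectangular bounding box for "outer boundary", an absolute difference for background subtraction, and a constant pixel value standing in for synthetic data. The names `field_boxes`, `subtract_background`, and `fill_template` are hypothetical.

```python
from collections import deque


def field_boxes(mask):
    """Group adjacent True pixels (4-connectivity, an assumption) and return
    each group's bounding box as (top, left, bottom, right), inclusive."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill over one pixel group
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def subtract_background(document, background):
    """Absolute per-pixel difference between a document and the background."""
    return [[abs(p - b) for p, b in zip(doc_row, bg_row)]
            for doc_row, bg_row in zip(document, background)]


def fill_template(background, boxes, values):
    """Copy the background and paint each box with one synthetic pixel value
    (a stand-in for rendering synthetic text into the field)."""
    page = [row[:] for row in background]
    for (top, left, bottom, right), value in zip(boxes, values):
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                page[y][x] = value
    return page


# Hypothetical high-variance mask with two separate pixel groups.
mask = [
    [False, True, True, False, False],
    [False, True, True, False, True],
    [False, False, False, False, True],
]
boxes = field_boxes(mask)
background = [[255] * 5 for _ in range(3)]  # stand-in expected background
synthetic = fill_template(background, boxes, values=[0, 128])
residual = subtract_background(synthetic, background)
```

In this reading, the expected background would hold the per-pixel mean described in claims 13 to 15, and the residual after subtraction is nonzero only where field content was written.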

Assignees

Inventors

Classifications

  • Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title

  • Classification techniques · CPC title

  • G06V10/82 (Primary)

    using neural networks · CPC title

  • Templates · CPC title

  • Activation functions · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10482174B1 cover?
The present disclosure relates to systems and methods for generating synthetic documents. In one implementation, a system for generating synthetic data from a plurality of documents may include at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor cause the system to: receive a plurality of documents, individual docu…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 19, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).