Template matching with data correction

US9530068B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9530068-B2
Application number  US-201414537113-A
Country             US
Kind code           B2
Filing date         Nov 10, 2014
Priority date       Nov 10, 2014
Publication date    Dec 27, 2016
Grant date          Dec 27, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach is provided to generate forms with template inclusions. In the approach, optical character recognition (OCR) text is compared to corresponding text in a selected form. Characters of text in the OCR text are then replaced with text from the template text, the replacing results in a form with template inclusions. The form with template inclusions is then processed by a forms processing operation.
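The replacement step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a simple positional word alignment between an OCR line and a template line, uses Python's `difflib.SequenceMatcher` as a stand-in fuzzy matcher, and the names `similarity`, `correct_line`, and the `0.6` threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 fuzzy score (difflib's ratio, an assumed stand-in matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def correct_line(ocr_line: str, template_line: str, threshold: float = 0.6) -> str:
    """Replace OCR words that closely resemble the template word at the same position.

    Words with no template counterpart, or with a low score, are kept unchanged,
    so filled-in values survive while misread printed labels are corrected.
    """
    ocr_words = ocr_line.split()
    template_words = template_line.split()
    corrected = []
    for i, word in enumerate(ocr_words):
        if i < len(template_words) and similarity(word, template_words[i]) >= threshold:
            corrected.append(template_words[i])  # take the known-good template text
        else:
            corrected.append(word)  # keep the OCR text (e.g. a filled-in value)
    return " ".join(corrected)


# Hypothetical OCR output with misread label characters, followed by a filled-in value.
print(correct_line("Patlent Narne: John Smith", "Patient Name:"))
```

Positional alignment is the weakest assumption here: a dropped or inserted OCR word would shift every later comparison, so a real system would likely need a sequence alignment step before comparing words.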

First claim

Opening claim text (preview).

The invention claimed is:

1. A method implemented by an information handling system that includes a memory and a processor, the method comprising: receiving a selected form comprising printed text and handwritten text; identifying a selected form template from a plurality of form templates, wherein each of the plurality of form templates comprises a plurality of lines of text, and wherein the identifying further comprises: performing a fuzzy comparison of a first optical character recognition (OCR) text corresponding to the selected form printed text, and excluding the handwritten text, with text from the plurality of form templates, wherein each of the comparisons results in a form comparison score; and selecting the selected form template from the plurality of form templates based on the corresponding form comparison score; in response to identifying the selected form template, comparing the first OCR text corresponding to the selected form with template text corresponding to the selected form template; replacing a plurality of characters in the first OCR text with text from the template text, the replacing resulting in a form with template inclusions; and processing the form with template inclusions in a forms processing operation.

2. The method of claim 1 wherein the comparing further comprises: comparing each word of the template text with a corresponding word from the first OCR text using a fuzzy matching algorithm, wherein the replacing of the plurality of characters is performed in response to the comparing.

3. The method of claim 2 further comprising: selecting each word of text from a second OCR text corresponding to the selected form printed text and including the handwritten text; identifying a corresponding word in the template text that corresponds to each of the selected words from the second OCR text; comparing the selected word from the second OCR text with the corresponding word in the template text using the fuzzy matching algorithm, wherein the comparing results in a word comparison score; writing the selected word from the second OCR text to the form with template inclusions in response to the word comparison score indicating that the selected word is absent from the template text; and writing the corresponding word from the template text to the form with template inclusions in response to the word comparison score indicating that the selected word is present in the template text.

4. The method of claim 1 further comprising: identifying a version of the selected form template from a plurality of versions of the selected form template, wherein the identifying further comprises: performing a fuzzy comparison of the first OCR text with the text from the plurality of versions of the selected form template, wherein each of the comparisons results in a form version score; and selecting the selected form template from the plurality of versions of the selected form template based on the corresponding form version score.

5. The method of claim 1 further comprising: ingesting the form with template inclusions into a Question Answering (QA) System corpus.

6. The method of claim 1 further comprising: wherein the selected document is a facsimile of the selected form template that has been filled in by hand; and performing an OCR process on the selected document, wherein the OCR process results in the first OCR text.

7. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; and a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of: receiving a selected form comprising printed text and handwritten text; identifying a selected form template from a plurality of form templates, wherein each of the plurality of form templates comprises a plurality of lines of text, and wherein the actions that perform the identifying further comprise: performing a fuzzy comparison of a first optical character recognition (OCR) text corresponding to the selected form printed text, and excluding the handwritten text, with text from the plurality of form templates, wherein each of the comparisons results in a form comparison score; and selecting the selected form template from the plurality of form templates based on the corresponding form comparison score; comparing the first OCR text corresponding to the selected form with template text corresponding to the selected form template; replacing a plurality of characters in the first OCR text with text from the template text, the replacing resulting in a form with template inclusions; and processing the form with template inclusions in a forms processing operation.

8. The information handling system of claim 7 wherein the actions that perform the comparing further comprise: comparing each word of the template text with a corresponding word from the first OCR text using a fuzzy matching algorithm, wherein the replacing of the plurality of characters is performed in response to the comparing.

9. The information handling system of claim 8 wherein the actions further comprise: selecting each word of text from a second OCR text corresponding to the selected form printed text and including the handwritten text; identifying a corresponding word in the template text that corresponds to each of the selected words from the second OCR text; comparing the selected word from the second OCR text with the corresponding word in the template text using the fuzzy matching algorithm, wherein the comparing results in a word comparison score; writing the selected word from the second OCR text to the form with template inclusions in response to the word comparison score indicating that the selected word is absent from the template text; and writing the corresponding word from the template text to the form with template inclusions in response to the word comparison score indicating that the selected word is present in the template text.

10. The information handling system of claim 7 wherein the actions further comprise: identifying a version of the selected form template from a plurality of versions of the selected form template, wherein the identifying further comprises: performing a fuzzy comparison of the first OCR text with the text from the plurality of versions of the selected form template, wherein each of the comparisons results in a form version score; and selecting the selected form template from the plurality of versions of the selected form template based on the corresponding form version score.

11. The information handling system of claim 7 wherein the actions further comprise: ingesting the form with template inclusions into a Question Answering (QA) System corpus.

12. The information handling system of claim 7 wherein the actions further comprise: wherein the selected document is a facsimile of the selected form template that has been filled in by hand; and performing an OCR process on the selected document, wherein the OCR process results in the first OCR text.

13. A computer program product stored in a computer readable storage medium, comprising computer program code that, when executed by an information handling system, causes the information handling system to perform actions comprising: receiving a selected form comprising printed text and handwritten text; identifying a selected form template from a plurality of form templates, wherein each of the plurality of form templates comprises a plurality of lines of text, and wherein the identifying further comprises: performing a fuzzy comparison of a first optic
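
The template identification step recited in claim 1 (score each candidate template against the printed-text OCR, then select by score) might be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: `difflib.SequenceMatcher` stands in for the unspecified fuzzy comparison, the highest score is assumed to win, and the function names, template names, and sample texts are all hypothetical.

```python
from difflib import SequenceMatcher


def form_comparison_score(ocr_text: str, template_text: str) -> float:
    """Score one template against the OCR text (0..1; difflib is an assumed stand-in)."""
    return SequenceMatcher(None, ocr_text.lower(), template_text.lower()).ratio()


def select_template(printed_ocr_text: str, templates: dict[str, str]) -> tuple[str, float]:
    """Compare the printed-text OCR with every template and return the best match.

    Assumes the handwritten text has already been excluded from printed_ocr_text
    and that a higher score means a closer match.
    """
    scores = {
        name: form_comparison_score(printed_ocr_text, text)
        for name, text in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical template library and a noisy OCR reading of the first form.
templates = {
    "intake": "Patient Name:\nDate of Birth:\nAllergies:",
    "claim": "Policy Number:\nClaim Amount:\nDate of Loss:",
}
ocr = "Patlent Narne:\nDate of Blrth:\nAllergles:"

name, score = select_template(ocr, templates)
print(name, round(score, 3))
```

The same scoring loop could be run a second time over the versions of the chosen template to obtain the form version score described in claims 4 and 10; a production system would probably also reject matches below some minimum score rather than always accepting the best candidate.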

Assignees

IBM

Inventors

Classifications

  • G06V30/412 (Primary)

    Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title

  • of printed characters having additional code marks or containing code marks · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9530068B2 cover?
An approach is provided to generate forms with template inclusions. In the approach, optical character recognition (OCR) text is compared to corresponding text in a selected form. Characters of text in the OCR text are then replaced with text from the template text, the replacing results in a form with template inclusions. The form with template inclusions is then processed by a forms processin…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06V30/412. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 27, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).