Computer, document identification method, and system

US10552674B2 · US · B2

Patent metadata
Publication number: US-10552674-B2
Application number: US-201815918830-A
Country: US
Kind code: B2
Filing date: Mar 12, 2018
Priority date: May 31, 2017
Publication date: Feb 4, 2020
Grant date: Feb 4, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer, which is configured to extract an attribute being a character string indicating a feature of a paper-based document, the computer stores template information and dictionary information. The computer is configured to: execute character recognition processing on image data on the paper-based document; extract an attribute corresponding to each of the at least one type of attribute, which is defined in each of the plurality of templates, through use of a result of the character recognition processing and the plurality of templates; calculate a score regarding the extracted attribute for each of the plurality of templates; select one of the plurality of templates that has the highest extraction accuracy of the attribute based on the score; and generate output information through use of the selected template.
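The flow in the abstract (recognize characters, extract attributes with every template, score each template, keep the best one) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Token` and `Template` classes, the nearest-token extraction rule, and the use of `difflib` string similarity as the score are all assumptions made for the example, and the character recognition output is stubbed with hand-written tokens.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import hypot


@dataclass(frozen=True)
class Token:
    """One recognized string and its position on the page (stand-in for OCR output)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Template:
    """Hypothetical template entry: attribute type -> expected (x, y) position."""
    template_id: str
    positions: dict


def extract(tokens, template):
    """Assumed rule: for each attribute type, take the token nearest the expected position."""
    return {
        attr: min(tokens, key=lambda t: hypot(t.x - x, t.y - y))
        for attr, (x, y) in template.positions.items()
    }


def dictionary_score(text, entries):
    """Best similarity between the text and any dictionary string (1.0 = exact match)."""
    return max(
        (SequenceMatcher(None, text.lower(), e.lower()).ratio() for e in entries),
        default=0.0,
    )


def template_score(extracted, dictionary):
    """Assumed aggregate: mean dictionary score over the template's attributes."""
    scores = [dictionary_score(tok.text, dictionary.get(attr, []))
              for attr, tok in extracted.items()]
    return sum(scores) / len(scores) if scores else 0.0


def select_template(tokens, templates, dictionary):
    """Score every template and build output from the highest-scoring one."""
    scored = []
    for template in templates:
        extracted = extract(tokens, template)
        scored.append((template_score(extracted, dictionary), template, extracted))
    score, best, extracted = max(scored, key=lambda s: s[0])
    return {
        "template_id": best.template_id,
        "score": score,
        "attributes": {attr: tok.text for attr, tok in extracted.items()},
    }


# Illustrative data only; none of these values come from the patent.
tokens = [Token("Acme Corp", 50, 20), Token("2018-03-12", 400, 20), Token("Total", 50, 700)]
templates = [
    Template("A", {"biller": (50, 20), "amount_label": (50, 700)}),
    Template("B", {"biller": (400, 20), "amount_label": (50, 20)}),
]
dictionary = {"biller": ["Acme Corp", "Globex"], "amount_label": ["Total", "Amount due"]}

result = select_template(tokens, templates, dictionary)
print(result["template_id"], result["attributes"])
```

In this toy data, template A places both attributes where matching dictionary strings actually appear, so it scores higher than template B, whose expected positions land on unrelated text.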

First claim

Opening claim text (preview).

What is claimed is:
1. A computer, which is configured to extract an attribute being a character string indicating a feature of a paper-based document, the computer comprising: a processor; and a storage device coupled to the processor, wherein: the storage apparatus is configured to store: template information for managing a plurality of templates in each of which at least one type of attribute is defined; and dictionary information for defining a character string to be extracted as the attribute; the template information includes a plurality of entries each formed of: identification information on each of the plurality of templates; identification information indicating each of the at least one type of attribute; and positional information indicating a position on a paper surface of an attribute corresponding to each of the at least one type of attribute; and the processor is configured to: execute character recognition processing on image data on the paper-based document; extract an attribute corresponding to each of the at least one type of attribute, which is defined in each of the plurality of templates, through use of a result of the character recognition processing and the plurality of templates; calculate a score regarding the extracted attribute for each of the plurality of templates through use of the dictionary information, the template information, and the extracted attribute; select one of the plurality of templates that has the highest extraction accuracy of the attribute based on the score; and generate output information including the attribute extracted through use of the selected template.
2. The computer according to claim 1, wherein the processor is configured to: generate a feature vector indicating a feature of the paper-based document through use of the score of the selected template; calculate an evaluation value indicating reliability of the output information through use of the feature vector; and determine, based on a result of comparing the evaluation value and a threshold value, whether one of modification of the output information and generation of new output information is required.
3. The computer according to claim 2, wherein the processor is configured to: refer to the dictionary information to calculate a first score indicating a degree to which the extracted attribute matches a character string registered in the dictionary information; refer to the template information to calculate a second score for evaluating a deviation between the positional information and a position of the extracted attribute on the paper surface; and generate the feature vector including the first score and the second score of the extracted attribute as components of the feature vector.
4. The computer according to claim 3, wherein the processor is configured to: calculate a third score for evaluating a size of a range of the extracted attribute on the paper surface; calculate a fourth score for evaluating a distance between attributes having the same type of attribute; and generate the feature vector including the first score, the second score, the third score, and the fourth score of the extracted attribute as components of the feature vector.
5. A document identification method, which is executed by a computer configured to extract an attribute being a character string indicating a feature of a paper-based document, the computer including: a processor; and a storage device coupled to the processor, the storage device being configured to store: template information for managing a plurality of templates in each of which at least one type of attribute is defined; and dictionary information for defining a character string to be extracted as the attribute, the template information including a plurality of entries each formed of: identification information on each of the plurality of templates; identification information indicating each of the at least one type of attribute; and positional information indicating a position on a paper surface of an attribute corresponding to each of the at least one type of attribute, the document identification method including: a first step of executing, by the processor, character recognition processing on image data on the paper-based document; a second step of extracting, by the processor, an attribute corresponding to each of the at least one type of attribute, which is defined in each of the plurality of templates, through use of a result of the character recognition processing and the plurality of templates; a third step of calculating, by the processor, a score regarding the extracted attribute for each of the plurality of templates through use of the dictionary information, the template information, and the extracted attribute; a fourth step of selecting, by the processor, one of the plurality of templates that has the highest extraction accuracy of the attribute based on the score; and a fifth step of generating, by the processor, output information including the attribute extracted through use of the selected template.
6. The document identification method according to claim 5, further including: a sixth step of generating, by the processor, a feature vector indicating a feature of the paper-based document through use of the score of the selected template; a seventh step of calculating, by the processor, an evaluation value indicating reliability of the output information through use of the feature vector; and an eighth step of determining, by the processor, based on a result of comparing the evaluation value and a threshold value, whether one of modification of the output information and generation of new output information is required.
7. The document identification method according to claim 6, wherein: the third step includes: referring, by the processor, to the dictionary information to calculate a first score indicating a degree to which the extracted attribute matches a character string registered in the dictionary information; and referring, by the processor, to the template information to calculate a second score for evaluating a deviation between the positional information and a position of the extracted attribute on the paper surface; and the sixth step includes generating, by the processor, the feature vector including the first score and the second score of the extracted attribute as components of the feature vector.
8. The document identification method according to claim 7, wherein: the third step includes: calculating, by the processor, a third score for evaluating a size of a range of the extracted attribute on the paper surface; and calculating by the processor, a fourth score for evaluating a distance between attributes having the same type of attribute; and the sixth step includes generating, by the processor, the feature vector including the first score, the second score, the third score, and the fourth score of the extracted attribute as components of the feature vector.
9. A system, comprising: a computer; and a terminal, the computer including: a first processor; and a first storage device coupled to the first processor, the terminal including: a second processor; and a second storage device coupled to the second processor, wherein: the first storage device is configured to store: template information for managing a plurality of templates in each of which at least one type of attribute being a character string indicating a feature of a paper-based document is defined; and dictionary information for defining a character string to be extracted as the attribute; the template information includes a plurality of entries each formed of: identification information on
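Claims 2 to 4 describe four scores that form a feature vector, an evaluation value computed from that vector, and a threshold comparison that decides whether the output needs correction. The sketch below illustrates one possible reading. The claims do not give formulas, so every formula here is an assumption: string similarity for the first score, exponential decay with distance for the second and fourth, a normalized area for the third, and an equal-weight sum for the evaluation value. The `Extracted` class, the scale constants, and the threshold of 0.8 are hypothetical, and the vector is built per attribute for brevity rather than per document.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import exp, hypot


@dataclass(frozen=True)
class Extracted:
    """One extracted attribute with its bounding box on the page (hypothetical layout)."""
    attr_type: str
    text: str
    x: float
    y: float
    width: float
    height: float


def first_score(item, entries):
    """Degree of match against dictionary strings (1.0 = exact match)."""
    return max(
        (SequenceMatcher(None, item.text.lower(), e.lower()).ratio() for e in entries),
        default=0.0,
    )


def second_score(item, expected_xy, scale=100.0):
    """Deviation from the template position; decays from 1.0 as distance grows."""
    ex, ey = expected_xy
    return exp(-hypot(item.x - ex, item.y - ey) / scale)


def third_score(item, max_area=20000.0):
    """Size of the extracted range; assumes compact ranges are more trustworthy."""
    return 1.0 - min(item.width * item.height / max_area, 1.0)


def fourth_score(item, same_type, scale=100.0):
    """Distance to other candidates of the same attribute type; 1.0 if there are none."""
    if not same_type:
        return 1.0
    mean = sum(hypot(item.x - o.x, item.y - o.y) for o in same_type) / len(same_type)
    return exp(-mean / scale)


def feature_vector(item, entries, expected_xy, same_type=()):
    """Collect the four scores as components of one vector."""
    return [first_score(item, entries), second_score(item, expected_xy),
            third_score(item), fourth_score(item, same_type)]


def evaluation_value(vector, weights=(0.25, 0.25, 0.25, 0.25)):
    """Equal-weight sum standing in for whatever reliability model is actually used."""
    return sum(w * s for w, s in zip(weights, vector))


def needs_review(vector, threshold=0.8):
    """True when the evaluation value falls below the (assumed) threshold."""
    return evaluation_value(vector) < threshold


# Illustrative data only; none of these values come from the patent.
good = Extracted("biller", "Acme Corp", 50, 20, 100, 20)
bad = Extracted("biller", "zzzz", 1050, 20, 400, 100)
for item in (good, bad):
    vec = feature_vector(item, ["Acme Corp", "Globex"], (50, 20))
    print(item.text, [round(s, 3) for s in vec], needs_review(vec))
```

With these assumed formulas, an attribute that matches the dictionary exactly and sits at its expected position passes the threshold, while a mismatched string found far from the expected position is flagged for modification.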

Assignees

Hitachi Ltd

Inventors

Classifications

  • Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching (specially adapted for image segmentation G06T7/10; specially adapted for the analysis of motion G06T7/20; specially adapted for image alignment G06T7/30; specially adapted for the calculation of depth from stereo images G06T7/50; specially adapted for position determination G06T7/70) · CPC title

  • Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title

  • Billing or invoicing · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10552674B2 cover?
A computer, which is configured to extract an attribute being a character string indicating a feature of a paper-based document, the computer stores template information and dictionary information. The computer is configured to: execute character recognition processing on image data on the paper-based document; extract an attribute corresponding to each of the at least one type of attribute, which …
Who is the assignee on this patent?
Hitachi Ltd
What technology area does this patent fall under?
Primary CPC classification G06V30/19013. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 4, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).