Computer and document identification method
US-2019138804-A1 · May 9, 2019 · US
US10552674B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10552674-B2 |
| Application number | US-201815918830-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 12, 2018 |
| Priority date | May 31, 2017 |
| Publication date | Feb 4, 2020 |
| Grant date | Feb 4, 2020 |
Official abstract text for this publication.
A computer, which is configured to extract an attribute being a character string indicating a feature of a paper-based document, stores template information and dictionary information. The computer is configured to: execute character recognition processing on image data on the paper-based document; extract an attribute corresponding to each of the at least one type of attribute, which is defined in each of the plurality of templates, through use of a result of the character recognition processing and the plurality of templates; calculate a score regarding the extracted attribute for each of the plurality of templates; select one of the plurality of templates that has the highest extraction accuracy of the attribute based on the score; and generate output information through use of the selected template.
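The flow in the abstract (recognize characters, extract attributes per template, score each template, select the best one, generate output) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the character recognition result is assumed to be already available as a list of words with page coordinates, extraction is assumed to pick the word nearest each expected position, and the score is assumed to be a simple dictionary hit rate. All names (`Word`, `Template`, `extract`, `score`, `select_template`) are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Word:
    """One recognized string and its position on the page (assumed OCR output)."""
    text: str
    x: float
    y: float


@dataclass
class Template:
    """A template entry: template ID plus expected position per attribute type."""
    template_id: str
    positions: dict  # attribute type -> (x, y)


def extract(words, template):
    """Assumed extraction: for each attribute type, take the nearest word."""
    return {
        attr: min(words, key=lambda w: hypot(w.x - ex, w.y - ey))
        for attr, (ex, ey) in template.positions.items()
    }


def score(extracted, dictionary):
    """Assumed score: fraction of extracted strings found in the dictionary."""
    hits = sum(
        1 for attr, w in extracted.items()
        if w.text in dictionary.get(attr, set())
    )
    return hits / len(extracted)


def select_template(words, templates, dictionary):
    """Score every template and build output from the highest-scoring one."""
    scored = []
    for t in templates:
        ex = extract(words, t)
        scored.append((score(ex, dictionary), t, ex))
    best_score, best_t, best_ex = max(scored, key=lambda s: s[0])
    return {
        "template": best_t.template_id,
        "score": best_score,
        "attributes": {a: w.text for a, w in best_ex.items()},
    }


# Hypothetical sample data for illustration only.
words = [Word("ACME Corp", 10, 10), Word("Invoice", 50, 10), Word("1,200", 80, 90)]
dictionary = {"issuer": {"ACME Corp"}, "amount": {"1,200"}}
templates = [
    Template("T1", {"issuer": (12, 11), "amount": (78, 88)}),
    Template("T2", {"issuer": (52, 12), "amount": (10, 10)}),
]
result = select_template(words, templates, dictionary)
```

With this sample data, template `T1` places both attributes next to strings registered in the dictionary, so it is the one selected; `T2` points at the wrong regions and scores zero.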
Opening claim text (preview).
What is claimed is:
1. A computer, which is configured to extract an attribute being a character string indicating a feature of a paper-based document, the computer comprising: a processor; and a storage device coupled to the processor, wherein: the storage apparatus is configured to store: template information for managing a plurality of templates in each of which at least one type of attribute is defined; and dictionary information for defining a character string to be extracted as the attribute; the template information includes a plurality of entries each formed of: identification information on each of the plurality of templates; identification information indicating each of the at least one type of attribute; and positional information indicating a position on a paper surface of an attribute corresponding to each of the at least one type of attribute; and the processor is configured to: execute character recognition processing on image data on the paper-based document; extract an attribute corresponding to each of the at least one type of attribute, which is defined in each of the plurality of templates, through use of a result of the character recognition processing and the plurality of templates; calculate a score regarding the extracted attribute for each of the plurality of templates through use of the dictionary information, the template information, and the extracted attribute; select one of the plurality of templates that has the highest extraction accuracy of the attribute based on the score; and generate output information including the attribute extracted through use of the selected template.
2. The computer according to claim 1, wherein the processor is configured to: generate a feature vector indicating a feature of the paper-based document through use of the score of the selected template; calculate an evaluation value indicating reliability of the output information through use of the feature vector; and determine, based on a result of comparing the evaluation value and a threshold value, whether one of modification of the output information and generation of new output information is required.
3. The computer according to claim 2, wherein the processor is configured to: refer to the dictionary information to calculate a first score indicating a degree to which the extracted attribute matches a character string registered in the dictionary information; refer to the template information to calculate a second score for evaluating a deviation between the positional information and a position of the extracted attribute on the paper surface; and generate the feature vector including the first score and the second score of the extracted attribute as components of the feature vector.
4. The computer according to claim 3, wherein the processor is configured to: calculate a third score for evaluating a size of a range of the extracted attribute on the paper surface; calculate a fourth score for evaluating a distance between attributes having the same type of attribute; and generate the feature vector including the first score, the second score, the third score, and the fourth score of the extracted attribute as components of the feature vector.
5. A document identification method, which is executed by a computer configured to extract an attribute being a character string indicating a feature of a paper-based document, the computer including: a processor; and a storage device coupled to the processor, the storage device being configured to store: template information for managing a plurality of templates in each of which at least one type of attribute is defined; and dictionary information for defining a character string to be extracted as the attribute, the template information including a plurality of entries each formed of: identification information on each of the plurality of templates; identification information indicating each of the at least one type of attribute; and positional information indicating a position on a paper surface of an attribute corresponding to each of the at least one type of attribute, the document identification method including: a first step of executing, by the processor, character recognition processing on image data on the paper-based document; a second step of extracting, by the processor, an attribute corresponding to each of the at least one type of attribute, which is defined in each of the plurality of templates, through use of a result of the character recognition processing and the plurality of templates; a third step of calculating, by the processor, a score regarding the extracted attribute for each of the plurality of templates through use of the dictionary information, the template information, and the extracted attribute; a fourth step of selecting, by the processor, one of the plurality of templates that has the highest extraction accuracy of the attribute based on the score; and a fifth step of generating, by the processor, output information including the attribute extracted through use of the selected template.
6. The document identification method according to claim 5, further including: a sixth step of generating, by the processor, a feature vector indicating a feature of the paper-based document through use of the score of the selected template; a seventh step of calculating, by the processor, an evaluation value indicating reliability of the output information through use of the feature vector; and an eighth step of determining, by the processor, based on a result of comparing the evaluation value and a threshold value, whether one of modification of the output information and generation of new output information is required.
7. The document identification method according to claim 6, wherein: the third step includes: referring, by the processor, to the dictionary information to calculate a first score indicating a degree to which the extracted attribute matches a character string registered in the dictionary information; and referring, by the processor, to the template information to calculate a second score for evaluating a deviation between the positional information and a position of the extracted attribute on the paper surface; and the sixth step includes generating, by the processor, the feature vector including the first score and the second score of the extracted attribute as components of the feature vector.
8. The document identification method according to claim 7, wherein: the third step includes: calculating, by the processor, a third score for evaluating a size of a range of the extracted attribute on the paper surface; and calculating by the processor, a fourth score for evaluating a distance between attributes having the same type of attribute; and the sixth step includes generating, by the processor, the feature vector including the first score, the second score, the third score, and the fourth score of the extracted attribute as components of the feature vector.
9. A system, comprising: a computer; and a terminal, the computer including: a first processor; and a first storage device coupled to the first processor, the terminal including: a second processor; and a second storage device coupled to the second processor, wherein: the first storage device is configured to store: template information for managing a plurality of templates in each of which at least one type of attribute being a character string indicating a feature of a paper-based document is defined; and dictionary information for defining a character string to be extracted as the attribute; the template information includes a plurality of entries each formed of: identification information on
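The reliability check described in claims 2 and 3 (a feature vector built from per-attribute scores, an evaluation value, and a comparison against a threshold) might look something like the sketch below. The claim text shown here does not say how any score or the evaluation value is computed, so every formula is an assumption made for illustration: an exact-match dictionary score, a linear position-deviation score, and a plain mean as the evaluation value. The function names, the `scale` parameter, and the default threshold are hypothetical. The third and fourth scores of claim 4 are left out; they would be appended to the vector in the same way.

```python
from math import hypot


def dictionary_score(text, registered):
    """First score (assumed form): 1.0 on an exact dictionary match, else 0.0."""
    return 1.0 if text in registered else 0.0


def position_score(found_xy, expected_xy, scale=100.0):
    """Second score (assumed form): 1.0 at zero deviation, 0.0 at `scale` or more."""
    d = hypot(found_xy[0] - expected_xy[0], found_xy[1] - expected_xy[1])
    return max(0.0, 1.0 - d / scale)


def feature_vector(attributes):
    """Concatenate the first and second score of every extracted attribute."""
    vec = []
    for a in attributes:
        vec.append(dictionary_score(a["text"], a["registered"]))
        vec.append(position_score(a["found"], a["expected"]))
    return vec


def needs_review(vec, threshold=0.8):
    """Assumed evaluation value: the mean of the vector, compared to a threshold."""
    evaluation = sum(vec) / len(vec)
    return evaluation < threshold, evaluation


# Hypothetical extraction result: the second attribute was misread ("O" for "0")
# and found 50 units away from where the template expects it.
attrs = [
    {"text": "ACME Corp", "registered": {"ACME Corp"},
     "found": (10, 10), "expected": (10, 10)},
    {"text": "1,2O0", "registered": {"1,200"},
     "found": (80, 90), "expected": (50, 50)},
]
vec = feature_vector(attrs)
flag, evaluation = needs_review(vec)
```

Here the misread, displaced second attribute pulls the evaluation value below the threshold, so the output would be flagged for modification or regeneration.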
Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching (specially adapted for image segmentation G06T7/10; specially adapted for the analysis of motion G06T7/20; specially adapted for image alignment G06T7/30; specially adapted for the calculation of depth from stereo images G06T7/50; specially adapted for position determination G06T7/70) · CPC title
Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title
Billing or invoicing · CPC title
Physics · mapped topic