Systems and methods for identification document processing and business workflow integration
US-9058515-B1 · Jun 16, 2015 · US
US10242285B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10242285-B2 |
| Application number | US-201615214351-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 19, 2016 |
| Priority date | Jul 20, 2015 |
| Publication date | Mar 26, 2019 |
| Grant date | Mar 26, 2019 |
Official abstract text for this publication.
Techniques for improved binarization and extraction of information from digital image data are disclosed in accordance with various embodiments. The inventive concepts include independently binarizing portions of the image data on the basis of individual features, e.g. per connected component, and using multiple different binarization thresholds to obtain the best possible binarization result for each portion of the image data independently binarized. Determining the quality of each binarization result may be based on attempted recognition and/or extraction of information therefrom. Independently binarized portions may be assembled into a contiguous result. In one embodiment, a method includes: identifying a region of interest within a digital image; generating a plurality of binarized images based on the region of interest using different binarization thresholds; and extracting data from some or all of the plurality of binarized images. Corresponding systems and computer program products are also disclosed.
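The idea in the abstract (binarize the same region at several thresholds, then keep whichever result reads best) can be sketched as follows. This is a minimal illustration and not the patented implementation: the `recognize` callback stands in for any OCR or extraction engine and is hypothetical, the function names are invented for this example, and the polarity (dark text on a lighter background) is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def binarize(region: Image, threshold: int) -> Image:
    """Mark pixels darker than `threshold` as foreground (1), all others as 0.

    Assumes dark text on a lighter background.
    """
    return [[1 if px < threshold else 0 for px in row] for row in region]


def binarize_at_thresholds(region: Image, thresholds: Sequence[int]) -> Dict[int, Image]:
    """Produce one binarized image per threshold."""
    return {t: binarize(region, t) for t in thresholds}


def best_binarization(
    region: Image,
    thresholds: Sequence[int],
    recognize: Callable[[Image], Tuple[str, float]],
) -> Tuple[int, str, float]:
    """Return (threshold, text, confidence) for the highest-confidence result.

    `recognize` is a hypothetical callback that attempts extraction on a
    binarized image and reports the text it found with a confidence score.
    Ties keep the earliest threshold in `thresholds`.
    """
    best = None
    for t, binary in binarize_at_thresholds(region, thresholds).items():
        text, confidence = recognize(binary)
        if best is None or confidence > best[2]:
            best = (t, text, confidence)
    if best is None:
        raise ValueError("at least one threshold is required")
    return best
```

In the disclosure this selection is made independently for each portion of the image (for example, per connected component), so a caller would run `best_binarization` once per portion rather than once per document.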
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: identifying a region of interest within a digital image; generating a plurality of binarized images based on the region of interest, wherein some or all of the binarized images are generated using a different one of a plurality of binarization thresholds; and extracting data from some or all of the plurality of binarized images; wherein extracting the data from some or all of the plurality of binarized images comprises: generating at least one sequence of candidate extraction results for each grouping of one or more connected components depicted within the region of interest; determining an optimal extraction result within each sequence of candidate extraction results; assembling all of the optimal extraction results into a single string of the one or more connected components; and wherein determining the optimal extraction result within each sequence of candidate extraction results comprises selecting one extraction result within each sequence of candidate extraction results so as to minimize intensity differences between the optimal extraction results assembled into the single string; and wherein at least some of the connected components are text characters.

2. The computer-implemented method as recited in claim 1, wherein the region of interest comprises a plurality of connected components; and wherein each of the plurality of binarized images corresponds to a different combination of: one of the plurality of connected components; and one of the plurality of binarization thresholds.

3. The computer-implemented method as recited in claim 1, wherein the region of interest comprises a plurality of connected components; and wherein extracting the data is performed on a per-component basis for at least some of the plurality of connected components.

4. The computer-implemented method as recited in claim 1, wherein the region of interest comprises a plurality of connected components; wherein extracting the data comprises estimating an identity of some or all of the plurality of connected components within one or more of the plurality of binarized images; wherein the identity of some or all of the plurality of connected components within one or more of the plurality of binarized images comprises the character, location, size, shape or color; and the method further comprising determining a confidence of the estimated identity of some or all of the plurality of connected components.

5. The computer-implemented method as recited in claim 4, wherein determining the confidence of the estimated identity of some or all of the plurality of connected components comprises comparing the estimated identity of each respective one of the plurality of connected components for which the identity was estimated with an expected identity of the respective one of the plurality of connected components.

6. The computer-implemented method as recited in claim 4, wherein determining the confidence of the estimated identity of some or all of the plurality of connected components comprises comparing an estimated location of each respective one of the plurality of connected components for which the identity was estimated with an expected location of the respective one of the plurality of connected components.

7. The computer-implemented method as recited in claim 4, wherein some or all of the plurality of connected components comprise non-textual information; and wherein determining the confidence of the estimated identity of some or all of the plurality of connected components comprises classifying some or all of the connected components for which the identity was estimated based on image features.

8. The computer-implemented method as recited in claim 4, comprising determining whether the confidence of the estimated identity of one of the plurality of connected components is less than a predetermined confidence threshold.

9. The computer-implemented method as recited in claim 8, comprising, in response to determining the confidence of the estimated identity of the one of the plurality of connected components is less than the predetermined confidence threshold, estimating the identity of the one of the plurality of connected components based on a different one of the plurality of binarized images than the one of the plurality of binarized images for which the confidence of the estimated identity of one of the plurality of connected components was determined to be less than the predetermined confidence threshold.

10. The computer-implemented method as recited in claim 1, wherein each sequence of candidate extraction results comprises a plurality of candidate extraction results each corresponding to the same grouping of one or more of the connected components depicted within the region of interest; and wherein each of the plurality of candidate extraction results in each respective sequence of candidate extraction results corresponds to a different one of the plurality of binarization thresholds.

11. The computer-implemented method as recited in claim 10, wherein at least one of the plurality of candidate results from each of at least two of the sequences of candidate extraction results correspond to a same one of the plurality of binarization thresholds.

12. The computer-implemented method as recited in claim 1, wherein the region of interest comprises a plurality of connected components; and wherein at least two of the plurality of connected components are extracted from different ones of the plurality of binarized images.

13. The computer-implemented method as recited in claim 1, comprising normalizing color within the digital image or the region of interest prior to thresholding; wherein normalizing color includes normalizing intensity values across one or more color channels to stretch the channel along a single normalized scale; and the one or more color channels being selected from a group consisting of: R, G and B.

14. The computer-implemented method as recited in claim 1, comprising: validating the extracted data; and inferring a classification of an object depicted in the digital image based on validating the extracted data.

15. A system, comprising: a processor; and logic integrated with and/or executable by the processor to cause the processor to: identify a region of interest within a digital image, wherein the region of interest comprises a plurality of connected components; generate a plurality of binarized images based on the region of interest, wherein some or all of the binarized images are generated using a different one of a plurality of binarization thresholds; and extract data from some or all of the plurality of binarized images, wherein the data comprise a potential character identity of one or more of the plurality of connected components; wherein the region of interest is characterized by a complex background overlapped by the plurality of connected components; wherein one or more of the connected components overlap or are obscured by one or more unique background elements such that no single binarization threshold applied to a region encompassing the one or more of the plurality of connected components can identify the one or more of the connected components that overlap or are obscured by the one or more unique background elements.

16. A computer program product, comprising a non-transitory computer readable medium having embodied therewith computer readable program instructions configured to cause a processor, upon execution thereof, to: identify, using the processor, a region of interest
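Claim 1 selects one candidate from each sequence so that intensity differences between the assembled results are minimized. One possible reading of that step, offered only as an illustrative sketch, is a dynamic program that picks one candidate per sequence to minimize the summed absolute difference between the thresholds of neighboring choices. The `Candidate` type, the `assemble_string` name, and the choice of "sum of adjacent absolute differences" as the cost are assumptions made for this example; the claim text does not specify a cost function or an algorithm.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str       # hypothesized character(s) for one grouping of components
    intensity: int  # binarization threshold that produced this candidate


def assemble_string(sequences: List[List[Candidate]]) -> str:
    """Pick one candidate per sequence, minimizing the total absolute
    intensity difference between neighboring picks (an assumed cost)."""
    if not sequences or any(not seq for seq in sequences):
        return ""

    # cost[j]: lowest total cost of any path ending at candidate j
    # of the sequence processed so far.
    cost = [0] * len(sequences[0])
    back: List[List[int]] = []  # back[k][j]: best predecessor index

    for prev, cur in zip(sequences, sequences[1:]):
        new_cost, pointers = [], []
        for cand in cur:
            options = [
                cost[i] + abs(cand.intensity - p.intensity)
                for i, p in enumerate(prev)
            ]
            best_i = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best_i])
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the final sequence.
    j = min(range(len(cost)), key=cost.__getitem__)
    chosen = [j]
    for pointers in reversed(back):
        j = pointers[j]
        chosen.append(j)
    chosen.reverse()

    return "".join(seq[k].text for seq, k in zip(sequences, chosen))
```

For example, given three sequences whose candidates were produced at thresholds (100, 200), (110, 180), and (105), the path through 100, 110, and 105 has the lowest cost under this assumed measure. The sketch ignores recognition confidence, which dependent claims 4 through 9 treat separately.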
Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns · CPC title
Determination of region of interest [ROI] or a volume of interest [VOI] · CPC title
by analysing connectivity, e.g. edge linking, connected component analysis or slices · CPC title
Region-based segmentation · CPC title
involving region growing; involving region merging; involving connected component labelling · CPC title