Method and system for detecting and recognizing text in images
US-8977072-B1 · Mar 10, 2015 · US
| Field | Value |
|---|---|
| Publication number | US-9530069-B2 |
| Application number | US-201514613279-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 3, 2015 |
| Priority date | Jan 23, 2008 |
| Publication date | Dec 27, 2016 |
| Grant date | Dec 27, 2016 |
Official abstract text for this publication.
Various embodiments of the present invention relate to a method, system and computer program product for detecting and recognizing text in the images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. Subsequently, the detected text regions pass through different processing stages that reduce blurring and the negative effects of variable lighting. This results in the creation of multiple images that are versions of the same text region. Some of these multiple versions are sent to a character-recognition system. The resulting texts from each of the versions of the image sent to the character-recognition system are then combined to a single result, wherein the single result is detected text.
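The abstract's last step, merging the texts recognized from several versions of the same region into one result, can be sketched as a simple vote. This is a minimal illustration only: the function name `combine_ocr_results` is hypothetical, and voting word by word on whitespace-split tokens (assuming the readings line up position by position) is an assumption made here, not a procedure given in the abstract.

```python
from collections import Counter
from itertools import zip_longest


def combine_ocr_results(texts):
    """Merge several OCR readings of one text region by word-level majority vote.

    Assumption (not from the patent text): the readings split into words that
    line up position by position, so the i-th word of each reading competes
    for the i-th slot of the combined result.
    """
    token_lists = [text.split() for text in texts]
    combined = []
    for column in zip_longest(*token_lists, fillvalue=None):
        # Ignore readings that ran out of words at this position.
        votes = Counter(token for token in column if token is not None)
        # most_common(1) returns the most frequent candidate for this slot.
        combined.append(votes.most_common(1)[0][0])
    return " ".join(combined)
```

Called with three readings such as `"EXIT 24 North"`, `"EX1T 24 North"`, and `"EXIT 24 Nortb"`, the sketch returns `"EXIT 24 North"`, since each misread word is outvoted by the other two versions. A real system would likely need an alignment step before voting, because OCR errors can insert or drop characters and words.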
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: under the control of one or more computer systems configured with executable instructions, receiving an input image that includes at least one image variation; filtering and segmenting the input image; selecting regions within the filtered and segmented input image having connected components; creating a mask corresponding to the regions of connected components, the mask including bounding boxes that at least partially enclose corresponding regions of the connected components; intersecting the filtered and segmented input image with the mask to produce a first output image; separately processing the filtered and segmented input image corresponding to the mask to create a binary output image; separately recognizing text in the first output image and in the binary output image using an optical character recognizer; and combining the separately recognized text from the first output image and from the binary output image to produce a single output.
2. The computer-implemented method of claim 1, further comprising: separately processing the input image to produce a third output image using a different processing technique than used to produce the first output image and the second output image, the recognized text from the first output image, the second output image, and the third output image being combined using a majority vote process to select portions from the first output image, the second output image, and the third output image.
3. The computer-implemented method of claim 1, wherein combining the separately recognized text from the first output image and the second output image comprises taking a logical OR of the first output image and the second output image.
4. The computer-implemented method of claim 1, wherein the at least one image variation includes at least one of noise, blur, or a lighting variation.
5. The computer-implemented method of claim 1, wherein selecting regions having the connected components includes identifying regions of connected pixels based on an intensity value of the pixels and a distance between the pixels.
6. The computer-implemented method of claim 5, wherein separately recognizing the text in the first output image and in the binary output image is based upon whether pixel values for pixels are above or below a threshold value.
7. The computer-implemented method of claim 1, wherein the bounding boxes are rectangular in shape.
8. A computing system, comprising: a processor; and a memory including instructions that, when executed by the processor, cause the computing system to: receive an input image that includes at least one image variation; filter and segmenting the input image; select regions within the filtered and segmented input image having connected components; create a mask corresponding to the regions of connected components, the mask including bounding boxes that at least partially enclose corresponding regions of the connected components; intersect the filtered and segmented input image with the mask to produce a first output image; separately process the filtered and segmented input image corresponding to the mask to create a binary output image; separately recognize text in the first output image and in the binary output image using an optical character recognizer; and combine the separately recognized text from the first output image and from the binary output image to produce a single output.
9. The computing system of claim 8, wherein the instructions, when executed by the processor, further cause the computing system to: separately process the input image to produce a third output image using a different processing technique than is used to produce the first output image and the second output image, the recognized text from the first output image, the second output image, and the third output image being combined using a majority vote process to select portions from the first output image, the second output image, and the third output image.
10. The computing system of claim 8, wherein the instructions, when executed by the processor, further cause the computing system to combine the separately recognized text from the first output image and the second output image by taking a logical OR of the first output image and the second output image.
11. The computing system of claim 8, wherein the at least one image variation includes at least one of noise, blur, or a lighting variation.
12. The computing system of claim 8, wherein the instructions, when executed by the processor, further cause the computing system to select regions having the connected components by identifying regions of connected pixels based on an intensity value of the pixels and a distance between the pixels.
13. The computing system of claim 8, wherein the instructions, when executed by the processor, further cause the computing system to separately recognize the text in the first output image and in the binary output image based upon whether pixel values for pixels are above or below a threshold value.
14. The computing system of claim 8, wherein the bounding boxes are rectangular in shape.
15. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to: receive an input image that includes at least one image variation; filter and segmenting the input image; select regions within the filtered and segmented input image having connected components; create a mask corresponding to the regions of connected components, the mask including bounding boxes that at least partially enclose corresponding regions of the connected components; intersect the filtered and segmented input image with the mask to produce a first output image; separately process the filtered and segmented input image corresponding to the mask to create a binary output image; separately recognize text in the first output image and in the binary output image using an optical character recognizer; and combine the separately recognized text from the first output image and from the binary output image to produce a single output.
16. The non-transitory computer-readable storage medium of claim 15, wherein the instructions, when executed by the processor, further cause the processor to: separately process the input image to produce a third output image using a different processing technique than is used to produce the first output image and the second output image, the recognized text from the first output image, the second output image, and the third output image being combined using a majority vote process to select portions from the first output image, the second output image, and the third output image.
17. The non-transitory computer-readable storage medium of claim 15, wherein the instructions, when executed by the processor, further cause the processor to combine the separately recognized text from the first output image and the second output image by taking a logical OR of the first output image and the second output image.
18. The non-transitory computer-readable storage medium of claim 15, wherein the at least one image variation includes at least one of noise, blur, or a lighting variation.
19. The non-transitory computer-readable storage medium of claim 15, wherein the instructions, when executed by the processor, further cause the processor to select regions having the connected components by identifying regions of connected pixels based on an intensity value of the pi
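The image-side steps recited in the independent claims (select connected components, build a bounding-box mask, intersect the image with the mask, and derive a binary image) can be illustrated with a small pure-Python sketch. Everything concrete here is an assumption for illustration: the claims do not specify the connectivity rule, the threshold, or the data layout, so this sketch uses 4-connectivity, one fixed intensity threshold, and a grayscale image stored as a list of rows. All function names are hypothetical, and the filtering, segmentation, and OCR stages are left out.

```python
from collections import deque


def component_boxes(image, threshold):
    """Bounding boxes (top, left, bottom, right) of 4-connected bright regions.

    Assumption: a pixel belongs to a component when its intensity is at or
    above `threshold`; the claims only say intensity and distance are used.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill from this unvisited bright pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def boxes_to_mask(boxes, rows, cols):
    """Mask with 1 inside every rectangular bounding box and 0 elsewhere."""
    mask = [[0] * cols for _ in range(rows)]
    for top, left, bottom, right in boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                mask[y][x] = 1
    return mask


def intersect(image, mask):
    """First output image: original intensities inside the mask, 0 outside."""
    return [[p if m else 0 for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def binarize(image, mask, threshold):
    """Binary output image: 1 where a masked pixel is at or above threshold."""
    return [[1 if m and p >= threshold else 0 for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]
```

In this sketch the intersected image keeps gray levels inside each box (including dim pixels that fall within a box), while the binary image keeps only the thresholded strokes. Each of the two images would then be passed separately to an optical character recognizer, and the recognized texts combined into a single output.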
Noise filtering · CPC title
Quantising the image signal · CPC title
Removing patterns interfering with the pattern to be recognised, such as ruled lines or underlines · CPC title
Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques · CPC title
Text, e.g. of license plates, overlay texts or captions on TV images · CPC title