Object detection and image classification based optical character recognition
US-11100319-B2 · Aug 24, 2021 · US
US11727697B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11727697-B2 |
| Application number | US-202117364343-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2021 |
| Priority date | Jan 27, 2020 |
| Publication date | Aug 15, 2023 |
| Grant date | Aug 15, 2023 |
Official abstract text for this publication.
A system performs optical character recognition (OCR) on an image displaying a portion of an object. An image classification system identifies the object in the image, based on which one or more object detection models identify labels associated with the object within the image. The system determines text of the identified labels using OCR, and analyzes the OCR resultant text for discrepancies and/or inaccuracies. In response to identifying a discrepancy, the system provides a recommendation for improving the accuracy of the OCR resultant text.
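The discrepancy analysis mentioned in the abstract is narrowed in claims 7 and 8 to two checks: a data type mismatch and a comparison against a predetermined list of expected values. The following is a minimal sketch of those two checks, not the patented implementation. The names `ExpectedPhrase`, `find_discrepancies`, and `recommend` are hypothetical, as is the assumption that the OCR text has already been split into named fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ExpectedPhrase:
    """Hypothetical description of one phrase expected on a label type."""
    name: str
    parser: Callable[[str], object]          # raises ValueError on a type mismatch
    allowed: Optional[Sequence[str]] = None  # optional list of expected values


def find_discrepancies(fields: dict[str, str],
                       expected: list[ExpectedPhrase]) -> list[str]:
    """Compare OCR output fields against the expected phrase types."""
    problems = []
    for phrase in expected:
        raw = fields.get(phrase.name)
        if raw is None:
            continue  # assumption: absent fields are handled elsewhere
        try:
            phrase.parser(raw)  # data type check (claim 7)
        except ValueError:
            problems.append(f"{phrase.name}: unexpected data type for {raw!r}")
            continue
        # expected-values check (claim 8)
        if phrase.allowed is not None and raw not in phrase.allowed:
            problems.append(f"{phrase.name}: {raw!r} not in expected values")
    return problems


def recommend(problems: list[str]) -> Optional[str]:
    """Return a recommendation only when a discrepancy was found."""
    return "capture a new image" if problems else None
```

For example, with an expected numeric `pressure` field and a `unit` field limited to `("PSI", "KPA")`, the misread values `"3S"` and `"PSl"` would each be flagged, and `recommend` would suggest a new capture, which is the recommendation named in claim 6.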
Opening claim text (preview).
What is claimed is:
1. A computer implemented method for optical character recognition, comprising: receiving an image displaying one or more labels; detecting the one or more labels by providing the image as input to one or more object detection models, wherein an object detection model is configured to identify a label of a label type in an input image; for a detected label in the image: determining a text of the detected label using optical character recognition; accessing expected phrase types for phrases of a label type of the detected label; analyzing the text based on an expected phrase type to identify discrepancies in the text; responsive to identifying a discrepancy in a text of a particular label, sending a recommendation for improving the image.
2. The computer implemented method of claim 1, further comprising: identifying a plurality of zones of the received image; for a zone from the plurality of zones, predicting a value of the text; and responsive to identifying differences in predicted values of the text, using a predicted value of the text having the highest number of occurrences as the value of the text.
3. The computer implemented method of claim 2, the method further comprising: responsive to determining that two predicted values of the text have the highest number of occurrences, combining at least two of the plurality of zones to obtain a tiebreaker zone; and determining the value of the text to be a predicted value of the text for the tiebreaker zone.
4. The computer implemented method of claim 3, the method further comprising: determining a size of a zone from the plurality of zones; creating multiple copies of the image; cropping each copy of the image to obtain cropped images such that each cropped image corresponds to a zone; and predicting text values for the zone corresponding to each cropped image.
5. The computer implemented method of claim 3, the method further comprising: responsive to determining that the value of the text is not the predicted value of the text for the tiebreaker zone, generating the recommendation for improving an accuracy of the text of the label determined using optical character recognition.
6. The computer implemented method of claim 5, wherein the recommendation comprises capturing a new image.
7. The computer implemented method of claim 1, wherein identifying a discrepancy in a text comprises determining that an expected data type of a value in the expected phrase type fails to match an actual data type of the value in the text.
8. The computer implemented method of claim 1, wherein identifying a discrepancy in a text comprises comparing a value in the text against a predetermined list of expected values for the text.
9. A computer readable non-transitory storage medium storing instructions that when executed by a computer processor, cause the computer processor to perform steps comprising: receiving an image displaying one or more labels; detecting the one or more labels by providing the image as input to one or more object detection models, wherein an object detection model is configured to identify a label of a label type in an input image; for a detected label in the image: determining a text of the detected label using optical character recognition; accessing expected phrase types for phrases of a label type of the detected label; analyzing the text based on an expected phrase type to identify discrepancies in the text; responsive to identifying a discrepancy in a text of a particular label, sending a recommendation for improving the image.
10. The computer readable non-transitory storage medium of claim 9, wherein the instructions cause the computer processor to perform steps comprising: identifying a plurality of zones of the image; for a zone from the plurality of zones, predicting a value of the text; and responsive to identifying differences in predicted values of the text, using a predicted value of the text having the highest number of occurrences as the value of the text.
11. The computer readable non-transitory storage medium of claim 10, wherein the instructions cause the computer processor to perform steps comprising: responsive to determining that two predicted values of the text have the highest number of occurrences, combining at least two of the plurality of zones to obtain a tiebreaker zone; and determining the value of the text to be a predicted value of the text for the tiebreaker zone.
12. The computer readable non-transitory storage medium of claim 11, wherein the instructions cause the computer processor to perform steps comprising: determining a size of a zone from the plurality of zones; creating multiple copies of the image; cropping each copy of the image to obtain cropped images such that each cropped image corresponds to a zone; and predicting text values for the zone corresponding to each cropped image.
13. The computer readable non-transitory storage medium of claim 11, wherein the instructions cause the computer processor to perform steps comprising: responsive to determining that the value of the text is not the predicted value of the text for the tiebreaker zone, generating the recommendation for improving an accuracy of the text of the label determined using optical character recognition.
14. The computer readable non-transitory storage medium of claim 13, wherein the recommendation comprises capturing a new image.
15. The computer readable non-transitory storage medium of claim 9, wherein identifying a discrepancy in a text comprises determining that an expected data type of a value in the expected phrase type fails to match an actual data type of the value in the text.
16. The computer readable non-transitory storage medium of claim 9, wherein identifying a discrepancy in a text comprises comparing a value in the text against a predetermined list of expected values for the text.
17. A computer system comprising: a computer processor; and a computer readable non-transitory storage medium storing instructions that when executed by the computer processor, cause the computer processor to perform steps comprising: receiving an image displaying one or more labels; detecting the one or more labels by providing the image as input to one or more object detection models, wherein an object detection model is configured to identify a label of a label type in an input image; for a detected label in the image: determining a text of the detected label using optical character recognition; accessing expected phrase types for phrases of a label type of the detected label; analyzing the text based on an expected phrase type to identify discrepancies in the text; responsive to identifying a discrepancy in a text of a particular label, sending a recommendation for improving the image.
18. The computer system of claim 17, wherein the instructions cause the computer processor to perform steps comprising: identifying a plurality of zones of the image; for a zone from the plurality of zones, predicting a value of the text; and responsive to identifying differences in predicted values of the text, using a predicted value of the text having the highest number of occurrences as the value of the text.
19. The computer system of claim 18, wherein the instructions cause the computer processor to perform steps comprising: responsive to determining that two predicted values of the text have the highest number of occurrences, combining at least two of the plurality of zones to ob
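
The zone voting described in claims 2 and 3 (and mirrored in claims 10, 11, 18, and 19) can be sketched roughly as below. This is an illustration under stated assumptions, not the patented implementation: `Box`, `union_box`, `vote_on_text`, and the `read_zone` callback (standing in for an OCR call on a cropped zone) are hypothetical names. The claims do not say which zones are combined for the tiebreaker, so the sketch assumes the first zone that produced each of the two tied values.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), hypothetical layout


def union_box(a: Box, b: Box) -> Box:
    """Smallest box covering both zones; used here as the 'combined' zone."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def vote_on_text(zones: Sequence[Box], read_zone: Callable[[Box], str]) -> str:
    """Pick the most frequent per-zone prediction, with a tiebreaker zone."""
    if not zones:
        raise ValueError("at least one zone is required")

    # Predict a text value for every zone (claim 2).
    predictions = [read_zone(zone) for zone in zones]
    ranked = Counter(predictions).most_common()

    # A single value, or a clear winner: use the most frequent prediction.
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]

    # Two values share the highest count: build a tiebreaker zone (claim 3).
    # Assumption: combine the first zone that produced each tied value.
    zone_a = zones[predictions.index(ranked[0][0])]
    zone_b = zones[predictions.index(ranked[1][0])]
    return read_zone(union_box(zone_a, zone_b))
```

In practice `read_zone` would crop a copy of the image to the given zone and run OCR on the crop, in the spirit of claim 4; passing it as a callback keeps the voting logic independent of any particular OCR library.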
Scene text, e.g. street names · CPC title
Classification techniques · CPC title
Classification of content, e.g. text, photographs or tables · CPC title
Character recognition · CPC title