Object detection and image classification based optical character recognition

US11100319B2 · US · B2

Patent metadata
Publication number: US-11100319-B2
Application number: US-202016773813-A
Country: US
Kind code: B2
Filing date: Jan 27, 2020
Priority date: Jan 27, 2020
Publication date: Aug 24, 2021
Grant date: Aug 24, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system performs optical character recognition (OCR) on an image displaying a portion of an object. An image classification system identifies the object in the image, based on which one or more object detection models identify labels associated with the object within the image. The system determines text of the identified labels using OCR, and analyzes the OCR resultant text for discrepancies and/or inaccuracies. In response to identifying a discrepancy, the system provides a recommendation for improving the accuracy of the OCR resultant text.
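The flow in the abstract (classify the object, detect its labels, run OCR on each label, check the text, recommend a fix) can be sketched as below. This is a minimal illustration, not the patented implementation: `PhraseType`, `find_discrepancy`, `run_pipeline`, and the callables `classify`, `detectors`, and `ocr` are hypothetical names standing in for real models, and the discrepancy check follows the data-type and expected-value ideas stated in the claims.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class PhraseType:
    """Hypothetical description of what a label's text is expected to be."""
    data_type: type = str                      # expected data type of the value
    allowed_values: Optional[Set[str]] = None  # optional list of expected values


def find_discrepancy(text: str, expected: PhraseType) -> Optional[str]:
    """Return a description of the problem, or None if the text looks valid."""
    try:
        expected.data_type(text)  # e.g. int("12O45") raises ValueError
    except ValueError:
        return f"{text!r} is not a valid {expected.data_type.__name__}"
    if expected.allowed_values is not None and text not in expected.allowed_values:
        return f"{text!r} is not in the list of expected values"
    return None


def run_pipeline(
    image: object,
    classify: Callable[[object], str],
    detectors: Dict[str, Dict[str, Callable[[object], object]]],
    ocr: Callable[[object], str],
    phrase_types: Dict[str, PhraseType],
) -> Tuple[Dict[str, str], List[str]]:
    """Classify the object, detect each label, OCR it, and collect recommendations."""
    object_type = classify(image)
    texts: Dict[str, str] = {}
    recommendations: List[str] = []
    # One detector per label type associated with the classified object type.
    for label_type, detect in detectors.get(object_type, {}).items():
        region = detect(image)
        if region is None:  # label not found in this image
            continue
        text = ocr(region)
        texts[label_type] = text
        problem = find_discrepancy(text, phrase_types.get(label_type, PhraseType()))
        if problem:
            recommendations.append(
                f"{label_type}: {problem}; consider capturing a new image"
            )
    return texts, recommendations
```

In a real system the callables would wrap trained models and an OCR engine. Here they are plain functions so the control flow is easy to follow and test.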

First claim

Opening claim text (preview).

We claim:

1. A computer implemented method for performing optical character recognition comprising: receiving an image displaying a portion of an object, the image displaying one or more labels of the object; accessing an image classification model configured to receive an input image displaying a portion of an object and determine an object type of the object, the object type associated with one or more label types; determining an object type of the object of the received image using the image classification model; accessing one or more object detection models for the determined object type, each object detection model configured to receive an image as input and identify a label of a label type associated with the object type; for each accessed object detection model, providing the received image as input to the object detection model to detect a label of a label type in the received image; for each detected label of a label type in the received image: determining a text of the detected label using optical character recognition; accessing expected phrase types for phrases of the label type; and analyzing the text based on the expected phrase type to identify discrepancies in the text; and responsive to identifying a discrepancy in a text of a label, sending a recommendation for improving an accuracy of the text of the label determined using optical character recognition.

2. The computer implemented method of claim 1, further comprising: identifying a plurality of zones of the received image, wherein at least two of the zones have an overlapping area comprising a text; for each of the plurality of zones, predicting a value of the text; and responsive to identifying differences in the predicted values of the text, determining that the value of the text is the predicted value of the text having the highest number of occurrences.

3. The computer implemented method of claim 2, the method further comprising: responsive to determining that two predicted values of the text have the highest number of occurrences, combining at least two of the plurality of zones to obtain a tiebreaker zone; predicting a value of the text for the tiebreaker zone; and determining that the value of the text is the predicted value of the text for the tiebreaker zone.

4. The computer implemented method of claim 3, the method further comprising: determining a size of each zone of the plurality of zones; and creating multiple copies of the image; cropping each copy of the image to determine a zone such that each phrase being recognized is included in a set of zones having the determined size.

5. The computer implemented method of claim 3, the method further comprising: responsive to determining that the value of the text is not the predicted value of the text for the tiebreaker zone, sending the recommendation for improving the accuracy of the text of the label determined using optical character recognition.

6. The computer implemented method of claim 5, wherein the recommendation for improving the accuracy of the text of the label includes capturing a new image displaying the portion of the object.

7. The computer implemented method of claim 1, wherein identifying a discrepancy in a text comprises determining that an expected data type of a value in the expected phrase type fails to match an actual data type of the value in the text.

8. The computer implemented method of claim 1, wherein identifying a discrepancy in a text comprises comparing a value in the text against a predetermined list of expected values for the text.

9. A computer readable non-transitory storage medium storing instructions for: receiving an image displaying a portion of an object, the image displaying one or more labels of the object; accessing an image classification model configured to receive an input image displaying a portion of an object and determine an object type of the object, the object type associated with one or more label types; determining an object type of the object of the received image using the image classification model; accessing one or more object detection models for the determined object type, each object detection model configured to receive an image as input and identify a label of a label type associated with the object type; for each accessed object detection model, providing the received image as input to the object detection model to detect a label of a label type in the received image; for each detected label of a label type in the received image: determining a text of the detected label using optical character recognition; accessing expected phrase types for phrases of the label type; and analyzing the text based on the expected phrase type to identify discrepancies in the text; and responsive to identifying a discrepancy in a text of a label, sending a recommendation for improving an accuracy of the text of the label determined using optical character recognition.

10. The computer readable non-transitory storage medium of claim 9, further storing instructions for: identifying a plurality of zones of the received image, wherein at least two of the zones have an overlapping area comprising a text; for each of the plurality of zones, predicting a value of the text; and responsive to identifying differences in the predicted values of the text, determining that the value of the text is the predicted value of the text having the highest number of occurrences.

11. The computer readable non-transitory storage medium of claim 10, further storing instructions for: responsive to determining that two predicted values of the text have the highest number of occurrences, combining at least two of the plurality of zones to obtain a tiebreaker zone; predicting a value of the text for the tiebreaker zone; and determining that the value of the text is the predicted value of the text for the tiebreaker zone.

12. The computer readable non-transitory storage medium of claim 11, further storing instructions for: determining a size of each zone of the plurality of zones; creating multiple copies of the image; and cropping each copy of the image to determine a zone such that each phrase being recognized is included in a set of zones having the determined size.

13. The computer readable non-transitory storage medium of claim 11, further storing instructions for: responsive to determining that the value of the text is not the predicted value of the text for the tiebreaker zone, sending the recommendation for improving the accuracy of the text of the label determined using optical character recognition.

14. The computer readable non-transitory storage medium of claim 13, wherein the recommendation for improving the accuracy of the text of the label includes capturing a new image displaying the portion of the object.

15. The computer readable non-transitory storage medium of claim 9, wherein identifying a discrepancy in a text comprises determining that an expected data type of a value in the expected phrase type fails to match an actual data type of the value in the text.

16. The computer readable non-transitory storage medium of claim 9, wherein identifying a discrepancy in a text comprises comparing a value in the text against a predetermined list of expected values for the text.

17. A computer-implemented system comprising: a computer processor; and a computer readable non-transitory storage medium storing instructions thereon, the instructions when executed by a processor cause the processor to perform the steps of: receiving an image displaying a
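The zone-based voting described in claims 2 and 3 (and mirrored in claims 10 and 11) may be easier to follow as code. The sketch below is one possible reading under stated assumptions: zones are axis-aligned boxes, `predict` is a hypothetical callable that runs OCR on a box, and the tiebreaker zone is formed as the bounding box of all zones. The claims require only that at least two zones be combined, so that last choice is an assumption.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), hypothetical layout


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box that contains every input box (the assumed tiebreaker zone)."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def vote_on_zones(zones: Sequence[Box], predict: Callable[[Box], str]) -> str:
    """Pick the most frequent prediction; on a tie, defer to a combined zone."""
    if not zones:
        raise ValueError("at least one zone is required")
    predictions = [predict(zone) for zone in zones]
    ranked = Counter(predictions).most_common()  # sorted by count, descending
    top_count = ranked[0][1]
    leaders = [value for value, count in ranked if count == top_count]
    if len(leaders) == 1:
        return leaders[0]  # clear majority among the overlapping zones
    # Tie: combine zones into a tiebreaker zone and use its prediction.
    return predict(union_box(zones))
```

With three overlapping zones read as "A1", "A1", and "AI", the function returns "A1" by majority. With two zones read as "A1" and "AI", it runs `predict` once more on the combined zone and returns that result.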

Assignees

Salesforce Com Inc

Inventors

Classifications

  • G06V20/63 · Primary

    Scene text, e.g. street names · CPC title

  • Classification techniques · CPC title

  • Classification of content, e.g. text, photographs or tables · CPC title

  • Character recognition · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11100319B2 cover?
A system performs optical character recognition (OCR) on an image displaying a portion of an object. An image classification system identifies the object in the image, based on which one or more object detection models identify labels associated with the object within the image. The system determines text of the identified labels using OCR, and analyzes the OCR resultant text for discrepancies …
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06V20/63. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 24, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).