Detecting orientation of textual documents on a live camera feed

US10163007B2 · US · B2

Patent metadata
Publication number: US-10163007-B2
Application number: US-201715499666-A
Country: US
Kind code: B2
Filing date: Apr 27, 2017
Priority date: Apr 27, 2017
Publication date: Dec 25, 2018
Grant date: Dec 25, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract


The present disclosure relates to the extraction of text from an image including a depiction of a document. According to one embodiment, a mobile device receives an image depicting a document. The mobile device identifies a plurality of text areas in the document and identifies a midpoint of each of the plurality of text areas in the document. The mobile device detects one or more lines of text in the document including a plurality of text areas, where the plurality of text areas included in a line of text are associated with a midpoint having a coordinate within a threshold number of pixels on one axis in a two-dimensional space. Based on an orientation of the detected one or more lines of text, the mobile device determines a probable orientation of the document and extracts text from the image based on the determined probable orientation of the document.
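The line-detection step in the abstract can be illustrated with a short Python sketch. It is not Intuit's implementation. Several choices here are assumptions made for illustration:

- the `TextArea` box type;
- the greedy sort-and-cluster grouping;
- the 10-pixel default threshold;
- the rule for picking an axis, which counts the text areas that land in lines of two or more.

Text areas are assumed to arrive as pixel bounding boxes from some upstream word detector.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TextArea:
    """Pixel bounding box of one run of non-whitespace characters (assumed input format)."""
    x: float
    y: float
    width: float
    height: float

    def midpoint(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def group_lines(areas: List[TextArea], axis: int, threshold: float) -> List[List[TextArea]]:
    """Group text areas whose midpoints lie within `threshold` pixels on one axis.

    axis=1 compares vertical (y) midpoints, so each group is a horizontal line;
    axis=0 compares horizontal (x) midpoints, so each group is a vertical line.
    """
    ordered = sorted(areas, key=lambda a: a.midpoint()[axis])
    lines: List[List[TextArea]] = []
    for area in ordered:
        coord = area.midpoint()[axis]
        # Assumption: compare with the first (anchor) member of the current group
        # so a gradual drift cannot chain distant areas into one line.
        if lines and coord - lines[-1][0].midpoint()[axis] <= threshold:
            lines[-1].append(area)
        else:
            lines.append([area])
    return lines


def probable_orientation(areas: List[TextArea], threshold: float = 10.0,
                         min_areas_per_line: int = 2) -> str:
    """Return "horizontal" or "vertical" using a hypothetical scoring rule: the axis
    that places more text areas into multi-area lines wins (ties go to "horizontal")."""
    def score(axis: int) -> int:
        return sum(len(line) for line in group_lines(areas, axis, threshold)
                   if len(line) >= min_areas_per_line)
    return "horizontal" if score(1) >= score(0) else "vertical"


# Three short lines of three words each, with uneven word widths.
page = [
    TextArea(0, 0, 40, 12), TextArea(50, 0, 25, 12), TextArea(85, 0, 60, 12),
    TextArea(0, 30, 30, 12), TextArea(40, 30, 55, 12), TextArea(105, 30, 40, 12),
    TextArea(0, 60, 50, 12), TextArea(60, 60, 35, 12), TextArea(105, 60, 70, 12),
]
print(probable_orientation(page))  # horizontal
```

Grouping on y midpoints recovers horizontal lines. Grouping on x midpoints recovers the vertical lines a page captured 90 degrees off would show. Left-aligned words also form loose x-clusters, so a practical version would probably need a sturdier score and some tolerance for skew.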

First claim

Opening claim text (preview).

What is claimed is:

1. A method for performing optical character recognition on an image of a document, comprising: receiving the image of the document; identifying a plurality of text areas in the document, wherein each of the plurality of text areas corresponds to a continuous set of non-whitespace characters detected in the document; identifying a midpoint of each of the plurality of text areas in the document wherein the identified midpoint of each of the plurality of text areas has a pixel coordinate location in a two-dimensional space; detecting one or more lines of text in the document, wherein each of the one or more lines of text comprises a set of text areas having midpoints with a vertical pixel coordinate located within a threshold number of pixels on a vertical axis in the two-dimensional space or a horizontal pixel coordinate located within a threshold number of pixels on a horizontal axis in two-dimensional space; determining a probable orientation of the document based on an orientation of the detected one or more lines of text, wherein the probable orientation indicates that lines of text in the document are oriented parallel with a vertical axis or a horizontal axis; and extracting text from the image based on the determined probable orientation of the document.

2. The method of claim 1, wherein the probable orientation of the document comprises a first rotation angle and a second rotation angle, and wherein the second rotation angle differs from the first rotation angle by 180 degrees.

3. The method of claim 2, wherein extracting text from the image comprises: generating a first image by rotating the received image by the first rotation angle; extracting text from the first image; and upon determining that the extracted text from the first image comprises valid text, returning the extracted text to an application for processing.

4. The method of claim 3, further comprising: upon identifying a failure to extract valid text from the first image: generating a second image by rotating the received image by the second rotation angle; and extracting text from the second image.

5. The method of claim 4, wherein identifying a failure to extract valid text from the first image comprises determining that a number of strings extracted from the first image do not match strings in a dictionary of known words.

6. The method of claim 1, wherein determining the probable orientation of the document comprises determining that the image is mirrored about a vertical axis if the detected one or more lines of text include a plurality of text areas with a similar midpoint on a vertical axis of the image.

7. The method of claim 1, wherein determining the probable orientation of the document comprises determining that the image is mirrored about a horizontal axis if the detected one or more lines of text include a plurality of text areas with a similar midpoint on a horizontal axis of the image.

8. A non-transitory computer-readable medium comprising instructions which, when executed on a processor, performs an operation for performing optical character recognition on an image of a document, the operation comprising: receiving the image of the document; identifying a plurality of text areas in the document wherein each text area corresponds to a continuous set of non-whitespace characters detected in the document; identifying a midpoint of each of the plurality of text areas in the document, wherein the identified midpoint of each of the plurality of text areas has a pixel coordinate location in a two-dimensional space; detecting one or more lines of text in the document, wherein each of the one or more lines of text comprises a set of text areas, wherein the set of text areas have midpoints with a vertical pixel coordinate located within a threshold number of pixels on a vertical axis in the two-dimensional space or a horizontal pixel coordinate located within a threshold number of pixels on a horizontal axis in the two-dimensional space, and wherein each of the plurality of text areas is associated with a midpoint having a coordinate within a threshold number of pixels on one axis in a two-dimensional space; determining a probable orientation of the document based on an orientation of the detected one or more lines of text, wherein the probable orientation indicates that lines of text in the document are oriented parallel with a vertical axis or a horizontal axis; and extracting text from the image based on the determined probable orientation of the document.

9. The non-transitory computer-readable medium of claim 8, wherein the probable orientation of the document comprises a first rotation angle and a second rotation angle, and wherein the second rotation angle differs from the first rotation angle by 180 degrees.

10. The non-transitory computer-readable medium of claim 9, wherein extracting text from the image comprises: generating a first image by rotating the received image by the first rotation angle; extracting text from the first image; and upon determining that the extracted text from the first image comprises valid text, returning the extracted text to an application for processing.

11. The non-transitory computer-readable medium of claim 10, wherein the operation further comprises: upon identifying a failure to extract valid text from the first image: generating a second image by rotating the received image by the second rotation angle; and extracting text from the second image.

12. The non-transitory computer-readable medium of claim 11, wherein identifying a failure to extract valid text from the first image comprises determining that a number of strings extracted from the first image do not match strings in a dictionary of known words.

13. The non-transitory computer-readable medium of claim 8, wherein determining the probable orientation of the document comprises one of: determining that the image is mirrored about a vertical axis if the detected one or more lines of text include a plurality of text areas with a similar midpoint on a vertical axis of the image; or determining that the image is mirrored about a horizontal axis if the detected one or more lines of text include a plurality of text areas with a similar midpoint on a horizontal axis of the image.

14. A system, comprising: a processor; and a memory comprising instructions which, when executed on the processor, performs an operation for performing an operation for performing optical character recognition on an image of document, the operation comprising: receiving the image of the document; identifying a plurality of text areas in the document wherein each text area corresponds to a continuous set of non-whitespace characters detected in the document; identifying a midpoint of each of the plurality of text areas in the document, wherein the identified midpoint of each of the plurality of text areas has a pixel coordinate location in a two-dimensional space; detecting one or more lines of text in the document, wherein each of the one or more lines of text comprises a set of text areas, wherein the set of text areas have midpoints with a vertical pixel coordinate located within a threshold number of pixels on a vertical axis in the two-dimensional space or a horizontal pixel coordinate located within a threshold number of pixels on a horizontal axis in two-dimensional space; determining a probable orientation of the document based on an orientation of the detected one or more lines of text, wherein the probable orientation indicates that lines of text in the document are oriented parallel with a vertical axis or a ho
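Claims 2 to 5 describe a try-then-fall-back extraction:

1. Rotate the image by the first of two angles that are 180 degrees apart and run OCR.
2. If the result fails a dictionary check, rotate by the second angle instead.

The Python sketch below is an illustration under stated assumptions, not the claimed system itself. It assumes:

- the image is a pixel grid stored as a list of lists;
- a caller-supplied `ocr` function (hypothetical; any OCR engine could stand in);
- a hypothetical `max_unmatched` tolerance for how many extracted strings may be missing from the dictionary;
- angle pairs of 0/180 degrees for horizontal lines and 90/270 degrees for vertical lines.

```python
from typing import Callable, List, Optional, Set, Tuple

Image = List[List[int]]  # Assumption: row-major pixel grid standing in for a real image type


def rotate_90_cw(image: Image) -> Image:
    """Rotate a pixel grid a quarter turn clockwise (reverse the rows, then transpose)."""
    return [list(row) for row in zip(*image[::-1])]


def rotate(image: Image, degrees: int) -> Image:
    """Rotate clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    for _ in range((degrees // 90) % 4):
        image = rotate_90_cw(image)
    return image


def candidate_angles(orientation: str) -> Tuple[int, int]:
    """Two rotation angles that differ by 180 degrees (cf. claim 2).

    Assumption: "horizontal" lines map to 0/180 degrees, "vertical" lines to 90/270.
    """
    first = 0 if orientation == "horizontal" else 90
    return first, first + 180


def is_valid_text(strings: List[str], dictionary: Set[str], max_unmatched: int = 0) -> bool:
    """Hypothetical validity test (cf. claim 5): at most `max_unmatched` extracted
    strings may be absent from the dictionary of known words."""
    unmatched = sum(1 for s in strings if s.lower() not in dictionary)
    return unmatched <= max_unmatched


def extract_text(image: Image, orientation: str, ocr: Callable[[Image], List[str]],
                 dictionary: Set[str], max_unmatched: int = 0) -> Optional[List[str]]:
    """Try the first rotation; fall back to the second if its text is not valid."""
    for angle in candidate_angles(orientation):
        strings = ocr(rotate(image, angle))
        if is_valid_text(strings, dictionary, max_unmatched):
            return strings
    return None  # neither rotation produced valid text


def fake_ocr(img: Image) -> List[str]:
    # Hypothetical stand-in OCR: "reads" correctly only when pixel 4 is top-left.
    return ["hello", "world"] if img[0][0] == 4 else ["ol1eh", "dl40w"]


print(extract_text([[1, 2], [3, 4]], "horizontal", fake_ocr, {"hello", "world"}))
# ['hello', 'world']
```

In the usage example, the unrotated image fails the dictionary check, so the 180-degree fallback is the rotation whose text is returned. A real system would likely also need a policy for the case where both rotations fail.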

Assignees

Intuit Inc

Inventors

Classifications

  • Orientation detection or correction, e.g. rotation of multiples of 90 degrees · CPC title

  • Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title

  • Lexical context · CPC title

  • Character recognition · CPC title

  • Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title



Frequently asked questions


What does patent US10163007B2 cover?
The present disclosure relates to the extraction of text from an image including a depiction of a document. According to one embodiment, a mobile device receives an image depicting a document. The mobile device identifies a plurality of text areas in the document and identifies a midpoint of each of the plurality of text areas in the document. The mobile device detects one or more lines of text…
Who is the assignee on this patent?
Intuit Inc
What technology area does this patent fall under?
Primary CPC classification G06V30/1463. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 25, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).