Enhancing documents portrayed in digital images

US10628519B2 · US · B2

Patent metadata
Publication number: US-10628519-B2
Application number: US-201715658289-A
Country: US
Kind code: B2
Filing date: Jul 24, 2017
Priority date: Jul 22, 2016
Publication date: Apr 21, 2020
Grant date: Apr 21, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods that efficiently and effectively generate an enhanced document image of a displayed document in an image frame captured from a live image feed are disclosed. For example, systems and methods described herein apply a document enhancement process to a displayed document in an image frame that results in an enhanced document image that is cropped, rectified, un-shadowed, and with dark text against a mostly white background. Additionally, systems and methods described herein determine whether a stored digital content item includes a displayed document. In response to determining that a stored digital content item does include a displayed document, systems and methods described herein generate an enhanced document image of a displayed document included in the stored digital content item.
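
The abstract names the properties of the enhanced image (cropped, un-shadowed, dark text on a mostly white background) without fixing an algorithm. The following is a minimal sketch of how such a pipeline could look, not the patent's method: it crops to a known bounding box, converts to grayscale, and removes smooth shading by dividing each pixel by the brightest value in its neighbourhood, a common stand-in technique. All function names, the `radius` parameter, and the sample frame are hypothetical, and rectification is omitted.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B), each 0..255
GrayImage = List[List[float]]       # rows of luminance values in 0..1


def crop(image: List[List[Pixel]], top: int, left: int,
         bottom: int, right: int) -> List[List[Pixel]]:
    """Keep rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in image[top:bottom]]


def to_gray(image: List[List[Pixel]]) -> GrayImage:
    """Convert RGB to luminance with Rec. 601 weights, scaled to 0..1."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in row]
            for row in image]


def unshadow(gray: GrayImage, radius: int = 1) -> GrayImage:
    """Divide each pixel by the brightest value in its neighbourhood.

    Assumption: the local maximum approximates the paper colour under the
    local lighting, so paper maps close to 1.0 (white) and ink stays dark.
    """
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            local_max = max(
                gray[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            row.append(gray[y][x] / local_max if local_max > 0 else 1.0)
        out.append(row)
    return out


def paper(v: int) -> Pixel:
    """Hypothetical helper: a neutral paper pixel with brightness v."""
    return (v, v, v)


DESK, INK = (90, 60, 30), (20, 20, 20)
blank_row = [DESK] + [paper(v) for v in (240, 220, 200, 180, 160, 140)] + [DESK]
text_row = [DESK, paper(240), INK, paper(200), paper(180), INK, paper(140), DESK]
frame = [[DESK] * 8, blank_row, text_row, blank_row, [DESK] * 8]

# The bounding box (rows 1..3, columns 1..6) is assumed to be known already,
# for example from a detected document boundary.
document = unshadow(to_gray(crop(frame, 1, 1, 4, 7)))
```

In this sample the paper darkens from left to right, yet after `unshadow` every paper pixel ends up near white while the two ink pixels stay dark. A larger `radius` tolerates thicker strokes at the cost of following sharp shadow edges less closely.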

First claim

Opening claim text (preview).

What is claimed is:

1. A computing device comprising: at least one processor; and a non-transitory computer-readable medium storing instructions thereon that, when executed by the at least one processor, cause the computing device to: provide a graphical user interface comprising a live camera image feed in response to a user selection of a first option of a set of selectable options, the set of selectable options comprising the first option for scanning a document to a cloud-computing environment and a second option for uploading a file to the cloud-computing environment; detect, within the live camera image feed, a displayed document as a visual representation of a physical document; in response to detecting the displayed document within the live camera image feed and prior to an image frame capture, provide for display, within the graphical user interface, a live document boundary indicator associated with the displayed document within the live camera image feed; detect a user interaction with the graphical user interface while providing the live document boundary indicator associated with the displayed document; based on detecting the user interaction while providing the live document boundary indicator, capture from the live camera image feed an image frame that comprises the displayed document and excludes one or more portions displayed in the live camera image feed outside of the live document boundary indicator; process the image frame to generate, for upload to a user account in the cloud-computing environment, an enhanced document image corresponding to the displayed document within the live document boundary indicator; provide, for presentation on a display of the computing device, the enhanced document image; and convert the enhanced document image to a document file format.

2. The computing device as recited in claim 1, wherein generating the enhanced document image comprises modifying the image frame with respect to the displayed document within the image frame.

3. The computing device as recited in claim 2, wherein modifying the image frame comprises: detecting, without receiving user input and based on the live document boundary indicator, portions of the image frame that are not part of the displayed document; and cropping the image frame to remove the portions of the image frame that are not part of the displayed document.

4. The computing device as recited in claim 3, wherein processing the image frame to generate the enhanced document image further comprises altering the displayed document within the cropped imaged frame.

5. The computing device of claim 4, wherein altering the displayed document comprises at least one of: rectifying the displayed document, converting the displayed document to grayscale, or denoising the displayed document.

6. The computing device as recited in claim 4, wherein altering the displayed document comprises correcting a background of the displayed document.

7. The computing device as recited in claim 6, wherein correcting the background of the displayed document comprises: creating a subsampled version of the displayed document; and optimizing the subsampled version of the displayed document by solving an objective function that penalizes deviations from white within the subsampled version and penalizes deviations in gradient within the subsampled version to generate an optimized subsampled version.

8. The computing device as recited in claim 7, wherein the non-transitory computer-readable medium further comprises instructions thereon that, when executed by the at least one processor, cause the computing device to: perform a Fourier Domain transfer of the subsampled version of the displayed document; solve the objective function in the Fourier Domain; and perform an inverse Fourier Domain transfer to generate the optimized subsampled version of the displayed document.

9. The computing device as recited in claim 7, wherein the non-transitory computer-readable medium further comprises instructions thereon that, when executed by the at least one processor, cause the computing device to upsample the optimized subsampled version of the displayed document to generate a tri-map version of the displayed document that identifies background pixels, foreground pixels, and unknown pixels.

10. The computing device as recited in claim 9, wherein the non-transitory computer-readable medium further comprises instructions thereon that, when executed by the at least one processor, cause the computing device to assign each of the unknown pixels as either a background pixel or a foreground pixel by estimating a background color of each of the unknown pixels.

11. A non-transitory computer-readable medium storing instructions thereon that, when executed by at least one processor, cause a computer system to: provide a graphical user interface comprising a live camera image feed in response to a user selection of a first option of a set of selectable options, the set of selectable options comprising the first option for scanning a document to a cloud-computing environment and a second option for uploading a file to the cloud-computing environment; detect, within the live camera image feed, a displayed document as a visual representation of a physical document; in response to detecting the displayed document within the live camera image feed and prior to an image frame capture, provide for display, within the graphical user interface, a live document boundary indicator associated with the displayed document within the live camera image feed; detect a user interaction with the graphical user interface while providing the live document boundary indicator associated with the displayed document; based on detecting the user interaction while providing the live document boundary indicator, capture from the live camera image feed an image frame that comprises the displayed document and excludes one or more portions displayed in the live camera image feed outside of the live document boundary indicator; process the image frame to generate, for upload to a user account in the cloud-computing environment, an enhanced document image corresponding to the displayed document within the live document boundary indicator; and provide, for presentation on a display of the computer system, the enhanced document image; and convert the enhanced document image to a document file format.

12. The non-transitory computer-readable medium recited in claim 11, further comprising instructions that, when executed by the at least one processor, cause the computer system to: receive user input indicating one or more edits to the enhanced document image; and modify the enhanced document image in accordance with the one or more edits.

13. The non-transitory computer-readable medium recited in claim 11, wherein processing the image frame to generate the enhanced document image comprises altering a border of the displayed document to create a rectangular enhanced document image.

14. The non-transitory computer-readable medium recited in claim 11, wherein generating the enhanced document image comprises: converting the displayed document from a color version to a grayscale version; and recoloring the displayed document prior to providing the enhanced document image.

15. A method comprising: receiving, at an online content management system and from a client device, a digital content item; determining, by at least one processor, that the digital content item comprises a displayed document; associating metadata that includes a digital tag or line item with the digital content it
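
Claims 7 and 8 describe an objective that penalizes deviations from white and deviations in gradient, solved in the Fourier domain. The claims do not give the exact formula, so the sketch below assumes one plausible form on a single scanline f of length n with periodic boundaries: minimize λ·Σ(u_i − 1)² + Σ((u_{i+1} − u_i) − (f_{i+1} − f_i))². Under that assumption each frequency k decouples into (λ + L_k)·U_k = λ·W_k + L_k·F_k with L_k = 2 − 2cos(2πk/n), where W is the transform of the all-white signal. The function names, the weight `lam`, and the sample scanline are hypothetical, and a naive O(n²) DFT stands in for an FFT so the example needs only the standard library.

```python
import cmath
import math
from typing import List


def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform; the inverse divides by n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * math.pi * m * k / n)
                for m in range(n))
        out.append(s / n if inverse else s)
    return out


def whiten_background(f: List[float], lam: float = 0.05,
                      white: float = 1.0) -> List[float]:
    """Minimize the assumed quadratic objective in the Fourier domain.

    lam weighs the "stay white" term against the "keep the input gradient"
    term. Periodic boundaries are assumed so that the DFT diagonalizes the
    discrete Laplacian.
    """
    n = len(f)
    F = dft([complex(v) for v in f])
    W = dft([complex(white)] * n)
    U = []
    for k in range(n):
        # Eigenvalue of the negative periodic discrete Laplacian at frequency k.
        lap = 2.0 - 2.0 * math.cos(2.0 * math.pi * k / n)
        U.append((lam * W[k] + lap * F[k]) / (lam + lap))
    return [v.real for v in dft(U, inverse=True)]


# Hypothetical scanline: paper darkening left to right, with two ink pixels.
scanline = [0.9, 0.85, 0.2, 0.8, 0.75, 0.7, 0.15, 0.6]
optimized = whiten_background(scanline, lam=0.05)
```

With this form the zero-frequency term is set entirely by the white target, so the mean of `optimized` equals `white`, while higher frequencies (sharp ink edges) pass through with less attenuation than slow shading. A two-dimensional version would apply the same per-frequency division with the 2D Laplacian eigenvalues.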

Assignees

Dropbox Inc

Inventors

Classifications

  • Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191) · CPC title

  • Display of layout of documents; Previewing · CPC title

  • Cropping · CPC title

  • Discrete and fast Fourier transform, [DFT, FFT] · CPC title

  • Artificial neural networks [ANN] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10628519B2 cover?
Systems and methods that efficiently and effectively generate an enhanced document image of a displayed document in an image frame captured from a live image feed are disclosed. For example, systems and methods described herein apply a document enhancement process to a displayed document in an image frame that result in an enhanced document image that is cropped, rectified, un-shadowed, and wit…
Who is the assignee on this patent?
Dropbox Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/24. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 21, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).