Live document detection in a captured video stream
US-2018025251-A1 · Jan 25, 2018 · US
US10628519B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10628519-B2 |
| Application number | US-201715658289-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 24, 2017 |
| Priority date | Jul 22, 2016 |
| Publication date | Apr 21, 2020 |
| Grant date | Apr 21, 2020 |
Abstract
Systems and methods that efficiently and effectively generate an enhanced document image of a displayed document in an image frame captured from a live image feed are disclosed. For example, systems and methods described herein apply a document enhancement process to a displayed document in an image frame that results in an enhanced document image that is cropped, rectified, and un-shadowed, with dark text against a mostly white background. Additionally, systems and methods described herein determine whether a stored digital content item includes a displayed document. In response to determining that a stored digital content item does include a displayed document, systems and methods described herein generate an enhanced document image of a displayed document included in the stored digital content item.
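The abstract lists cropping and rectification among the enhancements but does not say how they are computed. One common way to do both at once, assuming the four corners of the displayed document are already known (for example from the live document boundary indicator), is a four-point perspective warp. The sketch below estimates the 3x3 homography with a direct linear transform and resamples with nearest-neighbour lookup. The function names `homography_from_corners` and `rectify`, the corner ordering, and the sampling choice are illustrative assumptions, not the patented method.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Return the 3x3 homography H mapping each (x, y) in src to (u, v) in dst.

    Direct linear transform for exactly four correspondences, with H[2, 2]
    fixed to 1 so the eight remaining entries solve an 8x8 linear system.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def rectify(image, corners, out_w, out_h):
    """Warp the quadrilateral `corners` (top-left, top-right, bottom-right,
    bottom-left, as (x, y) pixel coordinates) onto an out_w x out_h rectangle.

    Hypothetical helper: nearest-neighbour sampling, works for 2-D or H x W x C images.
    """
    dst = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    # Inverse mapping: for every output pixel, find where it comes from in the frame.
    h_inv = homography_from_corners(dst, corners)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, sw = h_inv @ pts
    # Nearest-neighbour sampling, clamped to the frame borders.
    sx = np.clip(np.rint(sx / sw).astype(int), 0, image.shape[1] - 1)
    sy = np.clip(np.rint(sy / sw).astype(int), 0, image.shape[0] - 1)
    return image[sy, sx].reshape(out_h, out_w, *image.shape[2:])
```

For example, `rectify(frame, [(120, 80), (610, 95), (640, 900), (90, 870)], 850, 1100)` would return an upright crop of that quadrilateral, 850 pixels wide and 1100 pixels tall. Bilinear interpolation would give smoother text edges at the cost of a few more lines.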
Claims (preview; the claim text below is truncated)
What is claimed is:

1. A computing device comprising: at least one processor; and a non-transitory computer-readable medium storing instructions thereon that, when executed by the at least one processor, cause the computing device to: provide a graphical user interface comprising a live camera image feed in response to a user selection of a first option of a set of selectable options, the set of selectable options comprising the first option for scanning a document to a cloud-computing environment and a second option for uploading a file to the cloud-computing environment; detect, within the live camera image feed, a displayed document as a visual representation of a physical document; in response to detecting the displayed document within the live camera image feed and prior to an image frame capture, provide for display, within the graphical user interface, a live document boundary indicator associated with the displayed document within the live camera image feed; detect a user interaction with the graphical user interface while providing the live document boundary indicator associated with the displayed document; based on detecting the user interaction while providing the live document boundary indicator, capture from the live camera image feed an image frame that comprises the displayed document and excludes one or more portions displayed in the live camera image feed outside of the live document boundary indicator; process the image frame to generate, for upload to a user account in the cloud-computing environment, an enhanced document image corresponding to the displayed document within the live document boundary indicator; provide, for presentation on a display of the computing device, the enhanced document image; and convert the enhanced document image to a document file format.

2. The computing device as recited in claim 1, wherein generating the enhanced document image comprises modifying the image frame with respect to the displayed document within the image frame.

3. The computing device as recited in claim 2, wherein modifying the image frame comprises: detecting, without receiving user input and based on the live document boundary indicator, portions of the image frame that are not part of the displayed document; and cropping the image frame to remove the portions of the image frame that are not part of the displayed document.

4. The computing device as recited in claim 3, wherein processing the image frame to generate the enhanced document image further comprises altering the displayed document within the cropped imaged frame.

5. The computing device of claim 4, wherein altering the displayed document comprises at least one of: rectifying the displayed document, converting the displayed document to grayscale, or denoising the displayed document.

6. The computing device as recited in claim 4, wherein altering the displayed document comprises correcting a background of the displayed document.

7. The computing device as recited in claim 6, wherein correcting the background of the displayed document comprises: creating a subsampled version of the displayed document; and optimizing the subsampled version of the displayed document by solving an objective function that penalizes deviations from white within the subsampled version and penalizes deviations in gradient within the subsampled version to generate an optimized subsampled version.

8. The computing device as recited in claim 7, wherein the non-transitory computer-readable medium further comprises instructions thereon that, when executed by the at least one processor, cause the computing device to: perform a Fourier Domain transfer of the subsampled version of the displayed document; solve the objective function in the Fourier Domain; and perform an inverse Fourier Domain transfer to generate the optimized subsampled version of the displayed document.

9. The computing device as recited in claim 7, wherein the non-transitory computer-readable medium further comprises instructions thereon that, when executed by the at least one processor, cause the computing device to upsample the optimized subsampled version of the displayed document to generate a tri-map version of the displayed document that identifies background pixels, foreground pixels, and unknown pixels.

10. The computing device as recited in claim 9, wherein the non-transitory computer-readable medium further comprises instructions thereon that, when executed by the at least one processor, cause the computing device to assign each of the unknown pixels as either a background pixel or a foreground pixel by estimating a background color of each of the unknown pixels.

11. A non-transitory computer-readable medium storing instructions thereon that, when executed by at least one processor, cause a computer system to: provide a graphical user interface comprising a live camera image feed in response to a user selection of a first option of a set of selectable options, the set of selectable options comprising the first option for scanning a document to a cloud-computing environment and a second option for uploading a file to the cloud-computing environment; detect, within the live camera image feed, a displayed document as a visual representation of a physical document; in response to detecting the displayed document within the live camera image feed and prior to an image frame capture, provide for display, within the graphical user interface, a live document boundary indicator associated with the displayed document within the live camera image feed; detect a user interaction with the graphical user interface while providing the live document boundary indicator associated with the displayed document; based on detecting the user interaction while providing the live document boundary indicator, capture from the live camera image feed an image frame that comprises the displayed document and excludes one or more portions displayed in the live camera image feed outside of the live document boundary indicator; process the image frame to generate, for upload to a user account in the cloud-computing environment, an enhanced document image corresponding to the displayed document within the live document boundary indicator; and provide, for presentation on a display of the computer system, the enhanced document image; and convert the enhanced document image to a document file format.

12. The non-transitory computer-readable medium recited in claim 11, further comprising instructions that, when executed by the at least one processor, cause the computer system to: receive user input indicating one or more edits to the enhanced document image; and modify the enhanced document image in accordance with the one or more edits.

13. The non-transitory computer-readable medium recited in claim 11, wherein processing the image frame to generate the enhanced document image comprises altering a border of the displayed document to create a rectangular enhanced document image.

14. The non-transitory computer-readable medium recited in claim 11, wherein generating the enhanced document image comprises: converting the displayed document from a color version to a grayscale version; and recoloring the displayed document prior to providing the enhanced document image.

15. A method comprising: receiving, at an online content management system and from a client device, a digital content item; determining, by at least one processor, that the digital content item comprises a displayed document; associating metadata that includes a digital tag or line item with the digital content it
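Claims 7 through 9 describe the background correction only as an objective that penalizes deviations from white and deviations in gradient, solved in the Fourier domain on a subsampled copy and then upsampled into a tri-map. One reading that fits that description, offered here as an assumption, is the screened Poisson problem E(O) = lam * sum((O - 1)^2) + sum(|grad O - grad I|^2) for a grayscale subsampled image I scaled to [0, 1]. With periodic forward differences its normal equations become diagonal under the DFT, giving O_hat = (lam * F[1] + |D|^2 * I_hat) / (lam + |D|^2), where |D|^2 = 4 sin^2(pi k_x) + 4 sin^2(pi k_y). Frequencies whose |D|^2 is small compared with lam, such as gradual shadows, are suppressed and the mean is set to white, while frequencies well above that cutoff, such as sharp text edges, pass almost unchanged. The block-average subsampling, the weight `lam`, the tri-map thresholds `lo` and `hi`, and all function names are hypothetical; the per-pixel background-colour estimate of claim 10 is not covered.

```python
import numpy as np

def subsample(gray, f):
    """Block-average a 2-D image by an integer factor f (partial edge blocks are dropped)."""
    h, w = (gray.shape[0] // f) * f, (gray.shape[1] // f) * f
    return gray[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def optimize_background(small, lam=0.05):
    """Minimize lam*sum((O-1)^2) + sum(|grad O - grad I|^2) (assumed objective).

    Uses periodic forward differences, so the normal equations
    (lam + D^T D) O = lam * 1 + D^T D I are diagonal in the DFT basis.
    """
    h, w = small.shape
    ky = np.fft.fftfreq(h)[:, None]  # cycles per sample along rows
    kx = np.fft.fftfreq(w)[None, :]  # cycles per sample along columns
    # Squared magnitude of the forward-difference operator at each frequency.
    grad_sq = 4 * np.sin(np.pi * ky) ** 2 + 4 * np.sin(np.pi * kx) ** 2
    white_hat = np.zeros((h, w), dtype=complex)
    white_hat[0, 0] = h * w  # unnormalized DFT of an all-ones (white) image
    o_hat = (lam * white_hat + grad_sq * np.fft.fft2(small)) / (lam + grad_sq)
    return np.real(np.fft.ifft2(o_hat))  # inverse DFT back to the pixel domain

def trimap(optimized_small, f, lo=0.4, hi=0.8):
    """Nearest-neighbour upsample by f, then label 0=foreground, 1=unknown, 2=background."""
    up = np.repeat(np.repeat(optimized_small, f, axis=0), f, axis=1)
    labels = np.ones(up.shape, dtype=np.uint8)  # start with everything unknown
    labels[up <= lo] = 0  # dark enough to be text
    labels[up >= hi] = 2  # close enough to white to be paper
    return labels
```

Typical use would be `labels = trimap(optimize_background(subsample(gray, 4)), 4)`. The periodic boundary assumption keeps the solve to one FFT pair, but it can leak shading across opposite edges; mirror-padding the subsampled image before the transform is a common remedy.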
Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191) · CPC title
Display of layout of documents; Previewing · CPC title
Cropping · CPC title
Discrete and fast Fourier transform, [DFT, FFT] · CPC title
Artificial neural networks [ANN] · CPC title