Systems and methods for generating composite images of long documents using mobile video data

US9747504B2 · US · B2

Patent metadata
Publication number: US-9747504-B2
Application number: US-201615191442-A
Country: US
Kind code: B2
Filing date: Jun 23, 2016
Priority date: Nov 15, 2013
Publication date: Aug 29, 2017
Grant date: Aug 29, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for capturing long document images and generating composite images therefrom include: detecting a document depicted in image data; tracking a position of the detected document within the image data; selecting a plurality of images, wherein the selection is based at least in part on the tracked position of the detected document; and generating a composite image based on at least one of the selected plurality of images. The tracking and selection are optionally but preferably based in whole or in part on motion vectors estimated at least partially based on analyzing image data such as test and reference frames within the captured video data/images. Corresponding systems and computer program products are also disclosed.
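The tracking-and-selection loop described in the abstract can be illustrated with a minimal sketch. It assumes grayscale frames given as lists of pixel rows and a document that moves only vertically through the frame. It estimates one vertical motion vector per reference/test frame pair by exhaustive block matching (minimum mean absolute difference of row-intensity profiles), accumulates those vectors into a tracked document position, and keeps a frame whenever the document has advanced a fixed number of rows since the last kept frame. The helper names, the `max_shift` search range, the `step` selection rule and the synthetic data are illustrative assumptions, not the patented method.

```python
def row_profile(frame):
    """Mean intensity of each row of a grayscale frame (a list of pixel rows)."""
    return [sum(row) / len(row) for row in frame]


def estimate_vertical_motion(ref_profile, test_profile, max_shift):
    """Block-matching estimate of vertical motion between two frames.

    Returns the shift s (in rows) minimizing the mean absolute difference
    between ref_profile[y + s] and test_profile[y]; s > 0 means the document
    content moved up by s rows from the reference frame to the test frame.
    """
    n = len(ref_profile)
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        ys = range(max(0, -s), min(n, n - s))  # rows that overlap at this shift
        if len(ys) == 0:
            continue
        cost = sum(abs(ref_profile[y + s] - test_profile[y]) for y in ys) / len(ys)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def select_frames(frames, max_shift, step):
    """Track the document position and keep a frame every `step` rows of travel."""
    selected = [0]
    position = 0            # tracked document offset relative to frame 0, in rows
    last_kept_position = 0
    prev = row_profile(frames[0])
    for i in range(1, len(frames)):
        cur = row_profile(frames[i])
        # The previous frame is the reference frame, the current one the test frame.
        position += estimate_vertical_motion(prev, cur, max_shift)
        if position - last_kept_position >= step:
            selected.append(i)
            last_kept_position = position
        prev = cur
    return selected


# Synthetic "long document": 100 rows with distinct intensities, filmed through
# a 40-row window that slides 5 rows per frame (hypothetical data).
doc = [(i * 37) % 101 for i in range(100)]
frames = [[[doc[offset + y]] * 3 for y in range(40)] for offset in range(0, 35, 5)]
selected = select_frames(frames, max_shift=8, step=10)
```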

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product comprising a non-transitory computer readable medium having stored thereon instructions executable by a processor of a mobile device, the instructions being configured to cause the processor, upon execution thereof, to generate a composite image of a long document with sufficient resolution for downstream processing by: detecting a long document depicted in image data; tracking a position of the detected long document within the image data; selecting a plurality of images, wherein the selection is based at least in part on the tracked position of the detected long document; and generating a composite image of the long document based on at least two of the selected plurality of images, wherein the composite image of the long document is characterized by a resolution greater than a resolution of any of the selected plurality of images, wherein the resolution of the composite image is at least about 200 dots per inch (DPI) or at least about 200 pixels per inch (PPI).

2. The computer program product as recited in claim 1, further comprising instructions configured to cause the processor to identify at least one edge of the document depicted in the image data.

3. The computer program product as recited in claim 1, wherein each of the selected plurality of images depicts a portion of the document, and wherein the composite image depicts an entirety of the document.

4. The computer program product as recited in claim 1, wherein the tracking comprises generating, using the processor, alignment hypotheses between at least some of the plurality of frames of image data, wherein the alignment hypotheses are generated based on matching sampled features between frames of the image data.

5. The computer program product as recited in claim 1, further comprising instructions configured to cause the processor to: estimate one or more motion vectors corresponding to motion of an image capture component used to capture the image data.

6. The computer program product as recited in claim 5, wherein the selection is further based at least in part on the one or more estimated motion vector.

7. The computer program product as recited in claim 5, wherein the tracking is based exclusively on the estimated motion vector(s).

8. The computer program product as recited in claim 5, further comprising instructions configured to cause the processor to: determine at least one motion displacement based on some or all of the estimated motion vector(s); either terminate or pause a capture operation in response to determining one of the motion displacement(s) is characterized by a value exceeding a predefined motion displacement threshold; and either initiate a new capture operation in response to terminating the capture operation; or resume the capture operation in response to pausing the capture operation.

9. The computer program product as recited in claim 1, further comprising instructions configured to cause the processor to: identify, based on the composite image, one or more portions of the document depicting textual information; classify each identified portion of the document based on the textual information depicted therein; determine whether each classified portion is relevant to a financial transaction or irrelevant to the financial transaction, the determination being based on the portion classification; and remove each portion determined to be irrelevant to the financial transaction from the composite image.

10. The computer program product as recited in claim 1, wherein generating the composite image comprises: estimating a homograph transform matrix or an affine transform matrix, wherein the estimation is based on text block matching between the selected plurality of images; and transforming one of the plurality of images to a coordinate system of another of the plurality of images using the homograph transform matrix or the affine transform matrix.

11. The computer program product as recited in claim 1, the instructions configured to cause the processor to select the plurality of images further comprising instructions configured to cause the processor to define at least one frame pair, wherein each frame pair consists of a reference frame and a test frame, and wherein each reference frame and each test frame are selected from the image data.

12. The computer program product as recited in claim 11, the instructions configured to cause the processor to generate the composite image further comprising instructions configured to cause the processor to: detect a skew angle in one or more of the reference frame and the test frame of at least one of the frame pairs, the skew angle corresponding to the document and having a magnitude of >0.0 degrees; and correct the skew angle in at least one of the reference frame and the test frame, wherein the document depicted in the composite image is characterized by a skew angle of approximately 0.0 degrees.

13. The computer program product as recited in claim 11, the instructions configured to cause the processor to select the plurality of images further comprising instructions configured to cause the processor to: determine an amount of overlap between the reference frame and the test frame of at least one frame pair; and select an image corresponding to at least one frame pair for which the amount of overlap between the reference frame and the test frame is greater than a predetermined overlap threshold.

14. The computer program product as recited in claim 13, wherein the amount of overlap corresponds to the document; and wherein the predetermined overlap threshold is a distance of at least 40% of a length of the reference frame.

15. The computer program product as recited in claim 11, the instructions configured to cause the processor to generate the composite image further comprising instructions configured to cause the processor to: detect textual information in each of the reference frame and the test frame of at least one frame pair, the textual information being depicted in the document.

16. The computer program product as recited in claim 15, the instructions configured to cause the processor to detect textual information in each of the reference frame and the test frame of at least one frame pair further comprising instructions configured to cause the processor to: define, in the reference frame, at least one rectangular portion of the document depicting some or all of the textual information; define, in the test frame, at least one corresponding rectangular portion of the document depicting some or all of the textual information; and align the document depicted in the test frame with the document depicted in the reference frame.

17. The computer program product as recited in claim 16, wherein the textual information comprises at least one feature selected from a group consisting of: an identity of one or more characters represented in the rectangular portion; an identity of one or more characters represented in the corresponding rectangular portion; a sequence of characters represented in the rectangular portion; a sequence of characters represented in the corresponding rectangular portion; a position of one or more characters represented in the rectangular portion; a position of one or more characters represented in the corresponding rectangular portion; an absolute size of one or more characters represented in the rectangular portion; an absolute size of one or more characters represented in the corresponding rectangular portion a size…
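Claim 1's resolution floor can be checked with simple arithmetic: the effective resolution along an axis is the number of pixels the document spans divided by its physical length in inches. The sketch below uses hypothetical numbers (a 3.125 in by 12 in receipt and a 1080 x 1920 portrait video frame) to show why a single frame that fits the whole long document falls short of about 200 DPI, while a composite built from close-up frames can keep close-up sampling density over the full length. Reading "resolution" as pixels per inch along the document, and the assumption that stitching preserves the close-up density, are illustrative choices, not statements from the patent.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Pixels spanned by the document along one axis per inch of paper."""
    if inches <= 0:
        raise ValueError("physical length must be positive")
    return pixels / inches


CLAIMED_MINIMUM_DPI = 200  # "at least about 200 DPI" from claim 1

# Hypothetical receipt filmed with a hypothetical 1080 x 1920 portrait camera.
RECEIPT_WIDTH_IN, RECEIPT_LENGTH_IN = 3.125, 12.0

# Option A: one frame showing the whole receipt, its length spanning 1920 px.
single_frame_dpi = effective_dpi(1920, RECEIPT_LENGTH_IN)  # 160.0

# Option B: close-up frames where the receipt width spans 1080 px; a composite
# stitched from them keeps that sampling density along the whole length.
composite_dpi = effective_dpi(1080, RECEIPT_WIDTH_IN)      # about 345.6
composite_height_px = round(composite_dpi * RECEIPT_LENGTH_IN)

for label, dpi in (("single frame", single_frame_dpi), ("composite", composite_dpi)):
    verdict = "meets" if dpi >= CLAIMED_MINIMUM_DPI else "falls below"
    print(f"{label}: {dpi:.1f} DPI, {verdict} the ~{CLAIMED_MINIMUM_DPI} DPI floor")
```

In this example the composite is about 1080 x 4147 pixels, more than twice the pixel count of any single 1080 x 1920 frame.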
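Claim 8 pauses or terminates capture when a motion displacement exceeds a predefined threshold. A minimal state-machine sketch follows, assuming one estimated (dx, dy) motion vector per frame, displacement taken as the vector's Euclidean length, and a hypothetical 10-pixel threshold. Frames are grouped by capture operation so the two claimed behaviors (pause then resume, or terminate then start a new capture) can be compared. The claim does not say when capture should restart; waiting until motion drops back within the threshold is an assumption of this sketch, as are the class and function names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CaptureController:
    threshold_px: float                 # hypothetical motion displacement threshold
    restart_on_excess: bool = False     # False: pause/resume; True: terminate/new capture
    state: str = "capturing"
    captures: list = field(default_factory=lambda: [[]])  # frame ids per capture operation

    def on_frame(self, frame_id, motion_vector):
        """Update the capture state for one frame and its estimated (dx, dy) vector."""
        displacement = math.hypot(*motion_vector)
        if displacement > self.threshold_px:
            if self.state == "capturing":
                self.state = "terminated" if self.restart_on_excess else "paused"
            return self.state  # frames with excessive motion are never kept
        if self.state == "paused":
            self.state = "capturing"          # resume the same capture operation
        elif self.state == "terminated":
            self.captures.append([])          # initiate a new capture operation
            self.state = "capturing"
        self.captures[-1].append(frame_id)
        return self.state


def run(vectors, threshold_px, restart_on_excess):
    """Feed per-frame motion vectors through a controller and return its captures."""
    controller = CaptureController(threshold_px, restart_on_excess)
    for frame_id, vector in enumerate(vectors):
        controller.on_frame(frame_id, vector)
    return controller.captures


# Hypothetical per-frame motion vectors in pixels; frame 2 is a sudden jerk.
shake = [(0, 3), (0, 4), (0, 30), (0, 2), (0, 5)]
paused = run(shake, 10.0, restart_on_excess=False)
restarted = run(shake, 10.0, restart_on_excess=True)
```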
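Claim 10 estimates a homography or affine transform from text blocks matched between selected frames and then maps one frame into the other's coordinate system. The sketch covers only the affine case. Assuming the text-block matching has already produced corresponding block centers (the coordinates below are invented), it fits the six affine parameters by ordinary least squares through the 3 x 3 normal equations. Real capture data contains mismatches, so a robust estimator such as OpenCV's `cv2.estimateAffine2D` (RANSAC by default) would usually be preferred; this plain version only shows the algebra, and its helper names are hypothetical.

```python
def _det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve a @ x = b for a 3 x 3 system using Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-9:
        raise ValueError("need at least three non-collinear matched text blocks")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(_det3(m) / d)
    return solution


def estimate_affine(src_pts, dst_pts):
    """Least-squares 2 x 3 affine matrix M with dst ~ M @ [x, y, 1]."""
    if len(src_pts) != len(dst_pts) or len(src_pts) < 3:
        raise ValueError("need at least three point correspondences")
    ata = [[0.0] * 3 for _ in range(3)]   # A^T A, where rows of A are [x, y, 1]
    atu = [0.0] * 3                       # A^T u (destination x coordinates)
    atv = [0.0] * 3                       # A^T v (destination y coordinates)
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    return [_solve3(ata, atu), _solve3(ata, atv)]


def apply_affine(m, pt):
    """Map a point from the test frame into the reference frame's coordinates."""
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Hypothetical centers of four text blocks matched between a test frame and a
# reference frame; here the document sits 4 px right and 600 px lower in
# reference-frame coordinates.
test_centers = [(100.0, 50.0), (400.0, 60.0), (120.0, 300.0), (380.0, 310.0)]
ref_centers = [(x + 4.0, y + 600.0) for x, y in test_centers]
M = estimate_affine(test_centers, ref_centers)
```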
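Claims 13 and 14 select a frame pair only when the reference and test frames overlap by more than a threshold of at least 40% of the reference frame's length. If the document moves purely along the frame's long axis, the overlap is the frame length minus the displacement between the two frames. The sketch applies that rule to hypothetical 1920-row frames and uses exactly 40%, the smallest threshold the claim allows; the pure-translation model, the function names and the displacement values are assumptions.

```python
def overlap_rows(frame_length: int, displacement: int) -> int:
    """Rows of the reference frame still visible in the test frame when the
    document has moved `displacement` rows along the scan direction."""
    return max(0, frame_length - abs(displacement))


def pair_is_selectable(frame_length: int, displacement: int,
                       threshold_fraction: float = 0.4) -> bool:
    """Select the pair only if the overlap exceeds threshold_fraction * frame_length."""
    threshold = threshold_fraction * frame_length
    return overlap_rows(frame_length, displacement) > threshold


# Hypothetical displacements between a 1920-row reference frame and test frames;
# with a 40% threshold the overlap must exceed 768 rows.
decisions = {d: pair_is_selectable(1920, d) for d in (300, 1000, 1300)}
```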

Assignees

Kofax Inc

Inventors

Classifications

  • Movement estimation (for video coding H04N19/51) · CPC title

  • Region-based segmentation · CPC title

  • Creating or editing images; Combining images with text · CPC title

  • using payment protocols involving electronic receipts · CPC title

  • Image subtraction · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9747504B2 cover?
Techniques for capturing long document images and generating composite images therefrom include: detecting a document depicted in image data; tracking a position of the detected document within the image data; selecting a plurality of images, wherein the selection is based at least in part on the tracked position of the detected document; and generating a composite image based on at least one o…
Who is the assignee on this patent?
Kofax Inc
What technology area does this patent fall under?
Primary CPC classification H04N5/265. Mapped technology areas include Electricity.
When was this patent published?
Published Aug 29, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).