Duplicate/near duplicate detection and image registration

US9530072B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9530072-B2
Application number  US-201313888082-A
Country             US
Kind code           B2
Filing date         May 6, 2013
Priority date       Mar 15, 2013
Publication date    Dec 27, 2016
Grant date          Dec 27, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are disclosed for detecting duplicate and near duplicate images. An exemplary method includes receiving an original image, preparing the image for fingerprinting, and calculating an image fingerprint, the fingerprint expressed as a sequence of numbers. The method further includes comparing the image fingerprint thus obtained with a set of previously stored fingerprints obtained from a set of previously stored images, and determining if the original image is either a duplicate or a near duplicate of an image in the set if the dissimilarity between the two fingerprints is less than a defined threshold T. Once a duplicate or near duplicate is detected, various defined actions may be taken, including culling the less desirable image or referring the redundancy to a user.
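The threshold test in the abstract can be sketched as follows. This is only an illustration: the abstract does not say how dissimilarity is measured, so the sketch assumes a Hamming distance (count of differing positions) over equal-length fingerprints, and the names `dissimilarity`, `find_near_duplicates`, `stored`, and `threshold` are hypothetical.

```python
from typing import Dict, List, Sequence


def dissimilarity(fp_a: Sequence[int], fp_b: Sequence[int]) -> int:
    """Count the positions at which two fingerprints differ.

    Assumption: Hamming distance stands in for the patent's unspecified
    dissimilarity measure.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(fp_a, fp_b))


def find_near_duplicates(
    fingerprint: Sequence[int],
    stored: Dict[str, Sequence[int]],
    threshold: int,
) -> List[str]:
    """Return ids of stored fingerprints whose dissimilarity is below T."""
    return [
        image_id
        for image_id, stored_fp in stored.items()
        if dissimilarity(fingerprint, stored_fp) < threshold
    ]


if __name__ == "__main__":
    stored = {"a": [1, 0, 1, 1], "b": [0, 1, 0, 0]}
    # "a" differs in one position, "b" in three; with T = 2 only "a" matches.
    print(find_near_duplicates([1, 0, 1, 0], stored, threshold=2))
```

A dissimilarity of 0 would correspond to an exact duplicate under this measure; any value between 0 and T would be treated as a near duplicate.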

First claim

Opening claim text (preview).

What is claimed:

1. A method of detecting duplicate and near duplicate images, comprising: receiving an image; generating a first cell array for the image comprising a first grid of cells corresponding to regions of the image, the first grid of cells comprising average pixel intensity values for corresponding regions of the image; rotating the image based on the average pixel intensity values of the first grid of cells; generating a second cell array for the rotated image comprising a second grid of cells corresponding to regions of the rotated image, the second grid of cells comprising numeric values for corresponding regions of the rotated image; generating an image fingerprint for the rotated image, the image fingerprint comprising a sequence of the numeric values for the rotated image; identifying one or more duplicate or near duplicate images from a set of previously stored images by comparing the image fingerprint with a set of previously generated image fingerprints corresponding to the set of previously stored images, wherein comparing the image fingerprint with the set of previously generated image fingerprints comprises comparing the sequence of numeric values with sequences of numeric values of the previously generated image fingerprints; and in response to identifying one or more duplicate or near duplicate images from the set of previously stored images, taking a defined action with respect to the one or more duplicate or near duplicate images.

2. The method of claim 1, further comprising preparing the image for generating the first cell array, wherein preparing the image for generating the first cell array comprises resizing the image to a defined size.

3. The method of claim 1, wherein rotating the image based on the average pixel intensity values comprises rotating the image such that the cells of the first grid of cells associated with the highest pixel intensity values are at the top of the image.

4. The method of claim 1, wherein the first grid of cells comprises a 2×2 grid of four cells, a 3×3 grid of nine cells, or a 4×4 grid of sixteen cells.

5. The method of claim 1, wherein the average pixel intensity values are calculated by averaging one or more of lightness, brightness, intensity and value across all pixels within each cell of the first grid of cells.

6. The method of claim 1, wherein the sequence of numeric values for the rotated image comprises a sequence of binary values.

7. The method of claim 3, wherein the second cell array comprises an 8×8 grid of sixty-four cells.

8. The method of claim 1, wherein taking the defined action comprises: choosing only one of the duplicates or near duplicates to save, and merging the metadata from both, and a history of the duplicate detection and actions taken, into the saved image's record.

9. The method of claim 1, further comprising preparing the image for generating the first cell array, wherein preparing the image for generating the first cell array comprises: identifying a skew angle of the image; and correcting the skew angle of the image prior to generating the first cell array for the image.

10. The method of claim 1, wherein the second cell array comprises a finer granularity of cells than the first cell array.

11. The method of claim 1, wherein the second grid of cells comprises numeric values corresponding to the average pixel intensity values for the corresponding regions of the rotated image.

12. The method of claim 11, wherein the numeric values corresponding to average pixel intensity values for the corresponding regions of the rotated image comprises binary values for each of the corresponding regions of the rotated image.

13. The method of claim 1, wherein identifying one or more duplicate or near duplicate images from the set of previously stored images comprises identifying one or more of the previously generated image fingerprints having greater than or equal to a threshold number of identical numeric values as the image fingerprint for the rotated image.

14. A non-transitory computer readable medium containing instructions that, when executed by at least one processor of a computing device, cause the computing device to: receive an image; generate a first cell array for the image comprising a first grid of cells corresponding to regions of the image, the first grid of cells comprising average pixel intensity values for corresponding regions of the image; rotate the image based on the average pixel intensity values of the first grid of cells; generate a second cell array for the rotated image comprising a second grid of cells corresponding to regions of the rotated image, the second grid of cells comprising numeric values for corresponding regions of the rotated image; generate an image fingerprint for the rotated image, the image fingerprint comprising a sequence of the numeric values for the rotated image; identify one or more duplicate or near duplicate images from a set of previously stored images by comparing the image fingerprint with a set of previously generated image fingerprints corresponding to the set of previously stored images, wherein comparing the image fingerprint with the set of previously generated image fingerprints comprises comparing the sequence of numeric values with sequences of numeric values of the previously generated image fingerprints; and in response to identifying one or more duplicate or near duplicate images from the set of previously stored images, take a defined action with respect to the one or more duplicate or near duplicate images.

15. The non-transitory computer readable medium of claim 14, wherein the instructions further cause the device to prepare the image for generating the first cell array, wherein preparing the image for generating the first cell array comprises resizing the image to a defined size.

16. The non-transitory computer readable medium of claim 14, wherein rotating the image based on the average pixel intensity values comprises rotating the image such that the cells of the first grid of cells associated with the highest pixel intensity values are at the top of the image.

17. The non-transitory computer readable medium of claim 14, wherein the first grid of cells comprises a 2×2 grid of four cells, a 3×3 grid of nine cells, or a 4×4 grid of sixteen cells.

18. The non-transitory computer readable medium of claim 14, wherein the average pixel intensity values are calculated by averaging one or more of lightness, brightness, intensity and value across all pixels within each cell of the first grid of cells.

19. The non-transitory computer readable medium of claim 14, wherein the sequence of numeric values for the rotated image comprises a sequence of binary values.

20. The non-transitory computer readable medium of claim 17, wherein the second cell array comprises an 8×8 grid of sixty-four cells.

21. The non-transitory computer readable medium of claim 14, wherein taking the defined action comprises: choosing only one of the duplicates or near duplicates to save, and merging the metadata from both, and a history of the duplicate detection and actions taken, into the saved image's record.

22. The non-transitory computer readable storage medium of claim 14, wherein the instructions further cause the device to prepare the image for generating the first cell array, wherein preparing the image for generating the first cell array comprises: identifying a skew angle of the image; and correcting the skew an…
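The pipeline recited in claims 1, 3, 6, 7 and 13 can be sketched roughly as below. This is an illustrative reading, not the patented implementation. It assumes the image has already been reduced to a 2D list of grayscale intensities (claim 2's resizing step is skipped), that "brightest cells at the top" means picking the 90-degree rotation whose top row of coarse cells has the largest summed average, and that binary values come from comparing each fine cell to the mean of all fine cells. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def cell_averages(pixels: Image, n: int) -> List[List[float]]:
    """Average intensity of each cell in an n x n grid over the image.

    Assumes the image is at least n pixels in each dimension.
    """
    h, w = len(pixels), len(pixels[0])
    grid = []
    for r in range(n):
        r0, r1 = r * h // n, (r + 1) * h // n
        row = []
        for c in range(n):
            c0, c1 = c * w // n, (c + 1) * w // n
            cell = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cell) / len(cell))
        grid.append(row)
    return grid


def rotate_cw(pixels: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def orient_brightest_top(pixels: Image, coarse: int = 2) -> Image:
    """Pick the rotation whose top row of coarse cells is brightest."""
    best, best_score = pixels, None
    candidate = pixels
    for _ in range(4):
        score = sum(cell_averages(candidate, coarse)[0])
        if best_score is None or score > best_score:
            best, best_score = candidate, score
        candidate = rotate_cw(candidate)
    return best


def fingerprint(pixels: Image, coarse: int = 2, fine: int = 8) -> List[int]:
    """Binary fingerprint: one bit per fine-grid cell of the oriented image."""
    oriented = orient_brightest_top(pixels, coarse)
    cells = [v for row in cell_averages(oriented, fine) for v in row]
    mean = sum(cells) / len(cells)  # assumed binarization rule
    return [1 if v > mean else 0 for v in cells]


def matching_values(fp_a: List[int], fp_b: List[int]) -> int:
    """Number of positions holding identical values (compare claim 13)."""
    return sum(a == b for a, b in zip(fp_a, fp_b))


if __name__ == "__main__":
    # 8x8 test image: dark top half, bright bottom half.
    image = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]
    turned = rotate_cw(image)
    # Both orientations normalize to "bright half on top".
    print(matching_values(fingerprint(image), fingerprint(turned)))
```

Because the orientation step runs before the fine grid is computed, a copy of the same image turned by a multiple of 90 degrees should yield the same fingerprint under these assumptions, provided one rotation is strictly brightest at the top.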

Assignees

  • Dropbox Inc

Inventors

Classifications

  • Matching criteria, e.g. proximity measures · CPC title

  • G06V10/50 · Primary

    by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis · CPC title

  • using metadata automatically derived from the content · CPC title

  • using colour · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Dropbox Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/50. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 27, 2016 (B2). Legal status and post-grant events are not shown on this page.