Clustering technique for optimized search over high-dimensional space

US9864930B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9864930-B2
Application numberUS-201615016647-A
CountryUS
Kind codeB2
Filing dateFeb 5, 2016
Priority dateFeb 5, 2016
Publication dateJan 9, 2018
Grant dateJan 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach is provided in which a knowledge manager locates centroids in a high-dimensional vector space that are closest to a new image feature set and performs nearest neighbor searches on feature sets included in clusters corresponding to the located centroids. The knowledge manager then selects feature sets closest to the new image feature set based on the nearest neighbor searches and in turn, marks images corresponding to the selected closest features sets as similar images to a new image corresponding to the new image feature set.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method implemented by an information handling system that includes a memory and a processor, the method comprising: locating a plurality of centroids that are closest to a new image feature set of a new image in a high-dimensional vector space; in response to locating the plurality of centroids: identifying a plurality of member image feature sets included in a plurality of clusters corresponding to the located plurality of centroids; and performing one or more nearest neighbor searches, relative to the new image feature set, on the plurality of member image feature sets; selecting one or more of a plurality of member image feature sets that are nearest to the new image feature set in response to performing the one or more nearest neighbor searches; and marking one or more member images corresponding to the selected one or more member image feature sets as a similar image to the new image. 2. The method of claim 1 wherein, prior to locating the plurality of centroids, the method further comprises: creating the high-dimensional vector space, wherein the creating further comprises: mapping a plurality of training image feature sets to the high-dimensional vector space; and randomly selecting a set of the plurality of training image feature sets as training centroids, resulting in a set of training centroids. 3. The method of claim 2 wherein, for each of the plurality of training image feature sets, the method further comprises: selecting one of the plurality of training image feature sets; identifying one or more of the set of training centroids that are closest to the selected training image feature set; and assigning the selected training image feature set membership to each of the identified one or more closest centroids. 4. The method of claim 1 further comprising: assigning membership of the new image to each of the plurality of clusters. 5. The method of claim 1 wherein the set of centroids include a first centroid and a second centroid, and wherein the second centroid is closer to the new image feature set than the first centroid, and wherein at least one of the selected one or more member image feature sets is included in a first of the plurality of clusters that correspond to the first centroid. 6. The method of claim 5 wherein at least one of the selected one or more member image feature sets is included in a second of the plurality of clusters that correspond to the second centroid. 7. The method of claim 1 wherein the new image feature set encompasses a complete feature set of the image. 8. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; and a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of: locating a plurality of centroids that are closest to a new image feature set of a new image in a high-dimensional vector space; in response to locating the plurality of centroids: identifying a plurality of member image feature sets included in a plurality of clusters corresponding to the located plurality of centroids; and performing one or more nearest neighbor searches, relative to the new image feature set, on the plurality of member image feature sets; selecting one or more of a plurality of member image feature sets that are nearest to the new image feature set in response to performing the one or more nearest neighbor searches; and marking one or more member images corresponding to the selected one or more member image feature sets as a similar image to the new image. 9. The information handling system of claim 8 wherein, prior to locating the plurality of centroids, at least one of the one or more processors perform additional actions comprising: creating the high-dimensional vector space, wherein the creating further comprises: mapping a plurality of training image feature sets to the high-dimensional vector space; and randomly selecting a set of the plurality of training image feature sets as training centroids, resulting in a set of training centroids. 10. The information handling system of claim 9 wherein, for each of the plurality of training image feature sets, at least one of the one or more processors perform additional actions comprising: selecting one of the plurality of training image feature sets; identifying one or more of the set of training centroids that are closest to the selected training image feature set; and assigning the selected training image feature set membership to each of the identified one or more closest centroids. 11. The information handling system of claim 8 wherein at least one of the one or more processors perform additional actions comprising: assigning membership of the new image to each of the plurality of clusters. 12. The information handling system of claim 8 wherein the set of centroids include a first centroid and a second centroid, and wherein the second centroid is closer to the new image feature set than the first centroid, and wherein at least one of the selected one or more member image feature sets is included in a first of the plurality of clusters that correspond to the first centroid. 13. The information handling system of claim 12 wherein at least one of the selected one or more member image feature sets is included in a second of the plurality of clusters that correspond to the second centroid. 14. The information handling system of claim 8 wherein the new image feature set encompasses a complete feature set of the image. 15. A computer program product stored in a computer readable storage medium, comprising computer program code that, when executed by an information handling system, causes the information handling system to perform actions comprising: locating a plurality of centroids that are closest to a new image feature set of a new image in a high-dimensional vector space; in response to locating the plurality of centroids: identifying a plurality of member image feature sets included in a plurality of clusters corresponding to the located plurality of centroids; and performing one or more nearest neighbor searches, relative to the new image feature set, on the plurality of member image feature sets; selecting one or more of a plurality of member image feature sets that are nearest to the new image feature set in response to performing the one or more nearest neighbor searches; and marking one or more member images corresponding to the selected one or more member image feature sets as a similar image to the new image. 16. The computer program product of claim 15 wherein, prior to locating the plurality of centroids, the information handling system performs additional actions comprising: creating the high-dimensional vector space, wherein the creating further comprises: mapping a plurality of training image feature sets to the high-dimensional vector space; and randomly selecting a set of the plurality of training image feature sets as training centroids, resulting in a set of training centroids. 17. The computer program product of claim 16 wherein, for each of the plurality of training image feature sets, the information handling system performs additional actions comprising: selecting one of the plurality of training image feature sets; identifying one or more of the set of training centroids that are closest to the selected training image feature set; and assigning the selected training image feature set membership to each of the identified one

Assignees

Inventors

Classifications

  • G06V10/763Primary

    Non-hierarchical techniques, e.g. based on statistics of modelling distributions · CPC title

  • Clustering techniques · CPC title

  • Distances to cluster centroïds · CPC title

  • with fixed number of clusters, e.g. K-means clustering · CPC title

  • Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9864930B2 cover?
An approach is provided in which a knowledge manager locates centroids in a high-dimensional vector space that are closest to a new image feature set and performs nearest neighbor searches on feature sets included in clusters corresponding to the located centroids. The knowledge manager then selects feature sets closest to the new image feature set based on the nearest neighbor searches and in …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06V10/763. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 09 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).