Multi-distance clustering

US2016283533A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016283533-A1
Application numberUS-201514669792-A
CountryUS
Kind codeA1
Filing dateMar 26, 2015
Priority dateMar 26, 2015
Publication dateSep 29, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and other embodiments associated with multi-distance clustering are described. In one embodiment, a method includes reading a multi-distance similarity matrix S that records pair-wise multi-distance similarities between respective pairs of data points in a data set. Each pair-wise similarity is based on distances between a pair of data points calculated using K different distance functions, where K is greater than one. The method includes clustering the data points in the data set into n clusters based on the similarity matrix S. The number of clusters n is not determined prior to the clustering.

First claim

Opening claim text (preview).

What is claimed is: 1 . A non-transitory computer storage medium storing computer-executable instructions that when executed by a computer cause the computer to perform corresponding functions, the functions comprising: reading a multi-distance similarity matrix S that records pair-wise multi-distance similarities between respective pairs of data points in a data set, where each pair-wise similarity is based on distances between a pair of data points calculated using K different distance functions, where K is greater than one; clustering the data points in the data set into n clusters based on the similarity matrix S; and where n is not determined prior to the clustering. 2 . The non-transitory computer storage medium of claim 1 , where the functions comprise clustering the data points in the data set by, until no un-clustered data points remain: selecting a pair of data points having a relatively large multi-distance similarity as recorded in the similarity matrix S; and creating a cluster that includes the selected pair of data points by adding data points to the cluster that are similar to any point in the cluster. 3 . The non-transitory computer storage medium of claim 1 , where the functions comprise clustering the data set by: iteratively partitioning the similarity matrix S into n sub-matrices using spectral theory, where each sub-matrix corresponds to a cluster; and ceasing partitioning when all sub-matrices are mutually dissimilar. 4 . The non-transitory computer storage medium of claim 1 , where the functions comprise iteratively clustering the data set by, starting with the similarity matrix as a sub-matrix: clustering the sub-matrix by: using an objective function to compute a Laplacian matrix of the sub-matrix; computing eigenvalues and corresponding eigenvectors for the Laplacian matrix and ordering the eigenvalues in ascending order such that the first eigenvalue is equal to zero; identifying m eigenvalues that are equal to zero; and when m is greater than one, partitioning the sub-matrix into m sub-matrices based on the second through the m th eigenvectors; and clustering each of the resulting m sub-matrices. 5 . The non-transitory computer storage medium of claim 4 , where the functions comprise, when a sub-matrix has a single eigenvalue equal to zero: partitioning indices of the sub-matrix into two sub-matrices based on the second eigenvector, such that one of the two sub-matrices contains data vectors with indices corresponding to elements of the second eigenvector that indicate similarity and the other of the two sub-matrices contains data vectors with indices corresponding to elements of the second eigenvector that indicate dissimilarity; determining a cross-cluster similarity between the two sub-matrices; retaining the two sub-matrices when the cross-cluster similarity indicates dissimilarity; and discarding the two sub-matrices when the cross-cluster similarity indicates that the two sub-matrices are similar. 6 . The non-transitory computer storage medium of claim 1 , where the functions comprise computing each pairwise similarity in the similarity matrix S by: using a K different distance functions D 1 -D K , calculating K per-distance tri-point arbitration similarities S D1 -S DK between a pair of data points x i and x j with respect to an arbiter point a; and computing a multi-distance tri-point arbitration similarity S between the data points by: determining that the data points are similar when a dominating number of the K per-distance tri-point arbitration similarities indicate that the data points are similar; and determining that the data points are dissimilar when a dominating number of the K per-distance tri-point arbitration similarities indicate that the data points are dissimilar. 7 . The non-transitory computer storage medium of claim 6 , where the functions comprise computing the per-distance tri-point similarity between points x 1 and x 2 with respect to arbiter a based on the following relationship, where ρ is the distance between points using the respective distance function: S D  ( x 1 , x 2  a )  =  min   { ρ  ( x 1 , a ) ,  ρ  ( x 2 , a ) } - ρ  ( x 1 , x 2 ) max   { p  ( x 1 , x 2 ) , min 

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016283533A1 cover?
Systems, methods, and other embodiments associated with multi-distance clustering are described. In one embodiment, a method includes reading a multi-distance similarity matrix S that records pair-wise multi-distance similarities between respective pairs of data points in a data set. Each pair-wise similarity is based on distances between a pair of data points calculated using K different dista…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Sep 29 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).