Who is the assignee on this patent?

Tyercha Edward-Robert, Kazmaier Gerrit Simon, Gildhoff Hinnerk, and 4 more

What technology area does this patent fall under?

Primary CPC classification G06F16/285. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Jul 11 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Hilbert curve partitioning for parallelization of DBSCAN

US9703856B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9703856-B2
Application number	US-201414325258-A
Country	US
Kind code	B2
Filing date	Jul 7, 2014
Priority date	Jul 7, 2014
Publication date	Jul 11, 2017
Grant date	Jul 11, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

DBSCAN clustering analyses can be improved by pre-processing of a data set using a Hilbert curve to intelligently identify the centers for initial partitional analysis by a partitional clustering algorithm such as CLARANS. Partitions output by the partitional clustering algorithm can be process by DBSCAN running in parallel before intermediate cluster results are merged.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer program product comprising a non-transitory machine-readable medium storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising: indexing a data set using a Hilbert curve that assigns a Hilbert distance to each of a plurality of data points in the data set; initiating a partitional clustering algorithm with parameters for a plurality of clusters identified using the indexing of the data set, the partitional clustering algorithm outputting the data set grouped into a plurality of partitions, the parameters comprising one or more cluster center values selected based on one or more sequences identified from the Hilbert distances; determining the one or more cluster center values, the determining comprising grouping the Hilbert distances for the plurality of points into the plurality of clusters, identifying one or more sequences of Hilbert distances as members of a cluster of the plurality of clusters, and choosing a cluster center value for the cluster based on a median of the Hilbert distances in the cluster; constraining the partitional clustering algorithm to randomly choose a new cluster center value for the cluster with a greater preference for the new cluster value to be within the grouped Hilbert distances for the cluster; processing the plurality of partitions, the processing comprising running a DBSCAN algorithm on the partitions in parallel to generate intermediate cluster results for each partition; and generating a final result, the generating comprising merging the intermediate results. 2. The computer program product of claim 1 , wherein the operations further comprise merging two or more of the plurality of clusters prior to the initiating of the partitional clustering algorithm, the merging comprising identifying data points in adjacent cells of the indexed data set. 3. The computer program product of claim 1 , wherein the partitional clustering algorithm comprises a CLARANS algorithm. 4. A system comprising: computer hardware configured to perform operations comprising: indexing a data set using a Hilbert curve that assigns a Hilbert distance to each of a plurality of data points in the data set; initiating a partitional clustering algorithm with parameters for a plurality of clusters identified using the indexing of the data set, the partitional clustering algorithm outputting the data set grouped into a plurality of partitions, the parameters comprising one or more cluster center values selected based on one or more sequences identified from the Hilbert distances; determining the one or more cluster center values, the determining comprising grouping the Hilbert distances for the plurality of points into the plurality of clusters, identifying one or more sequences of Hilbert distances as members of a cluster of the plurality of clusters, and choosing a cluster center value for the cluster based on a median of the Hilbert distances in the cluster; constraining the partitional clustering algorithm to randomly choose a new cluster center value for the cluster with a greater preference for the new cluster value to be within the grouped Hilbert distances for the cluster; processing the plurality of partitions, the processing comprising running a DBSCAN algorithm on the partitions in parallel to generate intermediate cluster results for each partition; and generating a final result, the generating comprising merging the intermediate results. 5. The system of claim 4 , wherein the operations further comprise merging two or more of the plurality of clusters prior to the initiating of the partitional clustering algorithm, the merging comprising identifying data points in adjacent cells of the indexed data set. 6. The system of claim 4 , wherein the partitional clustering algorithm comprises a CLARANS algorithm. 7. The system of claim 4 , wherein the computer hardware comprises: a programmable processor; and a computer readable medium storing instructions that, when executed by the programmable processor, cause the programmable processor to perform at least some of the operations. 8. A computer implemented method comprising: indexing a data set using a Hilbert curve that assigns a Hilbert distance to each of a plurality of data points in the data set; initiating a partitional clustering algorithm with parameters for a plurality of clusters identified using the indexing of the data set, the partitional clustering algorithm outputting the data set grouped into a plurality of partitions, the parameters comprising one or more cluster center values selected based on one or more sequences identified from the Hilbert distances; determining the one or more cluster center values, the determining comprising grouping the Hilbert distances for the plurality of points into the plurality of clusters, identifying one or more sequences of Hilbert distances as members of a cluster of the plurality of clusters, and choosing a cluster center value for the cluster based on a median of the Hilbert distances in the cluster; constraining the partitional clustering algorithm to randomly choose a new cluster center value for the cluster with a greater preference for the new cluster value to be within the grouped Hilbert distances for the cluster; processing the plurality of partitions, the processing comprising running a DBSCAN algorithm on the partitions in parallel to generate intermediate cluster results for each partition; and generating a final result, the generating comprising merging the intermediate results. 9. The computer implemented method of claim 8 , further comprising merging two or more of the plurality of clusters prior to the initiating of the partitional clustering algorithm, the merging comprising identifying data points in adjacent cells of the indexed data set. 10. The computer implemented method of claim 8 , wherein the partitional clustering algorithm comprises a CLARANS algorithm. 11. The computer implemented method of claim 8 , wherein the indexing, the initiating, the processing, and the generating are performed by at least one system comprising computer hardware.

Assignees

Inventors

Classifications

G06F16/285Primary
Clustering or classification · CPC title
G06F16/221
Column-oriented storage; Management thereof · CPC title
G06F17/30598Primary
Physics · mapped topic
G06F17/30315
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 55017147

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9703856B2 cover?: DBSCAN clustering analyses can be improved by pre-processing of a data set using a Hilbert curve to intelligently identify the centers for initial partitional analysis by a partitional clustering algorithm such as CLARANS. Partitions output by the partitional clustering algorithm can be process by DBSCAN running in parallel before intermediate cluster results are merged.
Who is the assignee on this patent?: Tyercha Edward-Robert, Kazmaier Gerrit Simon, Gildhoff Hinnerk, and 4 more
What technology area does this patent fall under?: Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Jul 11 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).