What technology area does this patent fall under?

Primary CPC classification G06V40/172. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 20, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Searching in multilevel clustered vector-based data

US11449704B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11449704-B2
Application number	US-202016744241-A
Country	US
Kind code	B2
Filing date	Jan 16, 2020
Priority date	Jan 16, 2020
Publication date	Sep 20, 2022
Grant date	Sep 20, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A multilevel clustered data set for multidimensional vectors is created by defining a plurality of clusters based on each of the signed dimensions of the vectors, each dimension functioning as an axis. Vectors are assigned to each cluster by measuring cosine similarity between a vector and each axis. Sub-clusters are defined as ranges of cosine similarity values within a cluster, and each vector is assigned into the appropriate range based on their cosine similarity value with the axis of the cluster. Searching for a matching vector to a new vector is efficiently achieved in near-constant time by measuring cosine similarity for the new vector with each axis to identify the closest cluster, reusing the cosine similarity of the new vector and axis to determine which sub-cluster corresponds to the appropriate range of values, and then comparing each vector within the sub-cluster until a match is found or ruled out.
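The two-level lookup described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names (`locate`, `build_index`, `search`), the equal-width split into `n_bins` sub-clusters, and the `min_similarity` cutoff are all assumptions made for the example. The abstract only says that sub-clusters are "ranges of cosine similarity values".

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_to_axes(v):
    """Cosine similarity of v with every signed axis, keyed by (dim, sign)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    sims = {}
    for i, x in enumerate(v):
        sims[(i, +1)] = x / norm   # similarity with the positive axis
        sims[(i, -1)] = -x / norm  # similarity with the negative axis
    return sims


def locate(v, n_bins=10):
    """Return (axis, sub-cluster index, similarity) for vector v."""
    sims = cosine_to_axes(v)
    axis = max(sims, key=sims.get)  # top-level cluster: the closest signed axis
    sim = sims[axis]                # reused to choose the sub-cluster
    # Assumed layout: n_bins equal-width ranges over (0, 1].
    bin_index = min(int(sim * n_bins), n_bins - 1)
    return axis, bin_index, sim


def build_index(vectors, n_bins=10):
    """Group vectors by (closest signed axis, similarity range)."""
    index = defaultdict(list)
    for v in vectors:
        axis, bin_index, _ = locate(v, n_bins)
        index[(axis, bin_index)].append(v)
    return index


def search(index, query, n_bins=10, min_similarity=0.95):
    """Compare the query only against vectors in its own sub-cluster."""
    axis, bin_index, _ = locate(query, n_bins)
    candidates = index.get((axis, bin_index), [])
    return [v for v in candidates if cosine(query, v) >= min_similarity]
```

A query costs one pass over the signed axes plus a scan of a single sub-cluster, which is where the abstract's "near-constant time" comes from. One limitation of this sketch: a vector whose similarity falls just across a range boundary from the query lands in a neighboring sub-cluster and is not examined.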

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method (CIM) comprising: receiving a clustered images data set, with the clustered images data set including a plurality of top-level clusters, where a given top-level cluster is determined based on a signed axis and includes a plurality of sub-clusters, where a given sub-cluster is a range of values based, at least in part, on the signed axis of the given top-level cluster and includes one or more multidimensional vectors generated from historical images; determining, from the plurality of top-level clusters, a subset of top-level clusters for removal based, at least in part, on relative similarity of the multidimensional vectors within the sub-clusters of the subset of top-level clusters compared to the multidimensional vectors within the sub-clusters of the other top-level clusters of the plurality of top-level clusters; removing the subset of top-level clusters from the plurality of top-level clusters; receiving an input image data set; generating a multidimensional vector based on the input image data set; determining a top-level cluster closest to the generated multidimensional vector based, at least in part, on the signed axes of the plurality of top-level clusters; determining a sub-cluster of the determined top-level cluster closest to the generated multidimensional vector based, at least in part, on the signed axis of the determined top-level cluster and the generated multidimensional vector; and determining a subset of one or more vectors of the determined sub-cluster as matches for the input image by comparing the generated multidimensional vector to one or more vectors of the determined sub-cluster.
2. The CIM of claim 1, wherein: the vectors of the clustered images data set are based upon biometric facial image scans; and the input image data set is a biometric facial image scan.
3. The CIM of claim 1, wherein: the clustered images data set includes 256 top-level clusters based on vectors with 128 dimensions; and each dimension includes a positive and negative sign.
4. The CIM of claim 1, wherein determining the top-level cluster closest to the generated multidimensional vector includes: measuring a cosine similarity value between the signed axis of each top-level cluster and the generated multidimensional vector; and selecting the top-level cluster with the measured cosine similarity value closest or equal to 1.
5. The CIM of claim 4, wherein the sub-clusters of a given top-level cluster of the clustered images data set are defined as ranges of cosine similarity values measured from the signed axis of the given top-level cluster and the one or more multidimensional vectors generated from historical images assigned to the top-level cluster.
6. The CIM of claim 5, wherein determining the sub-cluster of the determined top-level cluster closest to the generated multidimensional vector includes: determining which sub-cluster is defined by a range of values which includes the measured cosine similarity value between the signed axis of each top-level cluster and the generated multidimensional vector.
7. The CIM of claim 6, wherein the determining a subset of one or more vectors of the determined sub-cluster as matches for the input image by comparing the generated multidimensional vector to the plurality of vectors assigned to the determined sub-cluster includes: comparing the cosine similarity values of the generated multidimensional vector and the axis of the determined top-level cluster with the cosine similarity values of the axis of the determined top-level cluster and each vector in the determined sub-cluster; and determining a match for inclusion into the subset for each vector corresponding to a cosine similarity value within a predetermined threshold value of the cosine similarity value corresponding to the generated multidimensional vector.
8. A computer program product (CPP) comprising: a machine readable storage device; and computer code stored on the machine readable storage device, with the computer code including instructions for causing a processor(s) set to perform operations including the following: receiving a clustered images data set, with the clustered images data set including a plurality of top-level clusters, where a given top-level cluster is determined based on a signed axis and includes a plurality of sub-clusters, where a given sub-cluster is a range of values based, at least in part, on the signed axis of the given top-level cluster and includes one or more multidimensional vectors generated from historical images, determining, from the plurality of top-level clusters, a subset of top-level clusters for removal based, at least in part, on relative similarity of the multidimensional vectors within the sub-clusters of the subset of top-level clusters compared to the multidimensional vectors within the sub-clusters of the other top-level clusters of the plurality of top-level clusters, removing the subset of top-level clusters from the plurality of top-level clusters, receiving an input image data set, generating a multidimensional vector based on the input image data set, determining a top-level cluster closest to the generated multidimensional vector based, at least in part, on the signed axes of the plurality of top-level clusters, determining a sub-cluster of the determined top-level cluster closest to the generated multidimensional vector based, at least in part, on the signed axis of the determined top-level cluster and the generated multidimensional vector, and determining a subset of one or more vectors of the determined sub-cluster as matches for the input image by comparing the generated multidimensional vector to one or more vectors of the determined sub-cluster.
9. The CPP of claim 8, wherein: the vectors of the clustered images data set are based upon biometric facial image scans; and the input image data set is a biometric facial image scan.
10. The CPP of claim 8, wherein: the clustered images data set includes 256 top-level clusters based on vectors with 128 dimensions; and each dimension includes a positive and negative sign.
11. The CPP of claim 8, wherein determining the top-level cluster closest to the generated multidimensional vector includes: measuring a cosine similarity value between the signed axis of each top-level cluster and the generated multidimensional vector; and selecting the top-level cluster with the measured cosine similarity value closest or equal to 1.
12. The CPP of claim 11, wherein the sub-clusters of a given top-level cluster of the clustered images data set are defined as ranges of cosine similarity values measured from the signed axis of the given top-level cluster and the one or more multidimensional vectors generated from historical images assigned to the top-level cluster.
13. The CPP of claim 12, wherein determining the sub-cluster of the determined top-level cluster closest to the generated multidimensional vector includes: determining which sub-cluster is defined by a range of values which includes the measured cosine similarity value between the signed axis of each top-level cluster and the generated multidimensional vector.
14. The CPP of claim 13, wherein the determining a subset of one or more vectors of the determined sub-cluster as matches for the input image by comparing the generated multidimensional vector to the plurality of vectors assigned to the determined sub-cluster includes: comparing the cosine similarity values of the generated multidimensional vector and the axis of the determined top-level cluster with the cosine similarity val…
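Claims 3, 4 and 7 lend themselves to a short worked sketch. Because each signed axis is a unit vector along one dimension, the cosine similarity between a vector and that axis is just the signed component divided by the vector's length. With 128 dimensions this gives 2 × 128 = 256 signed axes, and the axis with similarity closest to 1 is the one matching the largest-magnitude component and its sign. The helper names below (`closest_signed_axis`, `matches_within_threshold`) and the `(vector_id, axis_similarity)` storage layout are hypothetical, chosen only for illustration.

```python
import math


def closest_signed_axis(v):
    """Return (dimension, sign, cosine similarity) of the nearest signed axis.

    Uses the identity cos(v, +/-e_i) = +/-v_i / |v|, so no explicit
    dot product against each of the 2 * len(v) axes is needed.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    dim = max(range(len(v)), key=lambda i: abs(v[i]))
    sign = 1 if v[dim] >= 0 else -1
    return dim, sign, abs(v[dim]) / norm


def matches_within_threshold(query_sim, stored, threshold):
    """Claim 7 style comparison on precomputed axis similarities.

    stored: list of (vector_id, axis_similarity) pairs for one sub-cluster
    (a hypothetical layout). A vector is kept when its axis similarity
    lies within `threshold` of the query's axis similarity.
    """
    return [vid for vid, sim in stored if abs(sim - query_sim) <= threshold]
```

Note that this mirrors the comparison only as worded in claim 7. Two vectors can have the same similarity to an axis while pointing in different directions, so a practical system would likely confirm candidates with a direct vector-to-vector comparison.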

Assignees

IBM

Inventors

Classifications

G06V10/7625
Hierarchical techniques, i.e. dividing or merging patterns to obtain a tree-like representation; Dendograms · CPC title
G06V10/771
Feature selection, e.g. selecting representative features from a multi-dimensional feature space · CPC title
G06F18/231
Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram · CPC title
G06V40/172 · Primary
Classification, e.g. identification · CPC title
G06F18/2113
by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title

Patent family

Related publications grouped by family.

View patent family 76857902

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11449704B2 cover?: A multilevel clustered data set for multidimensional vectors is created by defining a plurality of clusters based on each of the signed dimensions of the vectors, each dimension functioning as an axis. Vectors are assigned to each cluster by measuring cosine similarity between a vector and each axis. Sub-clusters are defined as ranges of cosine similarity values within a cluster, and each vecto…
Who is the assignee on this patent?: IBM