κ-selection using parallel processing
US-10649770-B2 · May 12, 2020 · US
US10949467B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10949467-B2 |
| Application number | US-201816044286-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 24, 2018 |
| Priority date | Mar 1, 2018 |
| Publication date | Mar 16, 2021 |
| Grant date | Mar 16, 2021 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
System and method of generating an index structure for indexing a plurality of unstructured data objects, including: generating a set of compact feature vectors, the set including a compact feature vector for each of the data objects, the compact feature vector for each data object including a sequence of hashed values that represent the data object; generating a plurality of twisted compact feature vector sets for each of set of compact feature vectors, each of the twisted compact feature vector sets being generated by applying a respective random shuffling permutation to the set of compact feature vectors; and for each twisted compact feature vector set, generating an index for the data objects in which the data objects are slotted based on sequences of hashed values in the twisted compact feature vector set.
Opening claim text (preview).
The invention claimed is: 1. A method of generating a searchable index structure that indexes a plurality of data objects, comprising: for each data object: generating a compact feature vector for the data object, the compact feature vector including a sequence of hash values that represent the data object; shuffling the sequence of hash values included in the compact feature vector using a plurality of shuffling permutations to generate a plurality of shuffled sequences for the data object, each shuffled sequence for the data object including the hash values of the compact feature vector shuffled according to a respective one of the shuffling permutations; and indexing, based on the shuffled sequences, the data object in a plurality of index tables that each correspond to the respective one of the shuffling permutations; and storing the plurality of index tables as the searchable index structure which is searchable for candidate data objects that are similar to a query object using a shuffled query sequence of the query object generated using the shuffling permutations. 2. The method of claim 1 wherein each of the shuffling permutations is a random shuffling permutation that specifies a random order for the hash values of a respective shuffled sequence. 3. The method of claim 2 wherein the hash values are binary values, and each shuffling permutation includes a randomly generated sequence of shuffling values that each specify a sequence location for the hash values in the respective shuffled sequence. 4. The method of claim 1 further comprising: for each data object, performing feature extraction on the data object to generate a raw feature vector including a plurality of feature values of the data object; and wherein for each data object, generating the compact feature vector comprises hashing the raw feature vector for the data object, to generate the sequence of hash values that represent the data object. 5. The method of claim 4 wherein the hashing is a locality sensitive hashing (LSH) using approximate nearest neighbour (ANN) hashing functions. 6. The method of claim 1 wherein: the index table corresponding to each shuffling permutation is a tree structure comprising d-nodes and k-nodes; each d-node includes an array of slots each having a respective slot ID, at least some of the slots occupied with a pointer for either a k-node associated with the slot or a next level d-node; and each k-node includes a pointer for a corresponding one of the data objects, at least some of the k-nodes also including a pointer for a further k-node. 7. The method of claim 6 wherein, for each index table, each k-node is associated with a slot of a root d-node based on a first subsequence of the shuffled sequence for the k-node's corresponding data object generated using the shuffling permutation that the index table corresponds to. 8. The method of claim 7 wherein, for each index table, when a number of k-nodes associated with a slot of the root d-node exceeds a threshold, a next level d-node is added in the index table and associated with the slot of the root d-node, and each k-node associated with the slot of the root d-node is then associated with a slot of the next level d-node based on a second subsequence of the shuffled sequence for the k-node's corresponding data object generated using the shuffling permutation that the index table corresponds to. 9. The method of claim 1 further comprising performing a search of the plurality of data objects by: generating a compact query feature vector for a query object, the compact query feature vector including a sequence of hash values that represent the query object; shuffling the sequence of hash values using the plurality of shuffling permutations to generate a plurality of shuffled query sequences for the query object; and searching each index table based on the shuffled query sequence generated using the shuffling permutation that corresponds to the index table to identify candidate data objects that are similar to the query object. 10. A system for generating searchable index structure that indexes a plurality of data objects, comprising: one or more processing units; a system storage device coupled to each of the one or more processing units, the system storage device tangibly storing thereon executable instructions that, when executed by the one or more processing units, cause the one or more processing units to: generate a plurality of shuffling permutations that are each associated with a respective index table; for each data object in the plurality of data objects: generate a compact feature vector for the data object, the compact feature vector including a sequence of hash values that represent the data object, shuffle the sequence of hash values included in the compact feature vector using a plurality of shuffling permutations to generate a plurality of shuffled sequences for the data object, each shuffled sequence for the data object including the hash values of the compact feature vector shuffled according to a respective one of the shuffling permutations, and index, based on the shuffled sequences, the data object in a plurality of index tables that each correspond to a respective one of the shuffling permutations; and store, in the system storage device, the plurality of index tables as the searchable index structure which is searchable for candidate data objects that are similar to a query object using a shuffled query sequence of the query object generated using the shuffling permutations. 11. The system of claim 10 wherein each of the shuffling permutations is a random shuffling permutation that specifies a random order for the hash values of a respective shuffled sequence. 12. The system of claim 11 wherein the hash values are binary values, and each shuffling permutation includes a randomly generated sequence of shuffling values that each specify a sequence location for the hash values in the respective shuffled sequence. 13. The system claim 10 wherein the executable instructions, when executed by the one or more processing units, further cause the one or more processing units to: for each data object, perform feature extraction on the data object to generate a raw feature vector including a plurality of feature values of the data obiect; and wherein for each data object, the compact feature vector for the data object is generated by hashing the raw feature vector for the data object to generate the sequence of hash values. 14. The system of claim 13 wherein the hashing is a locality sensitive hashing (LSH) using approximate nearest neighbour (ANN) hashing functions. 15. The system of claim 10 wherein: each index table is a tree structure comprising d-nodes and k-nodes; each d-node includes an array of slots each having a respective slot ID, at least some of the slots occupied with a pointer for either a k-node associated with the slot or a next level d-node; and each k-node includes a pointer for a corresponding one of the data objects, at least some of the k-nodes also including a pointer for a further k-node. 16. The system of claim 15 wherein, for each index table, each k-node is associated with a slot of a root d-node based on a first subsequence of the shuffled sequence for the k-node's corresponding data object generated using the shuffling permutation associated with the index table. 17. The system of claim 16 wherein the executable instructions, when executed by the one or more processing units, further cause the one or more processing units to perform a s
Trees · CPC title
Proximity, similarity or dissimilarity measures · CPC title
Clustering; Classification · CPC title
Distances to closest patterns, e.g. nearest neighbour classification · CPC title
Matching criteria, e.g. proximity measures · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.