Simplified Hash Table
US-2024422006-A1 · Dec 19, 2024 · US
US2020142897A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020142897-A1 |
| Application number | US-201916671181-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 1, 2019 |
| Priority date | Nov 2, 2018 |
| Publication date | May 7, 2020 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A high-dimensional data nearest-neighbor query method based on variable-length hash codes is disclosed. Specifically, in this method, hash codes with the same code frequency are taken as a sub-data set, all the sub-data sets are ranked, a compression ratio is set for each sub-data set, the sub-data sets are compressed and trained according to the compression ratios, and hash codes and original codes corresponding to the trained sub-data sets are obtained; the hash code of each trained sub-data sets is copied to obtain multiple replicas, and the original codes and the corresponding replicas are strung to obtain strung hash codes which are integrated to form a final nearest-neighbor query table; and, a query code is obtained, and the nearest-neighbor query table is searched for a nearest-neighbor data set to complete query. The query efficiency and accuracy are greatly improved according to the invention.
Opening claim text (preview).
What is claimed is: 1 . A high-dimensional data nearest-neighbor query method based on variable-length hash codes, comprising: (1) obtaining an original high-dimensional data set including multiple pieces of original high-dimensional data, giving a query point, and carrying out low-dimensional mapping on the original high-dimensional data set to generate a random Fourier feature vector set consisting of random Fourier feature vectors corresponding to the original high-dimensional data; (2) carrying out encoding according to a hash value of each said random Fourier feature vector to obtain hash codes corresponding to the original high-dimensional data, counting the appearance frequency of each hash code of all the hash codes to obtain a code frequency reflecting the appearance frequency of the hash code, taking hash codes with a same code frequency as a sub-data set to obtain multiple sub-data sets, ranking all the sub-data sets from high to low according to the code frequencies to obtain serial numbers of the sub-data sets, setting a compression ratio for each said sub-data set in a manner that the compression ratio is a reciprocal of the code frequency of the sub-data set, compressing the sub-data sets according to the compression ratios to obtain compressed sub-data sets and code lengths of the compressed sub-data sets, and then training the compressed sub-data sets in a manner that the sum of a compression loss and a quantification loss is minimum to obtain trained sub-data sets and hash codes of the trained sub-data sets; (3) extracting a random Fourier feature of each said trained sub-data set to obtain an original code corresponding to the trained sub-data set, and copying the hash code of each said trained sub-data set according to a code length of the corresponding original code and the compression ratio corresponding to the original code to obtain multiple replicas of the hash code of the trained sub-data set; (4) stringing the original code of each said trained sub-data set and the replicas of the hash code of the trained sub-data set to obtain a strung hash code corresponding to the trained sub-data set, and integrating the strung hash codes corresponding to all the trained sub-data sets to form a final nearest-neighbor query table; and (5) extracting a random Fourier feature vector of the given query point, mapping the random Fourier feature vector of the given query point to a random Fourier code having a length consistent with that of the strung hash code corresponding to one said trained sub-data set, and using the random Fourier code as a query code corresponding to the query point; and finally, searching the final nearest-neighbor query table for a nearest-neighbor data set having a minimum Hamming distance to the query code corresponding to the query point, and using the nearest-neighbor data set as a nearest-neighbor query result of the given query point, so that the nearest-neighbor query process of the given query point is completed.
Hash tables · CPC title
Machine learning · CPC title
of query operations · CPC title
Fourier, Walsh or analogous domain transformations {, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms (for correlation function computation G06F17/156; spectrum analysers G01R23/16)} · CPC title
Multidimensional index structures · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.