High-dimensional data nearest-neighbor query method based on variable-length hash codes

US2020142897A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2020142897-A1
Application numberUS-201916671181-A
CountryUS
Kind codeA1
Filing dateNov 1, 2019
Priority dateNov 2, 2018
Publication dateMay 7, 2020
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A high-dimensional data nearest-neighbor query method based on variable-length hash codes is disclosed. Specifically, in this method, hash codes with the same code frequency are taken as a sub-data set, all the sub-data sets are ranked, a compression ratio is set for each sub-data set, the sub-data sets are compressed and trained according to the compression ratios, and hash codes and original codes corresponding to the trained sub-data sets are obtained; the hash code of each trained sub-data sets is copied to obtain multiple replicas, and the original codes and the corresponding replicas are strung to obtain strung hash codes which are integrated to form a final nearest-neighbor query table; and, a query code is obtained, and the nearest-neighbor query table is searched for a nearest-neighbor data set to complete query. The query efficiency and accuracy are greatly improved according to the invention.

First claim

Opening claim text (preview).

What is claimed is: 1 . A high-dimensional data nearest-neighbor query method based on variable-length hash codes, comprising: (1) obtaining an original high-dimensional data set including multiple pieces of original high-dimensional data, giving a query point, and carrying out low-dimensional mapping on the original high-dimensional data set to generate a random Fourier feature vector set consisting of random Fourier feature vectors corresponding to the original high-dimensional data; (2) carrying out encoding according to a hash value of each said random Fourier feature vector to obtain hash codes corresponding to the original high-dimensional data, counting the appearance frequency of each hash code of all the hash codes to obtain a code frequency reflecting the appearance frequency of the hash code, taking hash codes with a same code frequency as a sub-data set to obtain multiple sub-data sets, ranking all the sub-data sets from high to low according to the code frequencies to obtain serial numbers of the sub-data sets, setting a compression ratio for each said sub-data set in a manner that the compression ratio is a reciprocal of the code frequency of the sub-data set, compressing the sub-data sets according to the compression ratios to obtain compressed sub-data sets and code lengths of the compressed sub-data sets, and then training the compressed sub-data sets in a manner that the sum of a compression loss and a quantification loss is minimum to obtain trained sub-data sets and hash codes of the trained sub-data sets; (3) extracting a random Fourier feature of each said trained sub-data set to obtain an original code corresponding to the trained sub-data set, and copying the hash code of each said trained sub-data set according to a code length of the corresponding original code and the compression ratio corresponding to the original code to obtain multiple replicas of the hash code of the trained sub-data set; (4) stringing the original code of each said trained sub-data set and the replicas of the hash code of the trained sub-data set to obtain a strung hash code corresponding to the trained sub-data set, and integrating the strung hash codes corresponding to all the trained sub-data sets to form a final nearest-neighbor query table; and (5) extracting a random Fourier feature vector of the given query point, mapping the random Fourier feature vector of the given query point to a random Fourier code having a length consistent with that of the strung hash code corresponding to one said trained sub-data set, and using the random Fourier code as a query code corresponding to the query point; and finally, searching the final nearest-neighbor query table for a nearest-neighbor data set having a minimum Hamming distance to the query code corresponding to the query point, and using the nearest-neighbor data set as a nearest-neighbor query result of the given query point, so that the nearest-neighbor query process of the given query point is completed.

Assignees

Inventors

Classifications

  • Hash tables · CPC title

  • G06N20/00Primary

    Machine learning · CPC title

  • of query operations · CPC title

  • Fourier, Walsh or analogous domain transformations {, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms (for correlation function computation G06F17/156; spectrum analysers G01R23/16)} · CPC title

  • Multidimensional index structures · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020142897A1 cover?
A high-dimensional data nearest-neighbor query method based on variable-length hash codes is disclosed. Specifically, in this method, hash codes with the same code frequency are taken as a sub-data set, all the sub-data sets are ranked, a compression ratio is set for each sub-data set, the sub-data sets are compressed and trained according to the compression ratios, and hash codes and original …
Who is the assignee on this patent?
Univ Ningbo
What technology area does this patent fall under?
Primary CPC classification G06F16/2255. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu May 07 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).