Hash-Based Forwarding In Content Centric Networks
US-2016094553-A1 · Mar 31, 2016 · US
US10990626B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10990626-B2 |
| Application number | US-201615754904-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 23, 2016 |
| Priority date | Sep 24, 2015 |
| Publication date | Apr 27, 2021 |
| Grant date | Apr 27, 2021 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A data storage and retrieval system employs online supervised hashing for indexing a data set and retrieving data items therefrom. A hash-based mapping is used to generate hash codes for indexing content items. Data items may be retrieved based on either/both a query label (using corresponding codewords) and the content item itself (using the hash codes). The hash-based mapping is updated using an objective function of distance between the hash codes and respective codewords for labels of labelled content items, preserving semantic similarities of content items. The codewords may be error-correcting codes. Techniques for efficiently updating the index include (1) cycle-based updating and ternary codewords, and (2) reservoir sample-based method of determining when to trigger an update.
Opening claim text (preview).
What is claimed is: 1. A data storage and retrieval system, comprising: a storage subsystem for storing a dataset of content items, at least some of the content items of the dataset being labelled content items having respective labels describing respective aspects of the content items; and a processing subsystem coupled to the storage subsystem and having a communications interface coupled to receive additions to the dataset and queries of the dataset, the processing subsystem being configured and operative to: maintain an index of the content items using hash codes and predetermined codewords as index values, the hash codes being dynamically generated by an online-learned hash-based mapping, the codewords representing respective labels of the labelled content items of the dataset, the codewords being error-correcting codes protecting against errors in the hash-based mapping and promoting selectivity of the hash codes produced by the hash-based mapping; maintain and dynamically update the hash-based mapping, and use the hash-based mapping to generate the hash codes as content items of the dataset are indexed, the hash-based mapping preserving semantic similarities of content items, the hash-based mapping being updated using an objective function of distance between the hash codes produced by the hash-based mapping and respective codewords for the labelled content items; and search the index to retrieve stored content items in response to the queries received via the communications interface, including (i) using the hash-based mapping to calculate a query hash code from a content item of the query, and (ii) using the query hash code to retrieve stored content items indexed by similar codewords or hash codes; wherein the processing subsystem performs a randomized reservoir sampling method that (i) maintains a reservoir sampling set of randomly selected content items and associated labels as they are added to the dataset during operation, (ii) uses the reservoir sampling set to determine when to update the index, and (iii) updates the index at determined times. 2. The data storage and retrieval system of claim 1 , wherein to maintain the hash-based mapping the processing subsystem performs a calculation minimizing an upper bound on a Hamming loss between the hash codes and the codewords. 3. The data storage and retrieval system of claim 1 , wherein to determine when to update the index, the processing subsystem performs a calculation comparing a measure of an amount of change of the hash codes of the reservoir to a predetermined threshold value. 4. The data storage and retrieval system of claim 3 , wherein the threshold is a dynamic threshold adapted based on the number of labels observed in preceding operation. 5. The data storage and retrieval system of claim 1 , wherein to maintain the hash-based mapping the processing subsystem evaluates a regularizer term in addition to the objective function of distance, the regularizer term encoding a measure of divergence between a current hash-based mapping and a candidate new hash-based mapping, to promote selection of a new hash-based mapping requiring less updating of the index than other candidate new hash-based mappings. 6. The data storage and retrieval system of claim 5 , wherein the processing subsystem uses the reservoir sampling set to evaluate the regularizer term. 7. The data storage and retrieval system of claim 6 , wherein the processing subsystem also uses the reservoir sampling set to determine when to update the index, and updates the index at determined times. 8. The data storage and retrieval system of claim 7 , wherein to determine when to update the index, the processing subsystem performs a calculation comparing a measure of an amount of change of the hash codes of the reservoir to a predetermined threshold value. 9. The data storage and retrieval system of claim 8 , wherein the threshold is a dynamic threshold adapted based on the number of labels observed in preceding operation. 10. The data storage and retrieval system of claim 1 , wherein to maintain the index and maintain the hash-based mapping, the processing subsystem performs ongoing processes divided into cycles during each of which a number of new labels are learned and corresponding content items are stored, both the hash-based mapping and the index being divided into sections corresponding to the cycles during which respective hash functions and codewords of the sections were first created and stored therein, the processes including, at a time of updating the hash-based mapping and the index based on adding a new labelled content item to the dataset: identifying respective sections of the hash-based mapping and index structure corresponding to a codeword for a label of the labelled content item; and updating the identified sections. 11. The data storage and retrieval system of claim 10 , wherein the codewords comprise ternary symbols including an inactive value, and wherein, to update the identified sections, the processing subsystem also updates other sections of the index structure by appending a series of symbols having the inactive value to existing symbols of existing hash codes. 12. A method of operating a data storage and retrieval system storing a dataset of content items, at least some of the content items of the dataset being labelled content items having respective labels describing respective aspects of the content items, the data storage and retrieval system having a communications interface for receiving additions to the dataset and queries of the dataset, comprising: maintaining an index of the content items using hash codes and predetermined codewords as index values, the hash codes being dynamically generated by an online-learned hash-based mapping, the codewords representing respective labels of the labelled content items of the dataset, the codewords being error-correcting codes protecting against errors in the hash-based mapping and promoting selectivity of the hash codes produced by the hash-based mapping; maintaining and dynamically updating the hash-based mapping, and using the hash-based mapping to generate the hash codes as content items of the dataset are indexed, the hash-based mapping preserving semantic similarities of content items, the hash-based mapping being updated using an objective function of distance between the hash codes produced by the hash-based mapping and respective codewords for the labelled content items; searching the index to retrieve stored content items in response to the queries received via the communications interface, including (i) using the hash-based mapping to calculate a query hash code from a content item of the query, and (ii) using the query hash code to retrieve stored content items indexed by similar codewords or hash codes; and randomized reservoir sampling that includes (i) maintains a reservoir sampling set of randomly s elected content items and associated labels as they are added to the dataset during operation, (ii) uses the reservoir sampling set to determine when to update the index, and (iii) updates the index at determined times. 13. The method of claim 12 , wherein maintaining the hash-based mapping includes minimizing an upper bound on a Hamming loss between the hash codes and the codewords. 14. The method of claim 12 , further including, to determine when to update the index, performing a calculation comparing a measure of an amount of change of the hash codes of the reservoir to a predetermined threshold value. 15. The method of claim 14 , wherein the threshold is a dynamic threshold adapted based on the number of labe
Error detection codes other than CRC and single parity bit codes · CPC title
Protection against loss of memory contents {(contains no material, see G06F11/00)} · CPC title
Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits H03K19/00) · CPC title
Query processing · CPC title
hash tables · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.