Systems, methods and interfaces for data virtualization
US-10102144-B2 · Oct 16, 2018 · US
US2017351697A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017351697-A1 |
| Application number | US-201615173323-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 3, 2016 |
| Priority date | Jun 3, 2016 |
| Publication date | Dec 7, 2017 |
| Grant date | — |
Official abstract text for this publication.
A data deduplication method includes detecting a deduplication transaction that includes a data pattern associated with a data pattern address (DPA) and a reference to the pattern associated with a data reference address (DRA). A deduplication key may be determined by concatenating the DPA and the DRA, with the DPA as the most significant bits. The key may be stored in a key field of a record in a persistent, sequentially accessed log, which is part of a log-with-index (LWI) structure that also maintains, in RAM or SSD, a binary index of the log records. When full, the log is cleared by writing its records in key-sorted order to a new tablet in a tablet library. From time to time, two tablets in the tablet library are merged. Tablet merging may include two or more atomic merges, each atomic merge corresponding to a portion of the tablet.
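The key construction and log-clear behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the 64-bit address width, the `LogWithIndex` class, the record-count `capacity` threshold, and the use of a sorted list in place of the binary index are all assumptions made for illustration.

```python
import bisect
from dataclasses import dataclass, field

ADDR_BITS = 64  # assumed address width; the source does not specify one


def make_key(dpa: int, dra: int, addr_bits: int = ADDR_BITS) -> int:
    """Concatenate DPA and DRA with the DPA in the most significant bits."""
    limit = 1 << addr_bits
    if not (0 <= dpa < limit and 0 <= dra < limit):
        raise ValueError("address out of range")
    return (dpa << addr_bits) | dra


@dataclass
class LogWithIndex:
    """Hypothetical stand-in for the log-with-index (LWI) structure."""
    capacity: int                                 # assumed record-count threshold
    log: list = field(default_factory=list)       # append-only, arrival order
    index: list = field(default_factory=list)     # sorted keys (stands in for the binary index)
    tablets: list = field(default_factory=list)   # tablet library: key-sorted lists

    def insert(self, dpa: int, dra: int) -> None:
        key = make_key(dpa, dra)
        self.log.append(key)             # sequential write to the log
        bisect.insort(self.index, key)   # keep the index sorted by key
        if len(self.log) >= self.capacity:
            self.clear_log()             # the "log full" condition

    def clear_log(self) -> None:
        # Write the records in key-sorted order to a new tablet, then empty the log.
        self.tablets.append(list(self.index))
        self.log.clear()
        self.index.clear()
```

Because the DPA occupies the high bits, all references to the same data pattern sort next to each other in a tablet, which is what makes range scans over a single pattern cheap in this sketch.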
Opening claim text (preview).
What is claimed is:

1. A data deduplication method, comprising: detecting a deduplication transaction comprising a data reference at a data reference address and a data pattern at a data pattern address; determining a data deduplication key based on the data reference address and the data pattern address; storing the data deduplication key in a key field of a record in a log; maintaining an index of the records in the log; responsive to detecting a log full signal, performing log clear operations comprising: creating, in a tablet library comprising at least one other tablet, a new tablet; storing the records, sorted in accordance with the data deduplication keys, to the new tablet; and clearing the log of all entries; and responsive to a tablet merge signal, merging a first tablet of the tablet library and a second tablet of the tablet library to form a merged tablet and releasing the first tablet and the second tablet from the tablet library.

2. The data deduplication method of claim 1, wherein determining the data deduplication key includes appending at least a portion of the data reference address to at least a portion of the data pattern address with the portion of the data pattern address comprising the most significant bits.

3. The data deduplication method of claim 1, wherein the records in the log either: do not include a value field corresponding to the key field; or include a null value in the value field.

4. The data deduplication method of claim 1, wherein each record includes: a presence bit for distinguishing between insertion transactions and deletion transactions; and a sequence field storing a sequence value common to each record in the log; and wherein clearing the log includes incrementing the sequence number.

5. The data deduplication method of claim 1, further comprising: maintaining the log in persistent storage; and maintaining the index in memory; and inserting records comprises inserting records sequentially in the next sequential record of the log.

6. The data deduplication method of claim 1, wherein merging the first tablet and the second tablet comprises iteratively performing a plurality of atomic merges for each of a plurality of atomic portions of the first and second tablets, each atomic merge comprising: merging an atomic portion of the first tablet with an atomic portion of the second tablet to form an atomic portion of the merged tablet; and updating tablet index nodes corresponding to the atomic portion.

7. The data deduplication method of claim 6, wherein boundaries of the atomic portion are defined in terms of either: a particular range of the keys; or a particular number of fixed size tablet pages.

8. The data deduplication method of claim 6, further comprising: maintaining the tablet index as copy-on-write data wherein said updating of the tablet index nodes preserves node data until the atomic merge is committed to the merged tablet and the tablet portions of the first and second tablets are released.

9. The data deduplication method of claim 8, wherein the tablet index includes a super root node comprising a parent node of root nodes for the first, second, and merged tablets, wherein said updating of the nodes preserves node data until the atomic merge is committed to the merged tablet and the tablet portions of the first and second tablets are released.

10. The data deduplication method of claim 1, wherein the log full signal is asserted responsive to utilization of the log exceeding a threshold selected from: a percentage utilization threshold; a record count threshold; and a byte size threshold.

11. The data deduplication method of claim 1, further comprising: responsive to receiving a range query indicating a range of keys, generating a query result indicative of the range indicated in the query.

12. The data deduplication method of claim 1, further comprising, responsive to receiving a summary query indicating a range of keys, a key mask, and a maximum count, returning a result indicating a number of key values within the range of keys subject to the key mask and the maximum count.

13. An information handling system, comprising: a processor; a memory, accessible to the processor for performing operations comprising: detecting a deduplication transaction comprising a data reference at a data reference address and a data pattern at a data pattern address; determining a data deduplication key based on the data reference address and the data pattern address; storing the data deduplication key in a key field of a record in a log; responsive to detecting a log full signal, performing log clear operations comprising: creating, in a tablet library comprising at least one other tablet, a new tablet; storing the records, sorted in accordance with the data deduplication keys, to the new tablet; and clearing the log of all entries; and responsive to a tablet merge signal, merging a first tablet of the tablet library and a second tablet of the tablet library to form a merged tablet and releasing the first tablet and the second tablet from the tablet library.

14. The information handling system of claim 13, further comprising: maintaining a binary tree index of the records in the log, wherein the storing of the records to the new tablet comprises storing the records sorted in accordance with binary tree index.

15. The information handling system of claim 13, wherein determining the data deduplication key includes appending at least a portion of the data reference address to at least a portion of the data pattern address with the portion of the data pattern address comprising the most significant bits.

16. The information handling system of claim 13, wherein the records in the log either: do not include a value field corresponding to the key field; or include a null value in the value field.

17. The information handling system of claim 13, wherein each record includes: a presence bit for distinguishing between insertion transactions and deletion transactions; and a sequence field storing a sequence value common to each record in the log; and wherein clearing the log includes incrementing the sequence number.

18. The information handling system of claim 13, wherein merging the first tablet and the second tablet comprises iteratively performing a plurality of atomic merges for each of a plurality of tablet portions, each atomic merge comprising: merging an atomic portion of the first tablet with an atomic portion of the second tablet to form an atomic portion of the merged tablet; and updating a portion of the tablet index corresponding to the atomic portion; wherein the atomic portion comprises a tablet portion corresponding to a particular range of the keys.

19. The information handling system of claim 13, wherein the operations include: responsive to receiving a range query indicating a range of keys, retrieving all keys within the range indicated in the query.

20. The information handling system of claim 13, wherein the operations include: responsive to receiving a summary query indicating a range of keys, a key mask, and a maximum count, returning a result indicating a number of key values within the range of keys subject to the key mask and the maximum count.
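The tablet merge and the two query types recited in the claims might look roughly like the following Python sketch. Several points are assumptions rather than anything the claims state: tablets are modeled as plain key-sorted lists of integers, the merge runs in a single pass instead of the atomic portions of claim 6, presence bits and sequence values are ignored, range bounds are treated as inclusive, and the key mask is interpreted as a bitwise AND applied before counting distinct values. The function names are hypothetical.

```python
import bisect
import heapq


def merge_tablets(first, second):
    """Merge two key-sorted tablets into one sorted tablet, dropping duplicate keys."""
    merged = []
    for key in heapq.merge(first, second):
        if not merged or merged[-1] != key:
            merged.append(key)
    return merged


def range_query(tablet, low, high):
    """Return all keys k with low <= k <= high (inclusive bounds are an assumption)."""
    start = bisect.bisect_left(tablet, low)
    stop = bisect.bisect_right(tablet, high)
    return tablet[start:stop]


def summary_query(tablet, low, high, key_mask, max_count):
    """Count distinct masked key values in the range, capped at max_count.

    Treating the mask as `key & key_mask` is an assumed interpretation.
    """
    if max_count <= 0:
        return 0
    seen = set()
    for key in range_query(tablet, low, high):
        seen.add(key & key_mask)
        if len(seen) >= max_count:
            break  # stop scanning once the cap is reached
    return len(seen)
```

With keys built as DPA-high, DRA-low, a mask that keeps only the DPA bits would, under this interpretation, count how many distinct data patterns have references in the queried range.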
Digital input from, or digital output to, record carriers {, e.g. RAID, emulated record carriers or networked record carriers} · CPC title
Data deduplication · CPC title
Management thereof · CPC title
Plurality of storage devices · CPC title
De-duplication techniques · CPC title