Maintaining data deduplication reference information

US2017351697A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2017351697-A1
Application number	US-201615173323-A
Country	US
Kind code	A1
Filing date	Jun 3, 2016
Priority date	Jun 3, 2016
Publication date	Dec 7, 2017
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data deduplication method includes detecting a deduplication transaction including a data pattern associated with a data pattern address (DPA) and a reference, to the pattern, associated with a data reference address (DRA). A deduplication key may be determined based on the DPA and the DRA by concatenating the DPA and the DRA with the DPA as the most significant bits. The key may be stored in a key field of a record in a persistent and sequentially-accessed log, which is part of a log-with-index (LWI) structure that also maintains, in RAM or SSD, a binary index of the log records. When full, the log is cleared by writing the records in key-sorted order to the new tablet. From time to time, two tablets in the tablet library are merged. Tablet merging may include two or more atomic merges, each atomic merge corresponding to a portion of the tablet.
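As a rough illustration of the key construction described in the abstract, the sketch below concatenates a DPA and a DRA into one integer key with the DPA in the most significant bits. The 32-bit DRA width and the function names `make_dedup_key` and `split_dedup_key` are assumptions made for this example; the abstract does not fix field widths or name any functions.

```python
DRA_BITS = 32  # assumed DRA field width; not specified in the abstract


def make_dedup_key(dpa: int, dra: int, dra_bits: int = DRA_BITS) -> int:
    """Concatenate DPA and DRA, with the DPA as the most significant bits."""
    if dpa < 0:
        raise ValueError("DPA must be non-negative")
    if not 0 <= dra < (1 << dra_bits):
        raise ValueError("DRA does not fit in the assumed field width")
    return (dpa << dra_bits) | dra


def split_dedup_key(key: int, dra_bits: int = DRA_BITS) -> tuple[int, int]:
    """Recover (DPA, DRA) from a key built by make_dedup_key."""
    return key >> dra_bits, key & ((1 << dra_bits) - 1)
```

Because the DPA occupies the high bits, sorting keys numerically groups all references to the same data pattern next to each other, which is what makes key-sorted tablets and key-range queries useful for finding every reference to a pattern.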

First claim

Opening claim text (preview).

What is claimed is:

1. A data deduplication method, comprising: detecting a deduplication transaction comprising a data reference at a data reference address and a data pattern at a data pattern address; determining a data deduplication key based on the data reference address and the data pattern address; storing the data deduplication key in a key field of a record in a log; maintaining an index of the records in the log; responsive to detecting a log full signal, performing log clear operations comprising: creating, in a tablet library comprising at least one other tablet, a new tablet; storing the records, sorted in accordance with the data deduplication keys, to the new tablet; and clearing the log of all entries; and responsive to a tablet merge signal, merging a first tablet of the tablet library and a second tablet of the tablet library to form a merged tablet and releasing the first tablet and the second tablet from the tablet library.

2. The data deduplication method of claim 1, wherein determining the data deduplication key includes appending at least a portion of the data reference address to at least a portion of the data pattern address with the portion of the data pattern address comprising the most significant bits.

3. The data deduplication method of claim 1, wherein the records in the log either: do not include a value field corresponding to the key field; or include a null value in the value field.

4. The data deduplication method of claim 1, wherein each record includes: a presence bit for distinguishing between insertion transactions and deletion transactions; and a sequence field storing a sequence value common to each record in the log; and wherein clearing the log includes incrementing the sequence number.

5. The data deduplication method of claim 1, further comprising: maintaining the log in persistent storage; and maintaining the index in memory; and inserting records comprises inserting records sequentially in the next sequential record of the log.

6. The data deduplication method of claim 1, wherein merging the first tablet and the second tablet comprises iteratively performing a plurality of atomic merges for each of a plurality of atomic portions of the first and second tablets, each atomic merge comprising: merging an atomic portion of the first tablet with an atomic portion of the second tablet to form an atomic portion of the merged tablet; and updating tablet index nodes corresponding to the atomic portion.

7. The data deduplication method of claim 6, wherein boundaries of the atomic portion are defined in terms of either: a particular range of the keys; or a particular number of fixed size tablet pages.

8. The data deduplication method of claim 6, further comprising: maintaining the tablet index as copy-on-write data wherein said updating of the tablet index nodes preserves node data until the atomic merge is committed to the merged tablet and the tablet portions of the first and second tablets are released.

9. The data deduplication method of claim 8, wherein the tablet index includes a super root node comprising a parent node of root nodes for the first, second, and merged tablets, wherein said updating of the nodes preserves node data until the atomic merge is committed to the merged tablet and the tablet portions of the first and second tablets are released.

10. The data deduplication method of claim 1, wherein the log full signal is asserted responsive to utilization of the log exceeding a threshold selected from: a percentage utilization threshold; a record count threshold; and a byte size threshold.

11. The data deduplication method of claim 1, further comprising: responsive to receiving a range query indicating a range of keys, generating a query result indicative of the range indicated in the query.

12. The data deduplication method of claim 1, further comprising, responsive to receiving a summary query indicating a range of keys, a key mask, and a maximum count, returning a result indicating a number of key values within the range of keys subject to the key mask and the maximum count.

13. An information handling system, comprising: a processor; a memory, accessible to the processor for performing operations comprising: detecting a deduplication transaction comprising a data reference at a data reference address and a data pattern at a data pattern address; determining a data deduplication key based on the data reference address and the data pattern address; storing the data deduplication key in a key field of a record in a log; responsive to detecting a log full signal, performing log clear operations comprising: creating, in a tablet library comprising at least one other tablet, a new tablet; storing the records, sorted in accordance with the data deduplication keys, to the new tablet; and clearing the log of all entries; and responsive to a tablet merge signal, merging a first tablet of the tablet library and a second tablet of the tablet library to form a merged tablet and releasing the first tablet and the second tablet from the tablet library.

14. The information handling system of claim 13, further comprising: maintaining a binary tree index of the records in the log, wherein the storing of the records to the new tablet comprises storing the records sorted in accordance with binary tree index.

15. The information handling system of claim 13, wherein determining the data deduplication key includes appending at least a portion of the data reference address to at least a portion of the data pattern address with the portion of the data pattern address comprising the most significant bits.

16. The information handling system of claim 13, wherein the records in the log either: do not include a value field corresponding to the key field; or include a null value in the value field.

17. The information handling system of claim 13, wherein each record includes: a presence bit for distinguishing between insertion transactions and deletion transactions; and a sequence field storing a sequence value common to each record in the log; and wherein clearing the log includes incrementing the sequence number.

18. The information handling system of claim 13, wherein merging the first tablet and the second tablet comprises iteratively performing a plurality of atomic merges for each of a plurality of tablet portions, each atomic merge comprising: merging an atomic portion of the first tablet with an atomic portion of the second tablet to form an atomic portion of the merged tablet; and updating a portion of the tablet index corresponding to the atomic portion; wherein the atomic portion comprises a tablet portion corresponding to a particular range of the keys.

19. The information handling system of claim 13, wherein the operations include: responsive to receiving a range query indicating a range of keys, retrieving all keys within the range indicated in the query.

20. The information handling system of claim 13, wherein the operations include: responsive to receiving a summary query indicating a range of keys, a key mask, and a maximum count, returning a result indicating a number of key values within the range of keys subject to the key mask and the maximum count.
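The log, tablet library, merge, and range query recited in the claims can be pictured with a small in-memory Python sketch. This is not the patented implementation. The class name `DedupStore`, its method names, the record-count threshold, and the use of a plain dict in place of a binary tree index are all assumptions for illustration. Persistent storage, sequence numbers, atomic partial merges, and the copy-on-write tablet index are not modeled.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    log_capacity: int = 4  # assumed record-count threshold for "log full"
    log: list = field(default_factory=list)      # append-only (key, present) records
    index: dict = field(default_factory=dict)    # key -> latest presence bit in the log
    tablets: list = field(default_factory=list)  # tablet library, oldest first

    def record(self, key: int, present: bool = True) -> None:
        """Append an insertion (present=True) or deletion (present=False) record."""
        self.log.append((key, present))
        self.index[key] = present
        if len(self.log) >= self.log_capacity:
            self.clear_log()

    def clear_log(self) -> None:
        """Write the indexed records, sorted by key, to a new tablet; empty the log."""
        if self.index:
            self.tablets.append(sorted(self.index.items()))
        self.log.clear()
        self.index.clear()

    def merge_oldest(self) -> None:
        """Merge the two oldest tablets into one; the newer record wins per key."""
        if len(self.tablets) < 2:
            return
        merged = dict(self.tablets[0])
        merged.update(self.tablets[1])
        # Replacing both source tablets with the merged one releases them.
        self.tablets[:2] = [sorted(merged.items())]

    def range_query(self, lo: int, hi: int) -> list:
        """Return keys in [lo, hi] whose most recent record is an insertion."""
        state = {}
        for tablet in self.tablets:  # oldest to newest, so later records override
            keys = [k for k, _ in tablet]
            start = bisect.bisect_left(keys, lo)
            end = bisect.bisect_right(keys, hi)
            state.update(tablet[start:end])
        state.update({k: p for k, p in self.index.items() if lo <= k <= hi})
        return sorted(k for k, p in state.items() if p)
```

In this sketch a deletion is kept as a record with the presence bit cleared rather than removed outright, so a newer tablet can cancel an insertion that still sits in an older tablet. Merging two tablets therefore leaves query results unchanged while reducing the number of tablets a query has to visit.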

Assignees

Dell Products LP

Inventors

Brosch Ryan W

Classifications

G06F3/06
Digital input from, or digital output to, record carriers {, e.g. RAID, emulated record carriers or networked record carriers} · CPC title
H03M7/3091 · Primary
Data deduplication · CPC title
G06F16/2272
Management thereof · CPC title
G06F3/0683
Plurality of storage devices · CPC title
G06F3/0641
De-duplication techniques · CPC title

Patent family

Related publications grouped by family.

View patent family 60483823

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017351697A1 cover?: A data deduplication method includes detecting a deduplication transaction including a data pattern associated with a data pattern address (DPA) and a reference, to the pattern, associated with a data reference address (DRA). A deduplication key may be determined based on the DPA and the DRA by concatenating the DPA and the DRA with the DPA as the most significant bits. The key may be stored in…
Who is the assignee on this patent?: Dell Products LP
What technology area does this patent fall under?: Primary CPC classification H03M7/3091. Mapped technology areas include Electricity.
When was this patent published?: Publication date Dec 7, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).