Handling data with different lifetime characteristics in stream-aware data storage equipment
US-2023176743-A1 · Jun 8, 2023 · US
US12554590B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12554590-B2 |
| Application number | US-202318304359-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 21, 2023 |
| Priority date | Apr 21, 2023 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
Embodiments are described for a system and method facilitating deduplication in a multi-tier storage system in which a file can have different portions written to different tiers. A process partitions the data space of each tier into a number of similarity groups and distributes the similarity groups across file system services in a cluster. The distribution is done in such a way that, for a given similarity group ID, the same file system service owns the similarity groups of every tier. This allows efficient deduplication checks, because each check can be performed locally on a node rather than requiring remote procedure calls.
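The ownership rule in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the tier names, the size of the simgroup ID space, the contiguous-range assignment, and the per-tier fingerprint buckets are all assumptions made for illustration. The point it shows is that the owner is computed from the simgroup ID alone, so one service holds that simgroup's index for every tier and can answer a duplicate check without a remote call.

```python
from dataclasses import dataclass, field

NUM_SIMGROUPS = 1024              # assumed size of the simgroup ID space per tier
TIERS = ("ssd", "hdd", "cloud")   # hypothetical tier names


def owner_of(simgroup_id: int, num_services: int) -> int:
    """Pick the owning service from a contiguous range of simgroup IDs.

    The tier is deliberately not an input, so the same service owns
    a given simgroup ID on every tier.
    """
    if not 0 <= simgroup_id < NUM_SIMGROUPS:
        raise ValueError("simgroup ID out of range")
    return simgroup_id * num_services // NUM_SIMGROUPS


@dataclass
class FileSystemService:
    """Hypothetical stand-in for one file system service in the cluster."""
    service_id: int
    # (tier, simgroup_id) -> set of fingerprints held locally by this service
    index: dict = field(default_factory=dict)

    def store(self, tier: str, simgroup_id: int, fingerprint: bytes) -> bool:
        """Return True if the fingerprint was new, False if it was a duplicate."""
        bucket = self.index.setdefault((tier, simgroup_id), set())
        if fingerprint in bucket:
            return False
        bucket.add(fingerprint)
        return True


def build_cluster(num_services: int) -> list:
    return [FileSystemService(i) for i in range(num_services)]


def route(cluster: list, simgroup_id: int) -> FileSystemService:
    """Find the single service that owns this simgroup ID for all tiers."""
    return cluster[owner_of(simgroup_id, len(cluster))]
```

With four services, every simgroup ID in the range 256 to 511 is routed to the service with ID 1 regardless of tier, so a file whose portions land on different tiers still has each simgroup checked on one node.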
Opening claim text (preview).
What is claimed is:
1. A method for deduplicating data of data assets in a multi-tier network having a plurality of different storage devices in which a file can have different portions written to different tiers, comprising:
organizing the storage devices into a plurality of storage tiers based on respective operating characteristics;
mapping each storage tier to a respective Service Level Agreements (SLA) dictating storage requirements for each of the data assets to a backup program;
mapping each SLA to one or more tiers of the plurality of tiers based on the storage requirements of a respective SLA to the operating characteristics of each tier;
partitioning a data space of each tier into a plurality of similarity groups (simgroup), each simgroup having a unique assigned simgroup identifier (ID);
mapping data of the data assets to the similarity groups based on organization of the data among the tiers;
distributing the similarity groups across file system services of deduplication nodes in the network so that each file system service owns a same similarity group for all the tiers;
forming each similarity group by applying a mapping function comprising a hash of fingerprint data for first level data segments (L1) of a Merkle tree organizing the data;
assigning each file system service to one or more simgroups through a range of simgroup IDs;
checking, as a local operation performed at each node, data deduplication using a portion of a fingerprint index associated with a respective range of simgroup IDs by checking each similarity group against a mapping table routing a corresponding L1 data segment to a corresponding instance of a deduplication service of the each node; and
performing deduplication of the data of a similarity group during a backup on a respective deduplication node.
2. The method of claim 1 wherein the storage requirements comprise backup and restore latencies, media availability, and cost, and further wherein the operating characteristics comprise throughput (input/output rate), latency, security, and availability.
3. The method of claim 2 further comprising characterizing the tiers along a performance scale ranging from high performance to low performance for throughput versus cost of storage, and wherein the data assets comprise at least one of files, directories, Mtrees, and namespaces.
4. The method of claim 3 wherein the tiers comprise one or more types of storage media selected from: hard disk drives (HDDs), solid state drives (SSDs), flash memory, and cloud storage, and wherein the performance, availability and cost characteristics are different for each type of storage media.
5. The method of claim 4 wherein the backup software comprises part of a deduplication backup system performing backup and restore operations for nodes of the multi-tier network, and wherein the network comprises a Santorini clustered network.
6. The method of claim 5 wherein the deduplication node executes deduplication and compression services that pack unique data segments, and writes data segments as an object in an object store.
7. The method of claim 6 wherein an SLA attribute is used with the similarity group to send the file to the appropriate tier in the network to meet the SLA.
8. The method of claim 1 wherein the data assets comprise at least one of files, directories, Mtrees, and namespaces, and further wherein the backup software comprises part of a deduplication backup system performing backup and restore operations for nodes of the multi-tier network.
9. The method of claim 1 wherein each similarity group is calculated for an L1 data segment based on SHA1 fingerprints of L0 segments of the Merkle tree, and wherein the deduplication service deduplicates the L0 segments relative to other fingerprints within a same similarity group.
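Claim 9 can be read alongside a small sketch of how a simgroup might be derived for an L1 segment and how its L0 segments might then be deduplicated within that simgroup. The claims specify SHA1 fingerprints of L0 segments but do not give the mapping function itself, so the choice below (SHA-1 over the concatenated L0 fingerprints, reduced modulo an assumed simgroup count) is purely an assumption, as are the function names and the in-memory index.

```python
import hashlib
from typing import Dict, List, Sequence, Set, Tuple

NUM_SIMGROUPS = 1024  # assumed size of the simgroup ID space


def l0_fingerprint(segment: bytes) -> bytes:
    """SHA-1 fingerprint of one L0 data segment."""
    return hashlib.sha1(segment).digest()


def simgroup_for_l1(l0_fingerprints: Sequence[bytes]) -> int:
    """Assumed mapping function: hash the L0 fingerprints referenced by an
    L1 segment and reduce the result to a simgroup ID."""
    digest = hashlib.sha1(b"".join(l0_fingerprints)).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SIMGROUPS


def dedup_l1(
    l0_segments: Sequence[bytes],
    index: Dict[int, Set[bytes]],
) -> Tuple[int, List[bytes]]:
    """Deduplicate the L0 segments of one L1 against fingerprints already
    recorded for the same simgroup; return the simgroup ID and the segments
    that still need to be written."""
    fingerprints = [l0_fingerprint(s) for s in l0_segments]
    simgroup_id = simgroup_for_l1(fingerprints)
    known = index.setdefault(simgroup_id, set())
    unique: List[bytes] = []
    for fp, segment in zip(fingerprints, l0_segments):
        if fp not in known:
            known.add(fp)
            unique.append(segment)
    return simgroup_id, unique
```

Because the simgroup ID depends only on the L1 segment's content, writing the same L1 again maps to the same simgroup, and every one of its L0 segments is found in that simgroup's portion of the index.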
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
for networked environments · CPC title
Using snapshots, i.e. a logical point-in-time copy of the data · CPC title
using de-duplication of the data · CPC title