Deduplicating files across multiple storage tiers in a clustered file system network

US12554590B2 · US · B2

Patent metadata
Field                Value
Publication number   US-12554590-B2
Application number   US-202318304359-A
Country              US
Kind code            B2
Filing date          Apr 21, 2023
Priority date        Apr 21, 2023
Publication date     Feb 17, 2026
Grant date           Feb 17, 2026

Abstract

Official abstract text for this publication.

Embodiments are described for a system and method facilitating deduplication in a multi-tier storage system in which a file can have different portions written to different tiers. A process partitions the data space of each tier into a number of similarity groups and distributes the similarity groups across file system services in a cluster. The distribution is done in such a way that, for a given similarity group ID, the same file system service owns the similarity groups of every tier. This allows efficient deduplication checks, because they can be done locally on a node rather than requiring remote procedure calls.
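
The ownership rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tier names, the service names, the number of similarity groups, and the contiguous-range assignment are all assumptions made for illustration. The only property taken from the abstract is that the owner of a similarity group ID does not depend on the tier.

```python
# Hypothetical tier names; the abstract does not name specific tiers.
TIERS = ["ssd", "hdd", "cloud"]


def build_ownership(num_simgroups: int, services: list[str]) -> dict[int, str]:
    """Assign contiguous ranges of similarity group IDs to file system services.

    Range-based assignment is an assumption for this sketch; any scheme works
    as long as the owner is a function of the simgroup ID alone.
    """
    per_service = -(-num_simgroups // len(services))  # ceiling division
    return {sg_id: services[sg_id // per_service] for sg_id in range(num_simgroups)}


def owner_of(owners: dict[int, str], tier: str, sg_id: int) -> str:
    """Return the service that owns simgroup `sg_id` in `tier`.

    The tier argument is deliberately ignored: the same service owns a given
    simgroup ID in every tier, so a deduplication check for any tier can be
    answered on that service's node without a remote call.
    """
    return owners[sg_id]


owners = build_ownership(10, ["fs-0", "fs-1", "fs-2"])
for tier in TIERS:
    print(tier, owner_of(owners, tier, 5))
```

Because the lookup never consults the tier, two portions of one file that fall into the same similarity group but live on different tiers are always checked by the same service.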

First claim

Opening claim text (preview).

What is claimed is:

1. A method for deduplicating data of data assets in a multi-tier network having a plurality of different storage devices in which a file can have different portions written to different tiers, comprising:
    organizing the storage devices into a plurality of storage tiers based on respective operating characteristics;
    mapping each storage tier to a respective Service Level Agreements (SLA) dictating storage requirements for each of the data assets to a backup program;
    mapping each SLA to one or more tiers of the plurality of tiers based on the storage requirements of a respective SLA to the operating characteristics of each tier;
    partitioning a data space of each tier into a plurality of similarity groups (simgroup), each simgroup having a unique assigned simgroup identifier (ID);
    mapping data of the data assets to the similarity groups based on organization of the data among the tiers;
    distributing the similarity groups across file system services of deduplication nodes in the network so that each file system service owns a same similarity group for all the tiers;
    forming each similarity group by applying a mapping function comprising a hash of fingerprint data for first level data segments (L1) of a Merkle tree organizing the data;
    assigning each file system service to one or more simgroups through a range of simgroup IDs;
    checking, as a local operation performed at each node, data deduplication using a portion of a fingerprint index associated with a respective range of simgroup IDs by checking each similarity group against a mapping table routing a corresponding L1 data segment to a corresponding instance of a deduplication service of the each node; and
    performing deduplication of the data of a similarity group during a backup on a respective deduplication node.

2. The method of claim 1 wherein the storage requirements comprise backup and restore latencies, media availability, and cost, and further wherein the operating characteristics comprise throughput (input/output rate), latency, security, and availability.

3. The method of claim 2 further comprising characterizing the tiers along a performance scale ranging from high performance to low performance for throughput versus cost of storage, and wherein the data assets comprise at least one of files, directories, Mtrees, and namespaces.

4. The method of claim 3 wherein the tiers comprise one or more types of storage media selected from: hard disk drives (HDDs), solid state drives (SSDs), flash memory, and cloud storage, and wherein the performance, availability and cost characteristics are different for each type of storage media.

5. The method of claim 4 wherein the backup software comprises part of a deduplication backup system performing backup and restore operations for nodes of the multi-tier network, and wherein the network comprises a Santorini clustered network.

6. The method of claim 5 wherein the deduplication node executes deduplication and compression services that pack unique data segments, and writes data segments as an object in an object store.

7. The method of claim 6 wherein an SLA attribute is used with the similarity group to send the file to the appropriate tier in the network to meet the SLA.

8. The method of claim 1 wherein the data assets comprise at least one of files, directories, Mtrees, and namespaces, and further wherein the backup software comprises part of a deduplication backup system performing backup and restore operations for nodes of the multi-tier network.

9. The method of claim 1 wherein each similarity group is calculated for an L1 data segment based on SHA1 fingerprints of L0 segments of the Merkle tree, and wherein the deduplication service deduplicates the L0 segments relative to other fingerprints within a same similarity group.
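
The routing and local deduplication steps of claims 1 and 9 can be sketched in Python as follows. This is a hedged illustration only. The claims specify SHA1 fingerprints of L0 segments and "a hash of fingerprint data", but not the exact mapping function, so `simgroup_id` below (a hash over the concatenated L0 fingerprints, reduced modulo an assumed group count) is a placeholder. `DedupService`, `route`, and `NUM_SIMGROUPS` are hypothetical names that do not appear in the patent.

```python
import hashlib

NUM_SIMGROUPS = 64  # assumed; the claims do not fix the number of groups


def fingerprint(segment: bytes) -> bytes:
    """SHA1 fingerprint of an L0 data segment (SHA1 is named in claim 9)."""
    return hashlib.sha1(segment).digest()


def simgroup_id(l0_fingerprints: list[bytes], num_simgroups: int = NUM_SIMGROUPS) -> int:
    """Placeholder mapping function from an L1 segment's L0 fingerprints to a simgroup ID."""
    digest = hashlib.sha1(b"".join(l0_fingerprints)).digest()
    return int.from_bytes(digest[:8], "big") % num_simgroups


class DedupService:
    """Hypothetical deduplication service owning a range of simgroup IDs."""

    def __init__(self, sg_range: range):
        self.sg_range = sg_range
        # The portion of the fingerprint index for the owned simgroups only.
        self.index = {sg: set() for sg in sg_range}

    def ingest(self, sg_id: int, l0_segments: list[bytes]) -> int:
        """Deduplicate L0 segments within one simgroup; return how many were new."""
        seen = self.index[sg_id]
        new_segments = 0
        for segment in l0_segments:
            fp = fingerprint(segment)
            if fp not in seen:  # local lookup, no remote procedure call
                seen.add(fp)
                new_segments += 1
        return new_segments


def route(services: list[DedupService], sg_id: int) -> DedupService:
    """Mapping-table lookup: find the service whose ID range contains sg_id."""
    for service in services:
        if sg_id in service.sg_range:
            return service
    raise KeyError(sg_id)


services = [DedupService(range(0, 32)), DedupService(range(32, 64))]
segments = [b"alpha", b"beta", b"alpha"]
sg = simgroup_id([fingerprint(s) for s in segments])
print(route(services, sg).ingest(sg, segments))
```

Ingesting the same L1 segment a second time routes to the same service and finds every fingerprint already present in that service's index portion, which is the local check the claims describe.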

Assignees

Dell Products Lp

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641)

  • for networked environments

  • Using snapshots, i.e. a logical point-in-time copy of the data

  • using de-duplication of the data

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 17, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).