Methods and systems for scalable deduplication

US11741060B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11741060-B2
Application number  US-201916698288-A
Country             US
Kind code           B2
Filing date         Nov 27, 2019
Priority date       Nov 27, 2019
Publication date    Aug 29, 2023
Grant date          Aug 29, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, computer program products, computer systems, and the like are disclosed that provide for scalable deduplication in an efficient and effective manner. For example, such methods, computer program products, and computer systems can include receiving a data object at an assigned node, determining whether the data object includes a sub-data object, and processing the sub-data object. The assigned node is a node of a plurality of nodes of a cluster, where the data object includes a data segment, and a signature. The signature is generated based, at least in part, on data of the data segment. The processing includes sending the sub-data object to a remote node. The remote node is another node of the plurality of nodes of the cluster.
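
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `DataObject`, `Node`, `make_object`, and `receive` are hypothetical, SHA-256 is an assumed stand-in for the signature function, and the sub-data objects are carried explicitly on the data object, whereas the claims determine them using a catalog.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataObject:
    segment: bytes                 # the data segment
    signature: str                 # generated from the data of the segment
    sub_objects: list = field(default_factory=list)  # nested DataObject items


def make_object(segment, sub_objects=None):
    # Assumption: SHA-256 stands in for whatever signature function is used.
    signature = hashlib.sha256(segment).hexdigest()
    return DataObject(segment, signature, list(sub_objects or []))


class Node:
    """One node of the cluster (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.segments = {}         # signature -> data segment

    def store(self, obj):
        """Keep the segment unless its signature is already known here."""
        if obj.signature in self.segments:
            return False           # duplicate: nothing new is stored
        self.segments[obj.signature] = obj.segment
        return True

    def receive(self, obj, remote):
        """Act as the assigned node: keep obj, forward its sub-data objects."""
        self.store(obj)
        sent = 0
        for sub in obj.sub_objects:    # does the object include a sub-data object?
            remote.store(sub)          # "sending the sub-data object to a remote node"
            sent += 1
        return sent


child = make_object(b"child segment")
parent = make_object(b"parent segment", [child])
assigned, remote = Node("node-a"), Node("node-b")
forwarded = assigned.receive(parent, remote)
```

After this runs, the parent segment sits on `node-a` and the child segment on `node-b`; storing the same child on `node-b` a second time is a no-op because its signature is already present.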

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a data object from a client system at an assigned node, wherein the assigned node is a node of a plurality of nodes of a cluster, the data object is being backed up as part of a backup operation for the client system, the assigned node is assigned to the backup operation and stores a catalog for use in the backup operation, the data object comprises a data segment, and a signature, and the signature is generated based, at least in part, on data of the data segment; determining whether the data object comprises a sub-data object, wherein the determining uses the catalog; and in response to a determination that the data object comprises the sub-data object, processing the data object, wherein the backup operation comprises the determining and the processing, the assigned node performs the determining and the processing the data object, the data segment is stored in a first local deduplication pool at the assigned node, the signature is stored in a first local metadata store at the assigned node, and the processing the data object comprises determining a remote node at which the sub-data object is to be stored, generating a reference that identifies the sub-data object and the remote node, storing the reference as a stored reference in a catalog at the assigned node, wherein storage of the stored reference in the catalog facilitates access to the sub-data object at the remote node, and sending the sub-data object to the remote node, wherein the sending the sub-data object facilitates storage of a data segment of the sub-data object in a second local deduplication pool at the remote node, and a signature of the sub-data object in a second local metadata store at the remote node, and the remote node is another node of the plurality of nodes, other than the assigned node.

2. The method of claim 1, wherein the data object comprises a container, the container comprises a container deduplicated data store and a container metadata store, the container deduplicated data store comprises one or more data segments comprising the data segment, and the container metadata store comprises metadata associated with the one or more data segments.

3. The method of claim 2, wherein the metadata comprises the signature of the data segment and a location in the container deduplicated data store at which the data segment is stored.

4. The method of claim 3, wherein the signature is a fingerprint, and the fingerprint was generated by performing a hash function on the data of the data segment.

5. The method of claim 1, wherein the data object comprises a container, and the sending the sub-data object to the remote node comprises: sending the container to the remote node; and sending a container reference to the remote node, wherein the container reference comprises a container identifier that identifies the container.

6. The method of claim 5, further comprising: receiving the sub-data object at the remote node; storing the container in a local deduplication pool at the remote node; and storing the container reference in a local reference database at the remote node.

7. The method of claim 6, further comprising: receiving a request for a fingerprint list from a client system; retrieving the fingerprint list from a catalog; and sending the fingerprint list to the client system.

8. The method of claim 7, further comprising: receiving a request for a location of the fingerprint list from the client system; determining the location; and sending the location to the client system.

9. The method of claim 7, wherein the catalog is implemented as a single instance for the cluster.

10. The method of claim 1, wherein the data object comprises a container and a container reference, and the method further comprises: storing the container in a local deduplication pool at the assigned node, wherein the container comprises a deduplicated data store, and a metadata store; and storing the container reference in a local reference database at the assigned node, wherein the container reference identifies the container.

11. The method of claim 1, wherein the determining whether the sub-data object is to be stored at the assigned node is based, at least in part, on at least one of a computational resource of the assigned node, a storage resource of the assigned node, a network resource of the assigned node, or the sub-data object being a remote reference.

12. A non-transitory computer-readable storage medium, comprising program instructions, which, when executed by one or more processors of a computing system, perform a method comprising: receiving a data object from a client system at an assigned node, wherein the assigned node is a node of a plurality of nodes of a cluster, the data object is being backed up as part of a backup operation for the client system, the assigned node is assigned to the backup operation and stores a catalog for use in the backup operation, the data object comprises a data segment, and a signature, and the signature is generated based, at least in part, on data of the data segment; determining whether the data object comprises a sub-data object, wherein the determining uses the catalog; and in response to a determination that the data object comprises the sub-data object, processing the data object, wherein the backup operation comprises the determining and the processing, the assigned node performs the determining and the processing the data object, the data segment is stored in a first local deduplication pool at the assigned node, the signature is stored in a first local metadata store at the assigned node, and the processing the data object comprises determining a remote node at which the sub-data object is to be stored, generating a reference that identifies the sub-data object and the remote node, storing the reference as a stored reference in a catalog at the assigned node, wherein storage of the stored reference in the catalog facilitates access to the sub-data object at the remote node, and sending the sub-data object to the remote node, wherein the sending the sub-data object facilitates storage of a data segment of the sub-data object in a second local deduplication pool at the remote node, and a signature of the sub-data object in a second local metadata store at the remote node, and the remote node is another node of the plurality of nodes, other than the assigned node.

13. The non-transitory computer-readable storage medium of claim 12, wherein the data object comprises a container, the container comprises a container deduplicated data store and a container metadata store, the container deduplicated data store comprises the data segment, the signature is a fingerprint, the container metadata store comprises metadata comprising the fingerprint and a location of the data segment in the container deduplicated data store, and the fingerprint was generated by performing a hash function on the data of the data segment.

14. The non-transitory computer-readable storage medium of claim 12, wherein the catalog is implemented as a single instance for the cluster.

15. The non-transitory computer-readable storage medium of claim 12, wherein the data object comprises a container, and the sending the sub-data object to the remote node comprises: sending the container to the remote node; and sending a container reference to the remote node, wherein the container reference comprises a con
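
The container layout of claims 2 to 4 and the remote placement of claims 1, 5, 6, and 11 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the claimed implementation: `Container`, `ClusterNode`, and `place_container` are hypothetical names, SHA-256 is an assumed hash function, the location is modeled as an (offset, length) pair, and picking the remote node with the most free storage is only one possible reading of the resource-based determination.

```python
import hashlib


class Container:
    """A container: a deduplicated data store plus a metadata store."""

    def __init__(self, container_id):
        self.container_id = container_id
        self.data = bytearray()    # container deduplicated data store
        self.metadata = {}         # fingerprint -> (offset, length) location

    def add_segment(self, segment):
        # Fingerprint = hash of the segment's data (SHA-256 is an assumption).
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in self.metadata:     # store each segment only once
            self.metadata[fingerprint] = (len(self.data), len(segment))
            self.data += segment
        return fingerprint

    def read_segment(self, fingerprint):
        offset, length = self.metadata[fingerprint]
        return bytes(self.data[offset:offset + length])


class ClusterNode:
    def __init__(self, name, free_bytes):
        self.name = name
        self.free_bytes = free_bytes   # hypothetical storage-resource metric
        self.pool = {}                 # local deduplication pool: id -> Container
        self.references = set()        # local reference database of container ids
        self.catalog = {}              # used when this node is the assigned node


def place_container(assigned, nodes, container):
    """Send a container to a remote node and record a reference to it."""
    candidates = [n for n in nodes if n is not assigned]   # remote != assigned
    remote = max(candidates, key=lambda n: n.free_bytes)   # assumed policy
    # The stored reference identifies the sub-data object and the remote node.
    assigned.catalog[container.container_id] = remote.name
    remote.pool[container.container_id] = container
    remote.references.add(container.container_id)
    remote.free_bytes -= len(container.data)
    return remote


box = Container("c1")
fp_hello = box.add_segment(b"hello")
fp_world = box.add_segment(b"world")
fp_again = box.add_segment(b"hello")     # duplicate segment, not stored twice

node_a = ClusterNode("a", 100)
node_b = ClusterNode("b", 50)
node_d = ClusterNode("d", 80)
chosen = place_container(node_a, [node_a, node_b, node_d], box)
```

Here the duplicate `b"hello"` segment returns the same fingerprint without growing the container, and the catalog on the assigned node `a` records that container `c1` now lives on node `d`, which is enough to locate it later.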

Assignees

Veritas Technologies LLC

Inventors

Classifications

  • G06F16/215 (Primary)

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11741060B2 cover?
Methods, computer program products, computer systems, and the like are disclosed that provide for scalable deduplication in an efficient and effective manner. For example, such methods, computer program products, and computer systems can include receiving a data object at an assigned node, determining whether the data object includes a sub-data object, and processing the sub-data object. The as…
Who is the assignee on this patent?
Veritas Technologies LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 29, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).