US11741060B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11741060-B2 |
| Application number | US-201916698288-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 27, 2019 |
| Priority date | Nov 27, 2019 |
| Publication date | Aug 29, 2023 |
| Grant date | Aug 29, 2023 |
Official abstract text for this publication.
Methods, computer program products, computer systems, and the like are disclosed that provide for scalable deduplication in an efficient and effective manner. For example, such methods, computer program products, and computer systems can include receiving a data object at an assigned node, determining whether the data object includes a sub-data object, and processing the sub-data object. The assigned node is a node of a plurality of nodes of a cluster, where the data object includes a data segment, and a signature. The signature is generated based, at least in part, on data of the data segment. The processing includes sending the sub-data object to a remote node. The remote node is another node of the plurality of nodes of the cluster.
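The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Node`, `DataObject`, `make_signature`, and `pick_remote` are hypothetical, SHA-256 stands in for whatever signature function is actually used, and the "least-loaded other node" placement policy is invented for the example. The catalog reference follows the wording of claim 1.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def make_signature(segment: bytes) -> str:
    """Signature derived from the segment's data (SHA-256 is an assumption)."""
    return hashlib.sha256(segment).hexdigest()


@dataclass
class DataObject:
    object_id: str
    segment: bytes
    signature: str
    # Sub-data objects carried by this object, if any (hypothetical layout).
    sub_objects: List["DataObject"] = field(default_factory=list)


@dataclass
class Node:
    name: str
    dedup_pool: Dict[str, bytes] = field(default_factory=dict)    # signature -> segment
    metadata_store: Dict[str, str] = field(default_factory=dict)  # object id -> signature
    catalog: Dict[str, str] = field(default_factory=dict)         # sub-object id -> remote node name

    def store_locally(self, obj: DataObject) -> None:
        # Store the segment once per signature and record the signature.
        self.dedup_pool.setdefault(obj.signature, obj.segment)
        self.metadata_store[obj.object_id] = obj.signature

    def receive(self, obj: DataObject, cluster: List["Node"]) -> None:
        """Act as the assigned node for an incoming data object."""
        self.store_locally(obj)
        for sub in obj.sub_objects:
            remote = pick_remote(self, cluster)
            # Reference identifying the sub-data object and its remote node.
            self.catalog[sub.object_id] = remote.name
            # "Sending" is modeled as a direct call on the remote node.
            remote.store_locally(sub)


def pick_remote(assigned: Node, cluster: List[Node]) -> Node:
    """Hypothetical policy: the other node currently holding the fewest segments."""
    candidates = [n for n in cluster if n is not assigned]
    return min(candidates, key=lambda n: len(n.dedup_pool))
```

In this sketch the assigned node keeps only its own segment and a catalog entry for each sub-data object; the sub-data object's segment and signature end up in the remote node's pool and metadata store.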
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving a data object from a client system at an assigned node, wherein the assigned node is a node of a plurality of nodes of a cluster, the data object is being backed up as part of a backup operation for the client system, the assigned node is assigned to the backup operation and stores a catalog for use in the backup operation, the data object comprises a data segment, and a signature, and the signature is generated based, at least in part, on data of the data segment; determining whether the data object comprises a sub-data object, wherein the determining uses the catalog; and in response to a determination that the data object comprises the sub-data object, processing the data object, wherein the backup operation comprises the determining and the processing, the assigned node performs the determining and the processing the data object, the data segment is stored in a first local deduplication pool at the assigned node, the signature is stored in a first local metadata store at the assigned node, and the processing the data object comprises determining a remote node at which the sub-data object is to be stored, generating a reference that identifies the sub-data object and the remote node, storing the reference as a stored reference in a catalog at the assigned node, wherein storage of the stored reference in the catalog facilitates access to the sub-data object at the remote node, and sending the sub-data object to the remote node, wherein the sending the sub-data object facilitates storage of a data segment of the sub-data object in a second local deduplication pool at the remote node, and a signature of the sub-data object in a second local metadata store at the remote node, and the remote node is another node of the plurality of nodes, other than the assigned node.
2. The method of claim 1, wherein the data object comprises a container, the container comprises a container deduplicated data store and a container metadata store, the container deduplicated data store comprises one or more data segments comprising the data segment, and the container metadata store comprises metadata associated with the one or more data segments.
3. The method of claim 2, wherein the metadata comprises the signature of the data segment and a location in the container deduplicated data store at which the data segment is stored.
4. The method of claim 3, wherein the signature is a fingerprint, and the fingerprint was generated by performing a hash function on the data of the data segment.
5. The method of claim 1, wherein the data object comprises a container, and the sending the sub-data object to the remote node comprises: sending the container to the remote node; and sending a container reference to the remote node, wherein the container reference comprises a container identifier that identifies the container.
6. The method of claim 5, further comprising: receiving the sub-data object at the remote node; storing the container in a local deduplication pool at the remote node; and storing the container reference in a local reference database at the remote node.
7. The method of claim 6, further comprising: receiving a request for a fingerprint list from a client system; retrieving the fingerprint list from a catalog; and sending the fingerprint list to the client system.
8. The method of claim 7, further comprising: receiving a request for a location of the fingerprint list from the client system; determining the location; and sending the location to the client system.
9. The method of claim 7, wherein the catalog is implemented as a single instance for the cluster.
10. The method of claim 1, wherein the data object comprises a container and a container reference, and the method further comprises: storing the container in a local deduplication pool at the assigned node, wherein the container comprises a deduplicated data store, and a metadata store; and storing the container reference in a local reference database at the assigned node, wherein the container reference identifies the container.
11. The method of claim 1, wherein the determining whether the sub-data object is to be stored at the assigned node is based, at least in part, on at least one of a computational resource of the assigned node, a storage resource of the assigned node, a network resource of the assigned node, or the sub-data object being a remote reference.
12. A non-transitory computer-readable storage medium, comprising program instructions, which, when executed by one or more processors of a computing system, perform a method comprising: receiving a data object from a client system at an assigned node, wherein the assigned node is a node of a plurality of nodes of a cluster, the data object is being backed up as part of a backup operation for the client system, the assigned node is assigned to the backup operation and stores a catalog for use in the backup operation, the data object comprises a data segment, and a signature, and the signature is generated based, at least in part, on data of the data segment; determining whether the data object comprises a sub-data object, wherein the determining uses the catalog; and in response to a determination that the data object comprises the sub-data object, processing the data object, wherein the backup operation comprises the determining and the processing, the assigned node performs the determining and the processing the data object, the data segment is stored in a first local deduplication pool at the assigned node, the signature is stored in a first local metadata store at the assigned node, and the processing the data object comprises determining a remote node at which the sub-data object is to be stored, generating a reference that identifies the sub-data object and the remote node, storing the reference as a stored reference in a catalog at the assigned node, wherein storage of the stored reference in the catalog facilitates access to the sub-data object at the remote node, and sending the sub-data object to the remote node, wherein the sending the sub-data object facilitates storage of a data segment of the sub-data object in a second local deduplication pool at the remote node, and a signature of the sub-data object in a second local metadata store at the remote node, and the remote node is another node of the plurality of nodes, other than the assigned node.
13. The non-transitory computer-readable storage medium of claim 12, wherein the data object comprises a container, the container comprises a container deduplicated data store and a container metadata store, the container deduplicated data store comprises the data segment, the signature is a fingerprint, the container metadata store comprises metadata comprising the fingerprint and a location of the data segment in the container deduplicated data store, and the fingerprint was generated by performing a hash function on the data of the data segment.
14. The non-transitory computer-readable storage medium of claim 12, wherein the catalog is implemented as a single instance for the cluster.
15. The non-transitory computer-readable storage medium of claim 12, wherein the data object comprises a container, and the sending the sub-data object to the remote node comprises: sending the container to the remote node; and sending a container reference to the remote node, wherein the container reference comprises a con
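The container layout recited in claims 2 to 4 (a deduplicated data store plus a metadata store that maps each fingerprint to a location) might look like the sketch below. It is an illustration only: the `Container` class, its field names, the `(offset, length)` location format, and the use of SHA-256 as the hash function are all assumptions, since the claims name no specific hash or layout.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Container:
    container_id: str
    # Container deduplicated data store: segments appended back to back.
    data_store: bytearray = field(default_factory=bytearray)
    # Container metadata store: fingerprint -> (offset, length) in data_store.
    metadata: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def add_segment(self, segment: bytes) -> str:
        """Store a segment once and return its fingerprint."""
        fingerprint = hashlib.sha256(segment).hexdigest()  # assumed hash function
        if fingerprint not in self.metadata:
            offset = len(self.data_store)
            self.data_store.extend(segment)
            self.metadata[fingerprint] = (offset, len(segment))
        return fingerprint

    def read_segment(self, fingerprint: str) -> bytes:
        """Look up a segment by fingerprint using its recorded location."""
        offset, length = self.metadata[fingerprint]
        return bytes(self.data_store[offset:offset + length])
```

Adding the same segment twice returns the same fingerprint and leaves the data store unchanged, which is the deduplication property the metadata lookup provides in this sketch.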
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title