Data deduplication within distributed computing components

US10747734B2 · US · B2

Patent metadata
Publication number: US-10747734-B2
Application number: US-201615189232-A
Country: US
Kind code: B2
Filing date: Jun 22, 2016
Priority date: Jun 22, 2016
Publication date: Aug 18, 2020
Grant date: Aug 18, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments for, in an object storage environment, deduplicating data within and between distributed computing components by a processor. A deduplication operation is paired with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to a local node.
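To make the abstract's idea concrete, here is a minimal Python sketch of checking metadata before any object bytes move. It assumes, purely for illustration, that the object store keeps a SHA-256 digest of each fixed-size chunk of an object as metadata and that the local node keeps a set of digests it already holds. `ObjectMetadata`, `chunk_hashes`, `portions_to_transfer` and `CHUNK_SIZE` are hypothetical names, not taken from the patent, and the chunking scheme is an assumption rather than the patented method.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical, deliberately tiny chunk size so the example is easy to follow.
CHUNK_SIZE = 4


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-256 hex digest of each."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


@dataclass
class ObjectMetadata:
    """Assumed metadata record kept by the object store for one data object."""
    object_id: str
    digests: list[str]  # digest of each chunk, in object order


def portions_to_transfer(meta: ObjectMetadata, local_hashes: set[str]) -> list[int]:
    """Inspect metadata only (no object bytes) and return the indices of chunks
    the local node must fetch; every other chunk is deduplicable."""
    needed: list[int] = []
    scheduled: set[str] = set()
    for index, digest in enumerate(meta.digests):
        if digest in local_hashes or digest in scheduled:
            continue  # already on the local node, or already scheduled earlier in this object
        scheduled.add(digest)
        needed.append(index)
    return needed


# Usage: the local node already holds the "AAAA" and "BBBB" chunks.
local = set(chunk_hashes(b"AAAABBBB"))
remote = ObjectMetadata("obj-1", chunk_hashes(b"AAAACCCCBBBBDDDD"))
print(portions_to_transfer(remote, local))  # [1, 3]
```

Skipping a digest that is already scheduled also avoids fetching the same chunk twice when it repeats inside one object; under this sketch's assumptions, a receiver could rebuild the object from the full digest list carried in the metadata.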

Claims

Full claim text (claims 1 to 18).

The invention claimed is:

1. In an object storage environment, a method for deduplicating data within and between distributed computing components by a processor, comprising: pairing a deduplication operation with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to a local node such that prior to transferring any portions of the data object from the object storage environment to the local node, the metadata associated with the data object is examined to determine those portions of the data object not required to be transferred according to the determination of the data necessitating deduplication; and initiating a seed file incorporating the data object metadata, the seed file having a plurality of table structures each containing descriptive fields pertaining to locations of each data object stored in the object storage environment; wherein a local module on the local node accesses the seed file for reference in identifying deduplicable data.

2. The method of claim 1, further including performing the pairing in a remote module executing on an object storage system within the distributed computing components.

3. The method of claim 1, further including transferring, from the object storage system within the distributed computing components to the local node, only those data objects identified as nonduplicate data by reference to the seed file.

4. The method of claim 3, further including initiating a first table in the seed file incorporating a calculated hash value of each data object already existing in the object storage environment.

5. The method of claim 4, further including initiating a second table in the seed file, which, when requested by the remote module, incorporates a mapping between a local data object identifier corresponding to a data object requested by a user and a remote data object identifier corresponding to a data object maintained by the object storage system.

6. The method of claim 5, further including generating, by the remote module, a temporary data object representing offsets corresponding to the data object maintained by the object storage system to be transferred to the user.

7. In an object storage environment, a system for deduplicating data within and between distributed computing components, comprising: a processor, integrated into one of the distributed computing components; and an additional processor associated with a local module on a local node, wherein the processor: pairs a deduplication operation with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to the local node such that prior to transferring any portions of the data object from the object storage environment to the local node, the metadata associated with the data object is examined to determine those portions of the data object not required to be transferred according to the determination of the data necessitating deduplication; and initiates a seed file incorporating the data object metadata, the seed file having a plurality of table structures each containing descriptive fields pertaining to locations of each data object stored in the object storage environment; wherein the additional processor associated with the local module on the local node accesses the seed file for reference in identifying deduplicable data.

8. The system of claim 7, wherein the processor pairs the deduplication operation in a remote module executing on an object storage system within the distributed computing components.

9. The system of claim 7, wherein the processor transfers, from the object storage system within the distributed computing components to the local node, only those data objects identified as nonduplicate data by reference to the seed file.

10. The system of claim 9, wherein the processor initiates a first table in the seed file incorporating a calculated hash value of each data object already existing in the object storage environment.

11. The system of claim 10, wherein the processor initiates a second table in the seed file, which, when requested by the remote module, incorporates a mapping between a local data object identifier corresponding to a data object requested by a user and a remote data object identifier corresponding to a data object maintained by the object storage system.

12. The system of claim 11, wherein the processor generates, by the remote module, a temporary data object representing offsets corresponding to the data object maintained by the object storage system to be transferred to the user.

13. In an object storage environment, a computer program product for deduplicating data within and between distributed computing components by a processor, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that pairs a deduplication operation with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to a local node such that prior to transferring any portions of the data object from the object storage environment to the local node, the metadata associated with the data object is examined to determine those portions of the data object not required to be transferred according to the determination of the data necessitating deduplication; and an executable portion that initiates a seed file incorporating the data object metadata, the seed file having a plurality of table structures each containing descriptive fields pertaining to locations of each data object stored in the object storage environment; wherein a local module on the local node accesses the seed file for reference in identifying deduplicable data.

14. The computer program product of claim 13, further including an executable portion that pairs in a remote module executing on an object storage system within the distributed computing components.

15. The computer program product of claim 13, further including an executable portion that transfers, from the object storage system within the distributed computing components to the local node, only those data objects identified as nonduplicate data by reference to the seed file.

16. The computer program product of claim 15, further including an executable portion that initiates a first table in the seed file incorporating a calculated hash value of each data object already existing in the object storage environment.

17. The computer program product of claim 16, further including an executable portion that initiates a second table in the seed file, which, when requested by the remote module, incorporates a mapping between a local data object identifier corresponding to a data object requested by a user and a remote data object identifier corresponding to a data object maintained by the object storage system.

18. The computer program product of claim 17, further including an executable portion that generates, by the remote module, a temporary data object representing offsets corresponding to the data object maintained by the object storage system to be transferred to the user.
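The seed file described in claims 1 and 3 to 6 can be modelled as a small in-memory structure. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the first table maps each remote object identifier to a SHA-256 hash of the whole object, the second table maps a user's local object identifier to a remote one, and the "temporary data object representing offsets" is modelled as a list of `(offset, length)` byte ranges for fixed-size chunks the local node lacks. `SeedFile`, `map_request`, `build_seed_file`, `plan_transfer`, `offsets_for_transfer` and `sha256_hex` are all hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Content hash used throughout this sketch (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class SeedFile:
    """Hypothetical in-memory model of the seed file's table structures."""
    # First table (claim 4): remote object id -> hash of each stored object.
    hash_table: dict[str, str] = field(default_factory=dict)
    # Second table (claim 5): local object id requested by a user -> remote object id.
    id_map: dict[str, str] = field(default_factory=dict)

    def map_request(self, local_id: str, remote_id: str) -> None:
        """Record a local-to-remote identifier mapping when the remote module asks for it."""
        self.id_map[local_id] = remote_id


def build_seed_file(store: dict[str, bytes]) -> SeedFile:
    """Remote-module side: hash every object already in the object store."""
    seed = SeedFile()
    for remote_id, data in store.items():
        seed.hash_table[remote_id] = sha256_hex(data)
    return seed


def plan_transfer(seed: SeedFile, requested: list[str], local_hashes: set[str]) -> list[str]:
    """Return remote ids to send: only objects whose hash the local node lacks (claim 3)."""
    to_send = []
    for local_id in requested:
        remote_id = seed.id_map[local_id]
        if seed.hash_table[remote_id] not in local_hashes:
            to_send.append(remote_id)
    return to_send


def offsets_for_transfer(data: bytes, local_chunk_hashes: set[str], chunk_size: int) -> list[tuple[int, int]]:
    """Temporary object (claim 6), modelled as (offset, length) ranges of chunks missing locally."""
    ranges = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        if sha256_hex(chunk) not in local_chunk_hashes:
            ranges.append((offset, len(chunk)))
    return ranges


# Usage: r1 and r3 hold identical content, and the local node already has it.
store = {"r1": b"alpha", "r2": b"beta", "r3": b"alpha"}
seed = build_seed_file(store)
seed.map_request("u-report", "r2")
seed.map_request("u-copy", "r3")
print(plan_transfer(seed, ["u-report", "u-copy"], {sha256_hex(b"alpha")}))  # ['r2']
print(offsets_for_transfer(b"AAAABBBBCC", {sha256_hex(b"BBBB")}, chunk_size=4))  # [(0, 4), (8, 2)]
```

Keying the first table by remote identifier rather than by hash keeps one row per stored object, so identical objects (`r1` and `r3` in the usage lines) simply share a hash value; in this sketch the local module needs only the hash set it already holds to decide what to skip.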

Assignees

IBM

Inventors

Classifications

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

  • G06F16/215 (Primary)

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • Hash-based (content-based indexing of textual data G06F16/31) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10747734B2 cover?
Embodiments for, in an object storage environment, deduplicating data within and between distributed computing components by a processor. A deduplication operation is paired with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to a local node.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04L67/1097. Mapped technology areas include Electricity.
When was this patent published?
The B2 publication was published on Aug 18, 2020. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).