Read-only file system for testing de-duplication
US-2017220593-A1 · Aug 3, 2017 · US
US10747734B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10747734-B2 |
| Application number | US-201615189232-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 22, 2016 |
| Priority date | Jun 22, 2016 |
| Publication date | Aug 18, 2020 |
| Grant date | Aug 18, 2020 |
Official abstract text for this publication.
Embodiments for, in an object storage environment, deduplicating data within and between distributed computing components by a processor. A deduplication operation is paired with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to a local node.
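The abstract's central idea, examining object metadata before any data is transferred, can be illustrated with a minimal sketch. The patent does not specify a metadata layout or a hashing scheme at this point, so `ObjectMetadata`, its fields, and `plan_transfer` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ObjectMetadata:
    """Hypothetical metadata record; the field names are assumptions."""
    object_id: str
    content_hash: str
    size_bytes: int


def plan_transfer(
    remote_metadata: Iterable[ObjectMetadata],
    local_hashes: Set[str],
) -> Tuple[List[str], List[str]]:
    """Decide from metadata alone which objects need to be transferred.

    Returns (to_transfer, skipped). No object payload is read here:
    the decision is made before any data moves to the local node.
    """
    seen = set(local_hashes)  # copy so the caller's set is not mutated
    to_transfer: List[str] = []
    skipped: List[str] = []
    for meta in remote_metadata:
        if meta.content_hash in seen:
            # Content already present locally (or already planned): skip it.
            skipped.append(meta.object_id)
        else:
            to_transfer.append(meta.object_id)
            seen.add(meta.content_hash)
    return to_transfer, skipped


if __name__ == "__main__":
    remote = [
        ObjectMetadata("obj-a", "h1", 100),
        ObjectMetadata("obj-b", "h2", 200),
        ObjectMetadata("obj-c", "h1", 100),  # same content as obj-a
    ]
    print(plan_transfer(remote, local_hashes={"h2"}))
```

In this sketch a duplicate is detected either against content the local node already holds or against an earlier object in the same batch, so each distinct payload is planned for transfer at most once.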
Opening claim text (preview).
The invention claimed is:
1. In an object storage environment, a method for deduplicating data within and between distributed computing components by a processor, comprising: pairing a deduplication operation with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to a local node such that prior to transferring any portions of the data object from the object storage environment to the local node, the metadata associated with the data object is examined to determine those portions of the data object not required to be transferred according to the determination of the data necessitating deduplication; and initiating a seed file incorporating the data object metadata, the seed file having a plurality of table structures each containing descriptive fields pertaining to locations of each data object stored in the object storage environment; wherein a local module on the local node accesses the seed file for reference in identifying deduplicable data.
2. The method of claim 1, further including performing the pairing in a remote module executing on an object storage system within the distributed computing components.
3. The method of claim 1, further including transferring, from the object storage system within the distributed computing components to the local node, only those data objects identified as nonduplicate data by reference to the seed file.
4. The method of claim 3, further including initiating a first table in the seed file incorporating a calculated hash value of each data object already existing in the object storage environment.
5. The method of claim 4, further including initiating a second table in the seed file, which, when requested by the remote module, incorporates a mapping between a local data object identifier corresponding to a data object requested by a user and a remote data object identifier corresponding to a data object maintained by the object storage system.
6. The method of claim 5, further including generating, by the remote module, a temporary data object representing offsets corresponding to the data object maintained by the object storage system to be transferred to the user.
7. In an object storage environment, a system for deduplicating data within and between distributed computing components, comprising: a processor, integrated into one of the distributed computing components; and an additional processor associated with a local module on a local node, wherein the processor: pairs a deduplication operation with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to the local node such that prior to transferring any portions of the data object from the object storage environment to the local node, the metadata associated with the data object is examined to determine those portions of the data object not required to be transferred according to the determination of the data necessitating deduplication; and initiates a seed file incorporating the data object metadata, the seed file having a plurality of table structures each containing descriptive fields pertaining to locations of each data object stored in the object storage environment; wherein the additional processor associated with the local module on the local node accesses the seed file for reference in identifying deduplicable data.
8. The system of claim 7, wherein the processor pairs the deduplication operation in a remote module executing on an object storage system within the distributed computing components.
9. The system of claim 7, wherein the processor transfers, from the object storage system within the distributed computing components to the local node, only those data objects identified as nonduplicate data by reference to the seed file.
10. The system of claim 9, wherein the processor initiates a first table in the seed file incorporating a calculated hash value of each data object already existing in the object storage environment.
11. The system of claim 10, wherein the processor initiates a second table in the seed file, which, when requested by the remote module, incorporates a mapping between a local data object identifier corresponding to a data object requested by a user and a remote data object identifier corresponding to a data object maintained by the object storage system.
12. The system of claim 11, wherein the processor generates, by the remote module, a temporary data object representing offsets corresponding to the data object maintained by the object storage system to be transferred to the user.
13. In an object storage environment, a computer program product for deduplicating data within and between distributed computing components by a processor, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that pairs a deduplication operation with metadata associated with a data object to determine data necessitating deduplication before the data object is transferred and written to a local node such that prior to transferring any portions of the data object from the object storage environment to the local node, the metadata associated with the data object is examined to determine those portions of the data object not required to be transferred according to the determination of the data necessitating deduplication; and an executable portion that initiates a seed file incorporating the data object metadata, the seed file having a plurality of table structures each containing descriptive fields pertaining to locations of each data object stored in the object storage environment; wherein a local module on the local node accesses the seed file for reference in identifying deduplicable data.
14. The computer program product of claim 13, further including an executable portion that pairs in a remote module executing on an object storage system within the distributed computing components.
15. The computer program product of claim 13, further including an executable portion that transfers, from the object storage system within the distributed computing components to the local node, only those data objects identified as nonduplicate data by reference to the seed file.
16. The computer program product of claim 15, further including an executable portion that initiates a first table in the seed file incorporating a calculated hash value of each data object already existing in the object storage environment.
17. The computer program product of claim 16, further including an executable portion that initiates a second table in the seed file, which, when requested by the remote module, incorporates a mapping between a local data object identifier corresponding to a data object requested by a user and a remote data object identifier corresponding to a data object maintained by the object storage system.
18. The computer program product of claim 17, further including an executable portion that generates, by the remote module, a temporary data object representing offsets corresponding to the data object maintained by the object storage system to be transferred to the user.
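The seed file with its two tables (a hash table for objects already in the object store, and a local-to-remote identifier mapping) can be sketched as follows. The claims do not name a file format, hash function, or schema, so everything here is an assumption made for illustration: SQLite as the container, SHA-256 as the hash, and the table, column, and function names are all hypothetical.

```python
import hashlib
import sqlite3
from typing import List, Set


def create_seed_file(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical seed file holding the two tables from the claims."""
    conn = sqlite3.connect(path)
    # First table: hash and location of each object already in the object store.
    conn.execute(
        "CREATE TABLE object_hashes ("
        "remote_id TEXT PRIMARY KEY, "
        "content_hash TEXT NOT NULL, "
        "location TEXT NOT NULL)"
    )
    # Second table: local object identifier -> remote object identifier.
    conn.execute(
        "CREATE TABLE id_mapping ("
        "local_id TEXT PRIMARY KEY, "
        "remote_id TEXT NOT NULL)"
    )
    return conn


def register_object(
    conn: sqlite3.Connection, remote_id: str, location: str, payload: bytes
) -> str:
    """Record an object's hash and location; SHA-256 is an assumed choice."""
    digest = hashlib.sha256(payload).hexdigest()
    conn.execute(
        "INSERT INTO object_hashes VALUES (?, ?, ?)",
        (remote_id, digest, location),
    )
    return digest


def map_local_to_remote(
    conn: sqlite3.Connection, local_id: str, remote_id: str
) -> None:
    """Record which remote object a locally requested object corresponds to."""
    conn.execute(
        "INSERT OR REPLACE INTO id_mapping VALUES (?, ?)", (local_id, remote_id)
    )


def non_duplicate_ids(conn: sqlite3.Connection, local_hashes: Set[str]) -> List[str]:
    """Return remote ids whose content hash is not already held locally."""
    rows = conn.execute(
        "SELECT remote_id, content_hash FROM object_hashes ORDER BY remote_id"
    ).fetchall()
    return [remote_id for remote_id, digest in rows if digest not in local_hashes]


if __name__ == "__main__":
    seed = create_seed_file()
    held = register_object(seed, "r1", "bucket/a", b"alpha")
    register_object(seed, "r2", "bucket/b", b"beta")
    map_local_to_remote(seed, "l1", "r1")
    print(non_duplicate_ids(seed, {held}))
```

A local module consulting such a file would only request the identifiers returned by `non_duplicate_ids`, which corresponds to transferring only nonduplicate objects by reference to the seed file.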
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Hash-based (content-based indexing of textual data G06F16/31) · CPC title