File immutability using a deduplication file system in a public cloud using new filesystem redirection
US-2024103978-A1 · Mar 28, 2024 · US
US9910857B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9910857-B2 |
| Application number | US-201414263016-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 28, 2014 |
| Priority date | Apr 28, 2013 |
| Publication date | Mar 6, 2018 |
| Grant date | Mar 6, 2018 |
Official abstract text for this publication.
Methods and systems for data management are disclosed. With embodiments of the present disclosure, data files originating from the same source data can be de-duplicated. One such method comprises calculating one or more of a first characteristic value for first data in a first format, and one or more second characteristic values for one or more data in one or more second formats into which the first data can be converted, said characteristic value uniquely representing an arrangement characteristic of at least part of bits of data in a particular format. The method also includes storing one of the first data and the second data in response to one of the calculated characteristic values being the same as a stored characteristic value corresponding to a second data.
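As a rough illustration of why the abstract calculates characteristic values for more than one format, the sketch below hashes the same source text in two encodings. The choice of SHA-256 as the characteristic value and of UTF-8 and UTF-16 as the "formats" are assumptions made for this example only; the abstract does not name a specific function or format.

```python
import hashlib


def characteristic_value(data: bytes) -> str:
    # Assumption: a SHA-256 digest stands in for the "characteristic value".
    return hashlib.sha256(data).hexdigest()


text = "same source data"
as_utf8 = text.encode("utf-8")       # hypothetical "first format"
as_utf16 = text.encode("utf-16-le")  # hypothetical "second format"

# The bit arrangements differ, so the two values do not match directly.
direct_match = characteristic_value(as_utf8) == characteristic_value(as_utf16)

# After converting the second format back into the first, the values agree.
converted = as_utf16.decode("utf-16-le").encode("utf-8")
match_after_conversion = characteristic_value(as_utf8) == characteristic_value(converted)
```

A plain hash comparison would treat the two encodings as unrelated data. Comparing values calculated across convertible formats is what lets the copies be recognized as coming from the same source.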
Opening claim text (preview).
What is claimed is:
1. A method for data management, comprising: calculating a first hash value for a first data in a first format; in response to locating a stored hash value that corresponds to the first hash value, not storing the first data; and in response to not locating a stored hash value that corresponds to the first hash value: converting the first data into one or more data in alternate formats in accordance with one or more conversion policies for converting between the first format and the one or more alternate formats, wherein the one or more conversion policies are stored in a conversion policy library; calculating a hash value for each of the one or more data in alternate formats; storing one copy of the first data; and storing the calculated hash values, including the first hash value, separately from the first data, each of the calculated hash values having fewer bits than the first data in any of the alternate formats, thereby providing alternate stored hash values for the first data, the alternate stored hash values used for identifying duplicate copies of the first data when the first data is in any of the alternate formats, wherein a number of computer calculations required to identify duplicate copies of the first data using the alternate hash values is less than a number of computer calculations required to identify duplicate copies of the first data using any of the alternate formats.
2. The method according to claim 1, further comprising: in response to not storing the first data, storing an indicator pointing to the first data.
3. The method according to claim 2, further comprising: receiving a request to retrieve the first data in the first format; and providing one of the following as the first data in the first format: the first data as stored in the first format, the first data as pointed to by the indicator; and a second data as converted into the first format in accordance with the conversion policies.
4. A system for data management, comprising a calculating unit and a managing unit, the system configured to: calculate a first hash value for a first data in a first format; in response to locating a stored hash value that corresponds to the first hash value, not storing the first data; and in response to not locating a stored hash value that corresponds to the first hash value: convert the first data into one or more data in alternate formats in accordance with one or more conversion policies for converting between the first format and the one or more alternate formats, wherein the one or more conversion policies are stored in a conversion policy library; calculate a hash value for each of the one or more data in alternate formats; store one copy of the first data; and store the calculated hash values, including the first hash value, separately from the first data, each of the calculated hash values having fewer bits than the first data in any of the alternate formats, thereby providing alternate stored hash values for the first data, the alternate stored hash values used for identifying duplicate copies of the first data when the first data is in any of the alternate formats, wherein a number of computer calculations required to identify duplicate copies of the first data using the alternate hash values is less than a number of computer calculations required to identify duplicate copies of the first data using any of the alternate formats.
5. The system according to claim 4, wherein the system is further configured to: store an indicator pointing to the first data in response to not storing the first data.
6. The system according to claim 4, wherein the system is further configured to: receive a request to retrieve the first data in the first format; and provide one of the following as the first data in the first format: the first data as stored in the first format, the first data as pointed to by the indicator; and a second data as converted into the first format in accordance with the conversion policies.
7. A computer program product for data management, the computer program product comprising a non-transitory computer readable storage medium having program code embodied therewith, the program code executable by a processor to perform a method, the method comprising: calculating a first hash value for a first data in a first format; in response to locating a stored hash value that corresponds to the first hash value, not storing the first data; and in response to not locating a stored hash value that corresponds to the first hash value: converting the first data into one or more data in alternate formats in accordance with one or more conversion policies for converting between the first format and the one or more alternate formats, wherein the one or more conversion policies are stored in a conversion policy library; calculating a hash value for each of the one or more data in alternate formats; storing one copy of the first data; and storing the calculated hash values, including the first hash value, separately from the first data, each of the calculated hash values having fewer bits than the first data in any of the alternate formats, thereby providing alternate stored hash values for the first data, the alternate stored hash values used for identifying duplicate copies of the first data when the first data is in any of the alternate formats, wherein a number of computer calculations required to identify duplicate copies of the first data using the alternate hash values is less than a number of computer calculations required to identify duplicate copies of the first data using any of the alternate formats.
8. The computer program product according to claim 7, the method further comprising: in response to not storing the first data, storing an indicator pointing to the first data.
9. The computer program product according to claim 8, the method further comprising: receiving a request to retrieve the first data in the first format; and providing one of the following as the first data in the first format: the first data as stored in the first format, the first data as pointed to by the indicator; and a second data as converted into the first format in accordance with the conversion policies.
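The flow recited in claims 1 to 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `DedupStore`, the dictionary `CONVERSION_POLICIES`, the use of SHA-256, and the text encodings standing in for "formats" are all hypothetical choices made for the example. The sketch also records an indicator for every stored name, not only for duplicates, to keep retrieval simple.

```python
import hashlib

# Hypothetical conversion policy library: (source format, target format) -> converter.
CONVERSION_POLICIES = {
    ("utf-8", "utf-16-le"): lambda b: b.decode("utf-8").encode("utf-16-le"),
    ("utf-16-le", "utf-8"): lambda b: b.decode("utf-16-le").encode("utf-8"),
}


def digest(data: bytes) -> str:
    # Assumption: SHA-256 plays the role of the claimed hash value.
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    def __init__(self):
        self.blobs = {}       # blob id -> (format, bytes): the single stored copy
        self.hash_index = {}  # hash value -> blob id, kept separately from the data
        self.indicators = {}  # name -> blob id: the "indicator pointing to the first data"

    def put(self, name: str, data: bytes, fmt: str) -> bool:
        """Return True if the data was stored, False if it was a duplicate."""
        first_hash = digest(data)
        blob_id = self.hash_index.get(first_hash)
        if blob_id is not None:
            # A stored hash matches: do not store the data, only an indicator.
            self.indicators[name] = blob_id
            return False
        # No match: store one copy, then index the hash of every alternate format.
        blob_id = first_hash
        self.blobs[blob_id] = (fmt, data)
        self.hash_index[first_hash] = blob_id
        for (src, _dst), convert in CONVERSION_POLICIES.items():
            if src == fmt:
                self.hash_index[digest(convert(data))] = blob_id
        self.indicators[name] = blob_id
        return True

    def get(self, name: str, fmt: str) -> bytes:
        """Return the data in the requested format, converting if necessary."""
        stored_fmt, data = self.blobs[self.indicators[name]]
        if stored_fmt == fmt:
            return data
        return CONVERSION_POLICIES[(stored_fmt, fmt)](data)


store = DedupStore()
text = "hello"
stored_first = store.put("a.txt", text.encode("utf-8"), "utf-8")
stored_second = store.put("b.txt", text.encode("utf-16-le"), "utf-16-le")
```

In this sketch the second `put` finds its hash already indexed, because the alternate-format hash was computed when the first copy was stored. A later duplicate check therefore costs one hash and one lookup, with no format conversion at lookup time.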
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Physics · mapped topic