Data management

US9910857B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9910857-B2
Application number: US-201414263016-A
Country: US
Kind code: B2
Filing date: Apr 28, 2014
Priority date: Apr 28, 2013
Publication date: Mar 6, 2018
Grant date: Mar 6, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for data management are disclosed. With embodiments of the present disclosure, data files originating from the same source data can be de-duplicated. One such method comprises calculating one or more of a first characteristic value for first data in a first format, and one or more second characteristic values for one or more data in one or more second formats into which the first data can be converted, said characteristic value uniquely representing an arrangement characteristic of at least part of bits of data in a particular format. The method also includes storing one of the first data and the second data in response to one of the calculated characteristic values being the same as a stored characteristic value corresponding to a second data.
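As a rough illustration of the idea in the abstract, the sketch below computes a characteristic value for the same source data in two formats and checks whether either value matches one already stored. It is a minimal sketch under stated assumptions, not the patented implementation: SHA-256 stands in for the "characteristic value", the two "formats" are the UTF-8 and UTF-16-LE encodings of the same text, and the names `characteristic_value` and `utf8_to_utf16le` are hypothetical.

```python
import hashlib


def characteristic_value(data: bytes) -> str:
    # Assumption: a SHA-256 digest plays the role of the characteristic value.
    return hashlib.sha256(data).hexdigest()


def utf8_to_utf16le(data: bytes) -> bytes:
    # Hypothetical conversion between two toy "formats" of the same text.
    return data.decode("utf-8").encode("utf-16-le")


# The same source data in a first format and in a second format.
first = "same source data".encode("utf-8")
second = utf8_to_utf16le(first)

# Characteristic values calculated for the incoming data and its converted form.
calculated = {characteristic_value(first), characteristic_value(second)}

# Pretend a copy in the second format was stored earlier.
stored_values = {characteristic_value(second)}

# A match on any calculated value marks the incoming data as a duplicate.
is_duplicate = not calculated.isdisjoint(stored_values)
```

The two byte sequences differ, so a comparison of the first-format value alone would miss the duplicate; computing values for the convertible formats as well is what lets the match be found.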

First claim

Opening claim text (preview).

What is claimed is:

1. A method for data management, comprising: calculating a first hash value for a first data in a first format; in response to locating a stored hash value that corresponds to the first hash value, not storing the first data; and in response to not locating a stored hash value that corresponds to the first hash value: converting the first data into one or more data in alternate formats in accordance with one or more conversion policies for converting between the first format and the one or more alternate formats, wherein the one or more conversion policies are stored in a conversion policy library; calculating a hash value for each of the one or more data in alternate formats; storing one copy of the first data; and storing the calculated hash values, including the first hash value, separately from the first data, each of the calculated hash values having fewer bits than the first data in any of the alternate formats, thereby providing alternate stored hash values for the first data, the alternate stored hash values used for identifying duplicate copies of the first data when the first data is in any of the alternate formats, wherein a number of computer calculations required to identify duplicate copies of the first data using the alternate hash values is less than a number of computer calculations required to identity duplicate copies of the first data using any of the alternate formats.

2. The method according to claim 1, further comprising: in response to not storing the first data, storing an indicator pointing to the first data.

3. The method according to claim 2, further comprising: receiving a request to retrieve the first data in the first format; and providing one of the following as the first data in the first format: the first data as stored in the first format, the first data as pointed to by the indicator; and a second data as converted into the first format in accordance with the conversion policies.

4. A system for data management, comprising a calculating unit and a managing unit, the system configured to: calculate a first hash value for a first data in a first format; in response to locating a stored hash value that corresponds to the first hash value, not storing the first data; and in response to not locating a stored hash value that corresponds to the first hash value: convert the first data into one or more data in alternate formats in accordance with one or more conversion policies for converting between the first format and the one or more alternate formats, wherein the one or more conversion policies are stored in a conversion policy library; calculate a hash value for each of the one or more data in alternate formats; store one copy of the first data; and store the calculated hash values, including the first hash value, separately from the first data, each of the calculated hash values having fewer bits than the first data in any of the alternate formats, thereby providing alternate stored hash values for the first data, the alternate stored hash values used for identifying duplicate copies of the first data when the first data is in any of the alternate formats, wherein a number of computer calculations required to identify duplicate copies of the first data using the alternate hash values is less than a number of computer calculations required to identity duplicate copies of the first data using any of the alternate formats.

5. The system according to claim 4, wherein the system is further configured to: store an indicator pointing to the first data in response to not storing the first data.

6. The system according to claim 4, wherein the system is further configured to: receive a request to retrieve the first data in the first format; and provide one of the following as the first data in the first format: the first data as stored in the first format, the first data as pointed to by the indicator; and a second data as converted into the first format in accordance with the conversion policies.

7. A computer program product for data management, the computer program product comprising a non-transitory computer readable storage medium having program code embodied therewith, the program code executable by a processor to perform a method, the method comprising: calculating a first hash value for a first data in a first format; in response to locating a stored hash value that corresponds to the first hash value, not storing the first data; and in response to not locating a stored hash value that corresponds to the first hash value: converting the first data into one or more data in alternate formats in accordance with one or more conversion policies for converting between the first format and the one or more alternate formats, wherein the one or more conversion policies are stored in a conversion policy library; calculating a hash value for each of the one or more data in alternate formats; storing one copy of the first data; and storing the calculated hash values, including the first hash value, separately from the first data, each of the calculated hash values having fewer bits than the first data in any of the alternate formats, thereby providing alternate stored hash values for the first data, the alternate stored hash values used for identifying duplicate copies of the first data when the first data is in any of the alternate formats, wherein a number of computer calculations required to identify duplicate copies of the first data using the alternate hash values is less than a number of computer calculations required to identity duplicate copies of the first data using any of the alternate formats.

8. The computer program product according to claim 7, the method further comprising: in response to not storing the first data, storing an indicator pointing to the first data.

9. The computer program product according to claim 8, the method further comprising: receiving a request to retrieve the first data in the first format; and providing one of the following as the first data in the first format: the first data as stored in the first format, the first data as pointed to by the indicator; and a second data as converted into the first format in accordance with the conversion policies.
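The store-and-retrieve flow recited in claims 1 to 3 can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the patentee's implementation: SHA-256 is assumed as the hash, the "formats" are two text encodings, and `POLICIES`, `DedupStore`, `put`, and `get` are hypothetical names. The hash index is held apart from the stored copy, and an incoming item whose hash is already indexed is recorded only as a pointer.

```python
import hashlib
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]

# Hypothetical conversion policy library: (source format, target format) -> converter.
POLICIES: Dict[Tuple[str, str], Converter] = {
    ("utf-8", "utf-16-le"): lambda b: b.decode("utf-8").encode("utf-16-le"),
    ("utf-16-le", "utf-8"): lambda b: b.decode("utf-16-le").encode("utf-8"),
}


def digest(data: bytes) -> str:
    # Assumption: SHA-256 stands in for the claimed hash value.
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    def __init__(self, policies: Dict[Tuple[str, str], Converter]) -> None:
        self.policies = policies
        self.blobs: Dict[int, Tuple[str, bytes]] = {}  # blob id -> (format, single stored copy)
        self.hash_index: Dict[str, int] = {}           # hash -> blob id, kept apart from the data
        self.pointers: Dict[str, int] = {}             # item name -> blob id (the "indicator")

    def put(self, name: str, data: bytes, fmt: str) -> bool:
        """Return True if a new copy was stored, False if it was de-duplicated."""
        first_hash = digest(data)
        if first_hash in self.hash_index:
            # Stored hash located: do not store the data, only a pointer.
            self.pointers[name] = self.hash_index[first_hash]
            return False
        blob_id = len(self.blobs)
        self.blobs[blob_id] = (fmt, data)
        self.hash_index[first_hash] = blob_id
        # Hash every alternate format reachable through the policy library.
        for (src, _dst), convert in self.policies.items():
            if src == fmt:
                self.hash_index[digest(convert(data))] = blob_id
        # This toy also keeps a name -> blob mapping for stored items, for lookup.
        self.pointers[name] = blob_id
        return True

    def get(self, name: str, fmt: str) -> bytes:
        stored_fmt, data = self.blobs[self.pointers[name]]
        if stored_fmt == fmt:
            return data
        # Convert the stored copy back into the requested format.
        return self.policies[(stored_fmt, fmt)](data)
```

For example, after `store = DedupStore(POLICIES)`, calling `store.put("a.txt", "hello".encode("utf-8"), "utf-8")` stores one copy, a later `store.put("b.txt", "hello".encode("utf-16-le"), "utf-16-le")` is recognized as a duplicate from the hash lookup alone, and `store.get("b.txt", "utf-16-le")` rebuilds the requested format from the single stored copy.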

Assignees

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9910857B2 cover?
Methods and systems for data management are disclosed. With embodiments of the present disclosure, data files originating from the same source data can be de-duplicated. One such method comprises calculating one or more of a first characteristic value for first data in a first format, and one or more second characteristic values for one or more data in one or more second formats into which the …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 6, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).