Optimizing restoration of deduplicated data

US2016203058A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016203058-A1
Application numberUS-201615073703-A
CountryUS
Kind codeA1
Filing dateMar 18, 2016
Priority dateSep 12, 2012
Publication dateJul 14, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer identifies a plurality of data retrieval requests that may be serviced using a plurality of unique data chunks. The computer services the data retrieval requests by utilizing at least one of the unique data chunks. At least one of the unique data chunks can be utilized for servicing two or more of the data retrieval requests. The computer determines a servicing sequence for the plurality of data retrieval requests such that the two or more of the data retrieval requests that can be serviced utilizing the at least one of the unique data chunks are serviced consecutively. The computer services the plurality of data retrieval requests according to the servicing sequence.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method for restoring deduplicated data, the method comprising the steps of: a first computing device identifying a plurality of data retrieval requests for servicing with a plurality of unique data chunks, wherein each data retrieval request can be serviced utilizing at least one of the unique data chunks, and wherein at least one of the unique data chunks can be utilized for the servicing of two or more of the data retrieval requests; the first computing device determining a servicing sequence of the plurality of data retrieval requests such that the two or more of the data retrieval requests that can be serviced utilizing the at least one of the unique data chunks are serviced consecutively; and the first computing device servicing the plurality of data retrieval requests according to the servicing sequence. 2 . The method of claim 1 , wherein the determining a servicing sequence of the plurality of data retrieval requests further comprises the step of: the first computing device mapping the plurality of data retrieval requests into a relationship graph, wherein each node of the relationship graph corresponds to a data retrieval request, and wherein each edge of the relationship graph has an edge weight associated with the number of unique data chunks shared between the pair of data retrieval requests connected by the edge. 3 . The method of claim 2 , wherein the determining a servicing sequence of the plurality of data retrieval requests further comprises the steps of: the first computing device determining which node of the relationship graph has the highest relationship score; and the first computing device traversing the relationship graph, starting from the node of the relationship graph that has the highest relationship score, in substantially breadth-first order according to descending relationship score. 4 . The method of claim 2 , wherein the determining a servicing sequence of the plurality of data retrieval requests further comprises the steps of: the first computing device storing the node that has the highest relationship score in the servicing sequence; and the first computing device, responsive to determining that a plurality of nodes have a relationship score in common, storing the plurality of nodes having the relationship score in common. 5 . The method of claim 3 , wherein the determining which node of the relationship graph has the highest relationship score further includes determining a total edge weight of the node, a total number of edges of the node, or a weighted sum of a total edge weight of the node and a total number of edges of the node. 6 . The method of claim 1 , wherein the servicing the plurality of data retrieval requests according to the servicing sequence further includes the steps of: the first computing device receiving one of the plurality of data retrieval requests from a second computing device; the first computing device combining the plurality of unique data chunks to generate a data object; and the first computing device transmitting the generated data object combined from the plurality of unique data chunks to the second computing device. 7 . The method of claim 1 , wherein the servicing the plurality of data retrieval requests according to the servicing sequence further includes the steps of: the first computing device receiving one of the plurality of data retrieval requests from a second computing device; and the first computing device transmitting the unique data chunks required to service the data retrieval request to the second computing device, wherein the second computing device combines the unique data chunks to generate a data object. 8 . A computer program product to restore deduplicated data, the computer program product comprising: one or more computer-readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: program instructions to identify a plurality of data retrieval requests for servicing with a plurality of unique data chunks, wherein each data retrieval request can be serviced utilizing at least one of the unique data chunks, and wherein at least one of the unique data chunks can be utilized for the servicing of two or more of the data retrieval requests; program instructions to determine a servicing sequence of the plurality of data retrieval requests such that the two or more of the data retrieval requests that can be serviced utilizing the at least one of the unique data chunks are serviced consecutively; and program instructions to service the plurality of data retrieval requests according to the servicing sequence. 9 . The computer program product of claim 8 , further comprising program instructions to map the plurality of data retrieval requests into a relationship graph, wherein each node of the relationship graph corresponds to a data retrieval request, and wherein each edge of the relationship graph has an edge weight associated with the number of unique data chunks shared between the pair of data retrieval requests connected by the edge. 10 . The computer program product of claim 9 , further comprising program instructions to: determine which node of the relationship graph has the highest relationship score; and traverse the relationship graph by starting from the node of the relationship graph that has the highest relationship score, in substantially breadth-first order according to descending relationship score. 11 . The computer program product of claim 9 , further comprising program instructions to: store the node that has the highest relationship score in the servicing sequence; and responsive to determining that a plurality of nodes have a relationship score in common, storing the plurality of nodes having the relationship score in common. 12 . The computer program product of claim 10 , wherein the program instructions to determine which node of the relationship graph has the highest relationship score further includes program instructions to determine a total edge weight of the node; a total number of edges of the node; or a weighted sum of a total edge weight of the node and a total number of edges of the node. 13 . The computer program product of claim 8 , wherein the program instructions to service the plurality of data retrieval requests according to the servicing sequence further includes the steps of: receiving one of the plurality of data retrieval requests from a second computing device; combining the plurality of unique data chunks to generate a data object; and transmitting the generated data object combined from the plurality of unique data chunks to the second computing device. 14 . The computer program product of claim 8 , wherein the program instructions to service the plurality of data retrieval requests according to the servicing sequence further includes program instructions to: receiving one of the plurality of data retrieval from a second computing device; and transmitting the unique data chunks required to service the data retrieval, wherein the second computing device combines the unique data chunks to generate a data object. 15 . A computer system for restoring deduplicated data, the computer system comprising: one or more computer processors; and one or more computer-readable storage media storing program instructions, wherein the computer processor executes the program instructions to: identify a plurality of data retrieval requests for servicing with a plurality of unique data chunks, wherein each data retrieval request can be service

Assignees

Inventors

Classifications

  • using de-duplication of the data · CPC title

  • Management of the backup or restore process · CPC title

  • Physics · mapped topic

  • Using snapshots, i.e. a logical point-in-time copy of the data · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016203058A1 cover?
A computer identifies a plurality of data retrieval requests that may be serviced using a plurality of unique data chunks. The computer services the data retrieval requests by utilizing at least one of the unique data chunks. At least one of the unique data chunks can be utilized for servicing two or more of the data retrieval requests. The computer determines a servicing sequence for the plura…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/1458. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Jul 14 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).