Managing replicated data

US9489412B2 · US · B2

Patent metadata
Publication number: US-9489412-B2
Application number: US-201514806147-A
Country: US
Kind code: B2
Filing date: Jul 22, 2015
Priority date: Nov 21, 2012
Publication date: Nov 8, 2016
Grant date: Nov 8, 2016

Abstract

An approach for managing replicated data is presented. Metadata is received specifying inter-data correlation(s), inter-replica correlation(s), and data-replica correlation(s) among replicas generated for a system. A unified replication metadata model specifying the correlations is generated. Based on the inter-replica correlation(s), a proper subset of the replicas is selected. Based on the inter-replica and inter-data correlation(s), the selected proper subset of replicas is indexed to generate a unified content index. Based on a current usage of resources, an expected usage and an affinity score for performing an indexing task online or offline are determined. A query is received to locate a data item in at least one of the replicas. Based on the unified content index, the unified replication metadata model, and the query, candidate replica(s) and confidence score(s) indicating likelihood(s) that the candidate replica(s) include the data item are determined.
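The abstract names three kinds of correlation, a selection step driven by inter-replica correlations, and a unified content index, but it does not fix any data structures. The following is a minimal Python sketch under stated assumptions, not the patented implementation: `UnifiedReplicationModel`, `select_proper_subset`, `build_unified_index` and `exact_matches` are hypothetical names, correlations are modeled as pairs of string ids, and "select a proper subset" is interpreted as keeping one representative per group of inter-replica-correlated replicas (a union-find grouping chosen here for illustration). Inter-data and data-replica correlations are stored but not used in this simplified selection.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedReplicationModel:
    """Hypothetical container for the three correlation types named in the abstract."""
    inter_data: set[tuple[str, str]] = field(default_factory=set)     # (data_set, data_set)
    inter_replica: set[tuple[str, str]] = field(default_factory=set)  # (replica, replica)
    data_replica: set[tuple[str, str]] = field(default_factory=set)   # (data_set, replica)

    def replicas(self) -> set[str]:
        """All replica ids mentioned by any replica-level correlation."""
        ids = {r for pair in self.inter_replica for r in pair}
        ids |= {replica for _, replica in self.data_replica}
        return ids


def select_proper_subset(model: UnifiedReplicationModel) -> set[str]:
    """Keep one representative per group of inter-replica-correlated replicas.

    Assumption: correlated replicas (e.g. successive snapshots of one volume)
    are near-duplicates, so indexing one of them is enough for a first pass.
    """
    parent = {r: r for r in model.replicas()}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in model.inter_replica:
        parent[find(a)] = find(b)  # merge the two correlated groups

    groups: dict[str, list[str]] = {}
    for r in parent:
        groups.setdefault(find(r), []).append(r)
    # Deterministic choice: the smallest replica id in each group.
    return {min(members) for members in groups.values()}


def build_unified_index(selected: set[str], contents: dict[str, str]) -> dict[str, set[str]]:
    """Keyword-to-replica mapping (cf. claim 5), built over the selected replicas only."""
    index: dict[str, set[str]] = {}
    for replica in selected:
        for word in contents.get(replica, "").lower().split():
            index.setdefault(word, set()).add(replica)
    return index


def exact_matches(index: dict[str, set[str]], keyword: str) -> set[str]:
    """Replicas whose indexed content contains the keyword."""
    return index.get(keyword.lower(), set())


# Example with hypothetical replica ids and contents.
model = UnifiedReplicationModel(
    inter_data={("orders", "invoices")},
    inter_replica={("snap-1", "snap-2"), ("snap-2", "snap-3")},
    data_replica={("orders", "snap-1"), ("invoices", "backup-9")},
)
selected = select_proper_subset(model)  # == {"snap-1", "backup-9"}
contents = {
    "snap-1": "Order 1001 shipped",
    "snap-2": "Order 1002 pending",
    "backup-9": "Invoice 77 paid",
}
index = build_unified_index(selected, contents)
```

In this example `snap-2` is never indexed, so a query for "pending" finds no exact match even though `snap-2` holds that word. That gap is what the confidence scoring in the claims addresses: non-indexed replicas are ranked by how close they are to replicas that did match.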

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of managing replicated data, the method comprising the steps of: a computer receiving first metadata specifying inter-data correlation(s), which are correlation(s) between sets of replicated data in a first set of replicas; the computer receiving second metadata specifying inter-replica correlation(s), which are correlation(s) between replicas included in a second set of replicas; the computer receiving third metadata specifying data-replica correlation(s), which are correlation(s) between set(s) of replicated data and respective replica(s) included in a third set of replicas, the first, second and third sets of replicas being included in a plurality of replicas generated for a system; the computer determining a current usage of resources in the system and a threshold usage of the resources; the computer generating a unified replication metadata model specifying the inter-data correlation(s) based on the first metadata, the inter-replica correlation(s) based on the second metadata, and the data-replica correlation(s) based on the third metadata; based on the inter-replica correlation(s) specified by the unified replication metadata model, the computer selecting a proper subset of replicas included in the plurality of replicas; based on the inter-replica and inter-data correlation(s) specified by the unified replication metadata model, the computer indexing the selected proper subset of replicas to generate a unified content index, wherein the step of indexing the selected proper subset includes: if the current usage is less than the threshold usage, then the computer determining an expected additional resource usage due to performing an indexing task online, and based on the expected additional resource usage, the computer determining a resource affinity score for performing the indexing task online; and if the current usage is greater than or equal to the threshold usage, then the computer determining an expected resource usage due to performing the indexing task offline and based on the expected resource usage, the computer determining a resource affinity score for performing the indexing task offline; the computer receiving a query to locate a data item in at least one replica included in the plurality of replicas; and based on the unified content index, the unified replication metadata model, and the received query, the computer determining candidate replica(s) and corresponding confidence score(s), the confidence score(s) indicating respective likelihood(s) that the candidate replica(s) include the data item, and the candidate replica(s) included in the plurality of replicas.

2. The method of claim 1, further comprising the steps of: based on the inter-replica and inter-data correlation(s) specified by the unified replication metadata model, the computer determining indexer(s) to use for the indexing of the selected proper subset of replicas; and based on the inter-replica and inter-data correlation(s) specified by the unified replication metadata model, the computer determining a prioritized order of indexing tasks included in the step of indexing the selected proper subset of replicas.

3. The method of claim 1, further comprising the step of the computer receiving event monitoring data that indicates change(s) in the system, wherein the step of indexing the selected proper subset of replicas includes the steps of: based on the unified replication metadata model, the computer determining temporal distances from the replicas in the selected proper subset of replicas to respective fully indexed replicas included in the plurality of replicas; based on the received event monitoring data, the computer determining measures indicating respective amounts of change in the system between timestamps of the replicas in the selected proper subset of replicas and respective nearest fully indexed replicas included in the plurality of replicas; and determining index expectation scores for the respective replicas based on the temporal distances and the measures indicating amounts of change in the system.

4. The method of claim 1, wherein the step of determining the candidate replica(s) and the corresponding confidence score(s) includes the steps of: based on the unified content index, the computer determining first replica(s) included in the proper subset of replicas that are exact matches to the query; for second replica(s) that are not exact matches to the query, the computer determining respective temporal distance(s) and respective percent change(s) in the system between the second replica(s) and the first replica(s) that are exact matches to the query; for the second replica(s) that are not exact matches to the query, the computer identifying respective nearest neighbor(s) as respective first replica(s) having minimum(s) of the respective temporal distance(s) and respective percent change(s); based on the minimum(s) of the temporal distance(s) and percent change(s), the computer determining confidence score(s) of the second replica(s); the computer sorting the second replica(s) based on the confidence score(s); and the computer directing a device to present the sorted second replica(s) to a user.

5. The method of claim 1, wherein the step of indexing the selected proper subset of replicas includes the steps of: the computer determining index updates by determining keyword-to-replica mappings; and the computer generating the unified content index based on the index updates.

6. The method of claim 5, wherein the step of determining the index updates includes the steps of: the computer determining index expectation scores for respective replicas in the selected proper subset of replicas; and the computer sorting the selected proper subset of replicas based in part on the index expectation scores for the respective replicas.

7. The method of claim 5, wherein the step of determining the index updates includes the steps of: the computer determining resource affinity scores for respective replicas in the selected proper subset of replicas; and the computer sorting the selected proper subset of replicas based in part on the resource affinity scores for the respective replicas.

8. The method claim 1, further comprising the step of: providing at least one support service for at least one of creating, integrating, hosting, maintaining, and deploying computer-readable program code in the computer, the program code being executed by a processor of the computer to implement the steps of receiving the first metadata, receiving the second metadata, receiving the third metadata, determining the current usage, generating the unified replication metadata model, selecting the proper subset of replicas, indexing the selected proper subset, receiving the query, and determining the candidate replica(s) and the corresponding confidence score(s).

9. A computer program product, comprising a computer-readable, tangible storage device and a computer-readable program code stored in the computer-readable, tangible storage device, the computer-readable program code containing instructions that are executed by a central processing unit (CPU) of a computer system to implement a method of managing replicated data, the method comprising the steps of: the computer system receiving first metadata specifying inter-data correlation(s), which are correlation(s) between sets of replicated data in a first set of replicas; the computer system receiving second metadata specifying inter-replica correlation(s), which are correlation(s) between replicas included in a second set of replicas; the computer system receiving third metadata specifying data-replica correlation(s), which are correlation(s) between s
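Claims 1, 3 and 4 describe what the scores depend on (current versus threshold resource usage, temporal distance, amount of change from event monitoring) but not how they are computed. The sketch below shows one possible reading in Python; every formula here is an assumption made for illustration, not the patent's method. `Replica`, `TOTAL_ITEMS`, `percent_change`, `similarity`, `resource_affinity`, `index_expectation` and `rank_candidates` are hypothetical names, and the change measure assumes event monitoring exposes a cumulative change-event count per replica timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    rid: str
    t_hours: float  # replica timestamp, in hours since an arbitrary epoch
    changes: int    # hypothetical cumulative change-event count from event monitoring


TOTAL_ITEMS = 1_000  # hypothetical number of tracked items, turns counts into percent


def percent_change(a: Replica, b: Replica) -> float:
    """Percent of tracked items changed between two replicas' timestamps (capped at 100)."""
    return 100.0 * min(abs(a.changes - b.changes), TOTAL_ITEMS) / TOTAL_ITEMS


def similarity(a: Replica, b: Replica) -> float:
    """Hypothetical score in [0, 1]: decays per day of time gap and with percent change."""
    dt = abs(a.t_hours - b.t_hours)
    return (1.0 - percent_change(a, b) / 100.0) / (1.0 + dt / 24.0)


def resource_affinity(current: float, threshold: float, capacity: float,
                      online_cost: float, offline_cost: float) -> tuple[str, float]:
    """Claim 1's online/offline branch; the scoring formula is an assumption.

    Assumes threshold <= capacity, so headroom is positive on the online branch.
    """
    if current < threshold:
        # Online: expected *additional* usage measured against remaining headroom.
        headroom = capacity - current
        return "online", max(0.0, 1.0 - online_cost / headroom)
    # Offline: expected usage of the deferred task measured against full capacity.
    return "offline", max(0.0, 1.0 - offline_cost / capacity)


def index_expectation(r: Replica, fully_indexed: list[Replica]) -> float:
    """Claim 3: expected benefit of indexing r, judged from its nearest fully indexed replica.

    Assumption: the further r has drifted (in time and change) from the temporally
    nearest fully indexed replica, the more new content indexing it should expose.
    """
    nearest = min(fully_indexed, key=lambda f: abs(r.t_hours - f.t_hours))
    return 1.0 - similarity(r, nearest)


def rank_candidates(exact: list[Replica], others: list[Replica]) -> list[tuple[str, float]]:
    """Claim 4: score non-matching replicas via their nearest exact-match neighbor, best first."""
    scored = []
    for r in others:
        # Nearest neighbor = the exact match with the highest similarity, i.e. the
        # smallest combined time/change distance under the hypothetical score above.
        best = max(similarity(r, e) for e in exact)
        scored.append((r.rid, best))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Example with hypothetical replicas.
snap1 = Replica("snap-1", 0.0, 0)     # exact match for the query
snap2 = Replica("snap-2", 24.0, 100)  # one day later, 10% of items changed
snap3 = Replica("snap-3", 48.0, 500)  # two days later, 50% of items changed
ranking = rank_candidates([snap1], [snap3, snap2])
# snap-2 scores 0.9 / 2 = 0.45; snap-3 scores 0.5 / 3, about 0.167
```

Sorting the selected replicas by `index_expectation` or by the affinity score would give the prioritized indexing orders of claims 6 and 7; the product-style formulas above were picked only because they are monotone in both distance and change, which is all the claims imply.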

Frequently asked questions

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
The primary CPC classification is G06F16/2228. Mapped technology areas include Physics.
When was this patent published?
Published November 8, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).