Method and system for efficient data replication in big data environment

US11086901B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11086901-B2
Application number  US-201815884748-A
Country             US
Kind code           B2
Filing date         Jan 31, 2018
Priority date       Jan 31, 2018
Publication date    Aug 10, 2021
Grant date          Aug 10, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system includes a persistent storage and a data transfer manager. The persistent storage stores sending entity storage resources and receiving entity storage resources. The data transfer manager obtains a data transfer request for data in the sending entity storage resources. In response to obtaining the data transfer request, the data transfer manager obtains a sending entity schema associated with the data; determines a current storage location of the data using the obtained sending entity schema; determines a future storage location for a copy of the data in the receiving entity storage resources; stores a copy of the data at the determined future storage location; adapts the sending entity schema based on the determined future storage location; and modifies a receiving entity schema based on the adapted sending entity schema.
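
The sequence of steps in the abstract can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation: the dictionary-backed storage, the `SchemaEntry` type, the `transfer` function, and the `recv/` location prefix are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical model: one shared persistent storage, keyed by location string.
# A schema is a list of entries mapping a data-portion identifier to the
# location that holds that portion.


@dataclass(frozen=True)
class SchemaEntry:
    identifier: str  # name of a portion of the data
    location: str    # where that portion lives in persistent storage


def transfer(storage, sending_schema, receiving_schema, identifiers):
    """Copy the requested portions and return the updated receiving schema."""
    # 1. Determine the current storage location from the sending schema.
    current = [e for e in sending_schema if e.identifier in identifiers]

    adapted = []
    for entry in current:
        # 2. Determine a future location in the receiving resources
        #    (the "recv/" prefix is an assumed naming convention).
        future = "recv/" + entry.identifier
        # 3. Store a copy of the data at the future location.
        storage[future] = storage[entry.location]
        # 4. Adapt the sending schema entry so it points at the copy.
        adapted.append(replace(entry, location=future))

    # 5. Modify the receiving schema based on the adapted entries.
    return receiving_schema + adapted
```

In this sketch the copy is a plain dictionary assignment. In the claims, the copy is made without invoking either computation framework, for example through a block level read requested from a storage manager.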

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a persistent storage for storing: sending entity storage resources exclusively for a first computation framework, and receiving entity storage resources exclusively for a second computation framework; and a data transfer manager comprising a processor and a memory and programmed to: obtain a data transfer request from one of the first computation framework and the second computation framework, wherein the data transfer request is for data in the sending entity storage resources; in response to obtaining the data transfer request: obtain a sending entity schema of the first computation framework associated with the data, determine a current storage location of the data using the obtained sending entity schema, determine a future storage location for a copy of the data in the receiving entity storage resources, store the copy of the data at the determined future storage location without invoking either of the first computation framework and the second computation framework, adapt the sending entity schema to obtain an adapted sending entity schema comprising the determined future storage location in which the copy of the data is stored, and modify a receiving entity schema of the second computation framework based on the adapted sending entity schema.

2. The system of claim 1, wherein obtaining the sending entity schema associated with the data comprises: sending a request for the sending entity schema to the first computation framework of the sending entity; and obtaining the sending entity schema after sending the request for the schema.

3. The system of claim 2, wherein sending the request for the sending entity schema initiates a computation in the first computation framework of the sending entity.

4. The system of claim 3, wherein the computation stores the sending entity schema in the sending entity storage resources.

5. The system of claim 4, wherein the computation in a query.

6. The system of claim 1, wherein determining the current storage location of the data using the obtained sending entity schema comprises: obtaining a plurality of identifiers for portions of the data; matching respective identifiers of the plurality of identifiers to respective persistent storage access information for the sending entity storage resources; and storing the matched persistent storage access information for the sending entity storage resources as the current storage location.

7. The system of claim 1, where the sending entity storage resources comprises: the sending entity schema; and the data, wherein the sending entity schema comprises persistent storage access information associated with the data.

8. The system of claim 1, wherein storing the copy of the data at the determined future storage location comprises: initiating a block level read of the persistent storage based on the determined current storage location.

9. The system of claim 8, wherein the first computation framework of the sending entity does not initiate the block level read.

10. The system of claim 8, wherein the block level read is initiated by sending a request to a storage manager of the persistent storage.

11. The system of claim 1, wherein adapting the sending entity schema based on the determined future storage location comprises: identifying a plurality of portions of the obtained sending entity schema associated with a plurality of portions of the copy of the data stored in the receiving entity storage resources; generating, for each portion of the copy of the data stored in the receiving entity storage resources generate, update information comprising: a storage location of the respective portion of the copy of the data in the receiving entity storage resources, and an identifier for the respective portion of the copy of the data in the receiving entity storage resources; and modifying each portion of the plurality of portions of the obtained sending entity schema to reflect the generated update information.

12. The system of claim 11, wherein modifying each portion of the plurality of portions of the obtained sending entity schema to reflect the generated update information creates mappings between each portion of the plurality of portions of the obtained sending entity schema to the respective associated portions of the copy of the data stored in the receiving entity storage resources.

13. The system of claim 1, wherein modifying the receiving entity schema based on the adapted sending entity schema comprises: adding a portion of the adapted sending entity schema to the receiving entity schema, wherein the portion of the adapted sending entity schema was modified by the adapting.

14. The system of claim 13, wherein modifying the receiving entity schema based on the adapted sending entity schema further comprises: omitting a second portion of the adapted sending entity schema from adding to the receiving entity schema, wherein the second portion of the adapted sending entity schema was not modified by the adapting.

15. A method of transferring data from a sending entity to a receiving entity, comprising: obtaining a data transfer request for the data in sending entity storage resources of the sending entity to receiving entity storage resources of the receiving entity, wherein the sending entity storage resources are exclusively associated with a first computation framework and the receiving entity storage resources are exclusively associated with a second computation framework, and wherein the data transfer request is obtained from one of the first computational framework and the second computation framework; in response to obtaining the data transfer request: obtaining a sending entity schema of the first computation framework associated with the data, determining a current storage location of the data using the obtained sending entity schema, determining a future storage location for a copy of the data in receiving entity storage resources associated with the receiving entity, storing the copy of the data at the determined future storage location, adapting the sending entity schema to obtain an adapted sending entity schema comprising the determined future storage location in which the copy of the data is stored without invoking either of the first computation framework and the second computation framework, and modifying a receiving entity schema, associated with the receiving entity, based on the adapted sending entity schema.

16. The method of claim 15, wherein obtaining the sending entity schema associated with the data comprises: sending a request for the sending entity schema to the first computation framework of the sending entity; and obtaining the sending entity schema after sending the request for the schema.

17. The system of claim 15, wherein storing a copy of the data at the determined future storage location comprises: initiating a block level read of the persistent storage based on the determined current storage location by sending a request to a storage manager that manages the sending entity storage resources.

18. A non-transitory computer readable medium comprising computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for transferring data from a sending entity to a receiving entity, the method comprising: obtaining a data transfer request for the data in sending entity storage resources of the sending entity to receiving entity storage resources of the receiving entity,
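
Claims 6 and 11 to 14 describe how storage locations are resolved from identifiers and how only the adapted portions of the sending entity schema reach the receiving entity schema. A minimal Python sketch of that logic follows. It is an assumption-laden illustration, not the claimed implementation: the dictionary-based schema layout, the `blk:` location strings, and the function names `resolve_locations`, `adapt_schema`, and `merge_into_receiving` are all hypothetical.

```python
def resolve_locations(identifiers, access_info):
    """Claim 6 sketch: match each portion identifier to storage access info."""
    return {ident: access_info[ident] for ident in identifiers}


def adapt_schema(schema, update_info):
    """Claim 11 sketch: rewrite the schema portions that have update info.

    `update_info` maps a portion identifier to the storage location of its
    copy. Returns the adapted schema and the set of modified identifiers.
    """
    adapted, modified = {}, set()
    for ident, entry in schema.items():
        if ident in update_info:
            # Point this portion at the copy in the receiving resources.
            adapted[ident] = {**entry, "location": update_info[ident]}
            modified.add(ident)
        else:
            adapted[ident] = entry
    return adapted, modified


def merge_into_receiving(receiving, adapted, modified):
    """Claims 13-14 sketch: add modified portions, omit unmodified ones."""
    merged = dict(receiving)
    for ident in modified:
        merged[ident] = adapted[ident]
    return merged
```

For example, if a hypothetical sending schema holds portions `p1` and `p2` and only `p1` was copied, `adapt_schema` marks `p1` as modified and `merge_into_receiving` adds `p1` alone to the receiving schema, leaving `p2` out.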

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • G06F16/275 · Primary

    Synchronous replication · CPC title

  • Management of the data involved in backup or backup restore · CPC title

  • G06F16/178 · Primary

    Techniques for file synchronisation in file systems · CPC title

  • Distributed file systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11086901B2 cover?
A system with a persistent storage and a data transfer manager. On a data transfer request, the manager uses the sending entity schema to locate the data, stores a copy in the receiving entity storage resources, adapts the sending entity schema to the copy's location, and modifies the receiving entity schema accordingly. See the Abstract section for the full text.
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/275. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 10, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).