Generating datasets using approximate baselines

US11816129B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11816129-B2
Application number: US-202117354236-A
Country: US
Kind code: B2
Filing date: Jun 22, 2021
Priority date: Jun 22, 2021
Publication date: Nov 14, 2023
Grant date: Nov 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Generating datasets using approximate baselines including receiving, by a source storage system, an instruction to create, on a target storage system, a current snapshot for a source dataset stored on the source storage system, wherein no snapshots for the source dataset exist on the target storage system; selecting, as a baseline dataset, a similar dataset from a plurality of datasets on the source storage system with an existing snapshot on the target storage system, wherein the similar dataset comprises at least a portion of the source dataset; instructing the target storage system to generate a baseline snapshot for the source dataset using a copy of the existing snapshot of the baseline dataset; and transferring, from the source storage system to the target storage system, only a difference between the baseline dataset and the source dataset.
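The flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: datasets are modeled as hypothetical in-memory maps from block index to block contents, the names `TargetSystem`, `block_diff`, and `replicate` are invented for illustration, and the sketch assumes each candidate's snapshot on the target is identical to that dataset's state on the source. The baseline here is chosen by smallest block difference, which is only one of the selection criteria the claims mention.

```python
from dataclasses import dataclass, field

# Hypothetical model: a dataset is a map of block index -> block contents.
Blocks = dict[int, bytes]


def block_diff(baseline: Blocks, source: Blocks) -> tuple[Blocks, set[int]]:
    """Return (changed or new blocks, deleted block indexes) that turn baseline into source."""
    changed = {i: data for i, data in source.items() if baseline.get(i) != data}
    deleted = set(baseline) - set(source)
    return changed, deleted


@dataclass
class TargetSystem:
    """Stand-in for the target storage system; snapshots are keyed by dataset name."""
    snapshots: dict[str, Blocks] = field(default_factory=dict)

    def clone_snapshot(self, existing: str, new: str) -> None:
        # Creates the baseline snapshot locally on the target; no data crosses the wire.
        self.snapshots[new] = dict(self.snapshots[existing])

    def apply_diff(self, name: str, changed: Blocks, deleted: set[int]) -> None:
        snap = self.snapshots[name]
        snap.update(changed)
        for i in deleted:
            snap.pop(i, None)


def replicate(source_datasets: dict[str, Blocks], source_name: str, target: TargetSystem) -> int:
    """Create a snapshot of source_name on the target; return the number of blocks sent."""
    source = source_datasets[source_name]
    # Candidates: other datasets on the source that already have a snapshot on the target.
    candidates = [n for n in source_datasets if n != source_name and n in target.snapshots]
    if not candidates:
        # No usable baseline: fall back to a full transfer.
        target.snapshots[source_name] = dict(source)
        return len(source)

    def cost(name: str) -> int:
        changed, deleted = block_diff(source_datasets[name], source)
        return len(changed) + len(deleted)

    baseline_name = min(candidates, key=cost)
    changed, deleted = block_diff(source_datasets[baseline_name], source)
    target.clone_snapshot(baseline_name, source_name)  # baseline snapshot for the source dataset
    target.apply_diff(source_name, changed, deleted)   # only the difference is transferred
    return len(changed)
```

In this model, replicating a four-block volume that differs from an already replicated three-block volume in one changed block and one new block sends two blocks instead of four.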

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by a source storage system, an instruction to create, on a target storage system, a current snapshot for a source dataset stored on the source storage system, wherein no snapshots for the source dataset exist on the target storage system; selecting, as a baseline dataset, a similar dataset to the source dataset from a plurality of datasets on the source storage system with an existing snapshot on both the source storage system and the target storage system, wherein the similar dataset comprises at least a portion of the source dataset; instructing the target storage system to generate a baseline snapshot for the source dataset using a copy of the existing snapshot of the baseline dataset that exists at the target storage system; and transferring, from the source storage system to the target storage system, only a difference between the baseline dataset and the source dataset, wherein the current snapshot of the source dataset is generated on the target storage system using the baseline snapshot on the target storage system and the difference between the baseline dataset and the source dataset.

2. The method of claim 1, wherein the similar dataset is non-ancestrally related to the source dataset.

3. The method of claim 1, wherein the similar dataset is below the source dataset in a relationship graph of datasets related to the source dataset.

4. The method of claim 1, wherein the similar dataset has a timestamp after a timestamp for the source dataset.

5. The method of claim 1, wherein the similar dataset is selected from the plurality of datasets based on a difference between the source dataset and each of the plurality of datasets.

6. The method of claim 1, wherein the similar dataset is selected from the plurality of datasets based on a timestamp of each of the plurality of datasets.

7. The method of claim 1, wherein the similar dataset is selected from the plurality of datasets based on a distance from the source dataset in a relationship graph of datasets related to the source dataset.

8. The method of claim 1, wherein each of the plurality of datasets has a timestamp within a threshold range of a timestamp of the source dataset.

9. The method of claim 1, wherein each of the plurality of datasets is within a threshold distance from the source dataset in a relationship graph of datasets related to the source dataset.

10. The method of claim 1, wherein the source dataset is a volume.

11. The method of claim 1, wherein the source dataset is a file system.

12. An apparatus comprising a computer processor, a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the steps of: receiving, by a source storage system, an instruction to create, on a target storage system, a current snapshot for a source dataset stored on the source storage system, wherein no snapshots for the source dataset exist on the target storage system; selecting, as a baseline dataset, a similar dataset to the source dataset from a plurality of datasets on the source storage system with an existing snapshot on both the source storage system and the target storage system, wherein the similar dataset comprises at least a portion of the source dataset; instructing the target storage system to generate a baseline snapshot for the source dataset using a copy of the existing snapshot of the baseline dataset that exists at the target storage system; and transferring, from the source storage system to the target storage system, only a difference between the baseline dataset and the source dataset, wherein the current snapshot of the source dataset is generated on the target storage system using the baseline snapshot on the target storage system and the difference between the baseline dataset and the source dataset.

13. A method comprising: receiving, by a source storage system, an instruction to restore, on the source storage system, a source dataset from a snapshot stored on a target storage system; selecting, as a baseline dataset, a similar dataset to the source dataset from a plurality of datasets on the source storage system, wherein the similar dataset comprises at least a portion of the source dataset; transferring, from the target storage system to the source storage system, only a portion of the snapshot representing a difference between the baseline dataset and the source dataset; and generating the source dataset on the source storage system using the baseline dataset on the source storage system and the transferred portion of the snapshot representing the difference between the baseline dataset and the source dataset.

14. The method of claim 13, wherein the similar dataset is selected from the plurality of datasets using metadata for the source dataset.

15. The method of claim 14, wherein the metadata for the source dataset is retrieved from the target storage system.

16. The method of claim 13, wherein the similar dataset is below the source dataset in a relationship graph of datasets related to the source dataset.

17. The method of claim 13, wherein the similar dataset has a timestamp after a timestamp for the source dataset.

18. The method of claim 13, wherein the similar dataset is selected from the plurality of datasets based on a timestamp of each of the plurality of datasets.

19. The method of claim 13, wherein the similar dataset is selected from the plurality of datasets based on a distance from the source dataset in a relationship graph of datasets related to the source dataset.

20. The method of claim 13, wherein the source dataset is a volume.
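The dependent claims name several ways to narrow and rank baseline candidates: distance in a relationship graph, timestamps, and threshold limits on both. The sketch below shows one way these criteria could be combined. It is an illustrative assumption, not the patent's method: the graph is a hypothetical undirected adjacency map, `graph_distances` and `select_baseline` are invented names, and the tie-break order (hops first, then timestamp gap, then name) is arbitrary.

```python
from collections import deque
from typing import Optional


def graph_distances(graph: dict[str, set[str]], start: str) -> dict[str, int]:
    """Breadth-first hop counts from start in an undirected relationship graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def select_baseline(
    source: str,
    graph: dict[str, set[str]],
    timestamps: dict[str, float],
    on_target: set[str],
    max_hops: int = 2,          # hypothetical threshold distance
    max_time_gap: float = 86400.0,  # hypothetical threshold range, in seconds
) -> Optional[str]:
    """Pick a related dataset that already has a snapshot on the target, or None."""
    dist = graph_distances(graph, source)

    def time_gap(name: str) -> float:
        return abs(timestamps[name] - timestamps[source])

    candidates = [
        name
        for name, hops in dist.items()
        if name != source
        and name in on_target            # must already have a snapshot on the target
        and hops <= max_hops             # within a threshold graph distance
        and time_gap(name) <= max_time_gap  # within a threshold timestamp range
    ]
    if not candidates:
        return None
    # Prefer the closest dataset in the graph, then the closest in time.
    return min(candidates, key=lambda n: (dist[n], time_gap(n), n))
```

Nothing in this ranking requires the baseline to be an ancestor of the source dataset. A child or sibling with a later timestamp can be chosen, which matches the claims covering non-ancestral and later-timestamped baselines.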

Assignees

Pure Storage Inc

Inventors

Classifications

  • G06F16/27 · Primary

    Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • G06F3/0688 · Primary

    Non-volatile semiconductor memory arrays · CPC title

  • Replication mechanisms · CPC title

  • Improving the reliability of storage systems · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11816129B2 cover?
Generating datasets using approximate baselines including receiving, by a source storage system, an instruction to create, on a target storage system, a current snapshot for a source dataset stored on the source storage system, wherein no snapshots for the source dataset exist on the target storage system; selecting, as a baseline dataset, a similar dataset from a plurality of datasets on the s…
Who is the assignee on this patent?
Pure Storage Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/27. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 14, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).