Writing data in a distributed data storage system

US9507537B2 · US · B2

Patent metadata
Publication number: US-9507537-B2
Application number: US-201514684956-A
Country: US
Kind code: B2
Filing date: Apr 13, 2015
Priority date: Mar 5, 2010
Publication date: Nov 29, 2016
Grant date: Nov 29, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatuses, including computer programs encoded on computer-readable media, for receiving a write request that includes data and a client address at which to store the data. The data is segmented into the one or more storage units. A storage unit identifier for each of the one or more storage units is computed that uniquely identifies content of a storage unit. A mapping between each storage unit identifier to a block server is determined. For each of the one or more storage units, the storage unit and the corresponding storage unit identifier is sent to a block server. The block server stores the storage unit and information on where the storage unit is stored on the block server for the storage unit identifier. Multiple client addresses associated with a storage unit with the same storage unit identifier are mapped to a single storage unit.
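The write path summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patentee's implementation. The names `BlockServer`, `MetadataServer` and `CHUNK_SIZE` are hypothetical. Storage units are assumed to be fixed 4 KiB slices. SHA-256 stands in for the content hash, since claim 12 only requires "a hash of content". Block servers are assumed to be chosen by identifier modulo server count, because the abstract leaves the placement rule to the metadata server without specifying it.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4096  # hypothetical storage-unit size; the abstract does not fix one


@dataclass
class BlockServer:
    """Hypothetical in-memory block server keyed by storage unit identifier."""
    units: dict = field(default_factory=dict)  # identifier -> stored bytes

    def store(self, unit_id: str, unit: bytes) -> bool:
        # Deduplication: keep the unit only if its identifier is not already present.
        if unit_id not in self.units:
            self.units[unit_id] = unit
        return True  # the "storage indication" returned to the caller


@dataclass
class MetadataServer:
    """Hypothetical metadata server holding the client-address map."""
    block_servers: list
    volume_map: dict = field(default_factory=dict)  # client address -> [unit ids]

    def server_for(self, unit_id: str) -> BlockServer:
        # Assumed placement rule: identifier value modulo the number of servers.
        return self.block_servers[int(unit_id, 16) % len(self.block_servers)]

    def write(self, client_address: int, data: bytes) -> list:
        unit_ids = []
        for offset in range(0, len(data), CHUNK_SIZE):
            unit = data[offset:offset + CHUNK_SIZE]              # segment
            unit_id = hashlib.sha256(unit).hexdigest()            # content identifier
            if self.server_for(unit_id).store(unit_id, unit):     # send, await indication
                unit_ids.append(unit_id)
        # Record client address -> identifiers after every unit is acknowledged.
        self.volume_map[client_address] = unit_ids
        return unit_ids


if __name__ == "__main__":
    servers = [BlockServer() for _ in range(3)]
    meta = MetadataServer(servers)
    meta.write(0, b"A" * 8192)  # two identical 4 KiB units
    meta.write(1, b"A" * 4096)  # the same content at another client address
    print(sum(len(s.units) for s in servers))  # 1: the repeated unit is stored once
```

Writing the same 4 KiB pattern under two client addresses yields one stored unit, with both addresses mapped to the same identifier. This is the many-addresses-to-one-unit behaviour the abstract describes. The sketch omits the redundant copy on a second block server (claims 6 and 13) and the replicated metadata (claims 7 and 14).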

First claim

Opening claim text (preview).

What is claimed is:

1. A method for writing data, the data including one or more storage units, the method comprising: receiving a write request, the write request including client data and a client address, wherein the client address identifies the client data; segmenting the client data into the one or more storage units; computing a storage unit identifier for each of the one or more storage units, wherein the storage unit identifier for each of the one or more storage units uniquely identifies content of a storage unit associated with the storage unit identifier; determining, by a metadata server, a mapping between each storage unit identifier to a block server; for each of the one or more storage units, sending the storage unit and the corresponding storage unit identifier to a block server based upon the mapping between the storage unit identifier to the block server, wherein the block server stores the storage unit and maps the storage unit identifier to where the storage unit is stored on the block server, and wherein a storage unit that is associated with multiple client addresses is mapped to a single storage unit and stored on a block server one time; receiving a storage indication from the block server for each of the one or more storage units; and storing a mapping between the client address and each of the one or more storage units based upon the storage indication.

2. The method of claim 1, wherein the block server determines if the storage unit identifier already exists on the block server, wherein the block server does not store the storage unit on the block server if the storage unit identifier already exists on the block server, and wherein the block server stores the storage unit if the storage unit identifier does not exist on the block server.

3. The method of claim 2, wherein the storage unit is not stored multiple times on a single storage medium on the block server.

4. The method of claim 3, further comprising: determining a number of unique storage unit identifiers associated with client data; and calculating an amount of space used by the client data based upon the number of unique storage units identifiers.

5. The method of claim 3, further comprising: determining a plurality of storage unit identifiers associated with client data; for each of the plurality of storage unit identifiers: determining if the storage unit identifier is present in a Bloom filter; and adding the storage unit identifier to the Bloom filter based upon the determining the storage unit identifier is not present in the Bloom filter; and calculating an amount of space used by the client data based upon the Bloom filter.

6. The method of claim 1, further comprising: for each of the one or more storage units, sending the storage unit and the corresponding storage unit identifier to a second block server to redundantly store the storage unit and the corresponding storage unit identifier on multiple block servers.

7. The method of claim 1, further comprising redundantly storing the mapping between the client address and one or more storage unit identifiers in multiple metadata servers, wherein multiple write requests are performed to redundantly store the mapping.

8. A non-transitory computer-readable storage medium containing instructions for writing data, the data including one or more storage units, the instructions for controlling a computer system to perform operations comprising: receiving a write request, the write request including client data and a client address, wherein the client address identifies the client data; segmenting the client data into the one or more storage units; computing a storage unit identifier for each of the one or more storage units, wherein the storage unit identifier for each of the one or more storage units uniquely identifies content of a storage unit associated with the storage unit identifier; determining, by a metadata server, a mapping between each storage unit identifier to a block server; for each of the one or more storage units, sending the storage unit and the corresponding storage unit identifier to a block server based upon the mapping between the storage unit identifier to the block server, wherein the block server stores the storage unit and maps the storage unit identifier to where the storage unit is stored on the block server, and wherein a storage unit that is associated with multiple client addresses is mapped to a single storage unit and stored on a block server one time; receiving a storage indication from the block server for each of the one or more storage units; and storing a mapping between the client address and each of the one or more storage units based upon the storage indication.

9. The non-transitory computer-readable storage medium of claim 8, wherein the block server determines if the storage unit identifier already exists on the block server, wherein the block server does not store the storage unit on the block server if the storage unit identifier already exists on the block server, and wherein the block server stores the storage unit if the storage unit identifier does not exist on the block server.

10. The non-transitory computer-readable storage medium of claim 9, wherein the operations further comprise: determining a number of unique storage unit identifiers associated with client data; and calculating an amount of space used by the client data based upon the number of unique storage units identifiers.

11. The non-transitory computer-readable storage medium of claim 9, wherein the operations further comprise: determining a plurality of storage unit identifiers associated with client data; for each of the plurality of storage unit identifiers: determining if the storage unit identifier is present in a Bloom filter; and adding the storage unit identifier to the Bloom filter based upon the determining the storage unit identifier is not present in the Bloom filter; and calculating an amount of space used by the client data based upon the Bloom filter.

12. The non-transitory computer-readable storage medium of claim 8, wherein computing a storage unit identifier for each of the one or more storage units comprises computing a hash of content for a respective storage unit.

13. The non-transitory computer-readable storage medium of claim 8, wherein the operations further comprise: for each of the one or more storage units, sending the storage unit and the corresponding storage unit identifier to a second block server to redundantly store the storage unit and the corresponding storage unit identifier on multiple block servers.

14. The non-transitory computer-readable storage medium of claim 8, wherein the operations further comprise redundantly storing the mapping between the client address and one or more storage unit identifiers in multiple metadata servers, wherein multiple write requests are performed to redundantly store the mapping.

15. A system comprising: a metadata server configured to: receive a write request, the write request including client data and a client address, wherein the client address identifies the client data; segment the client data into the one or more storage units; compute a storage unit identifier for each of the one or more storage units, wherein the storage unit identifier for each of the one or more storage units uniquely identifies content of a storage unit associated with the storage unit identifier; determine a mapping between each storage unit identifier to a block server; for each of the one or more storage units, send the storage unit and the corresponding stor…
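Claims 4 and 10 (exact count of unique identifiers) and claims 5 and 11 (Bloom filter) describe two ways to report how much space a client's data occupies after deduplication. A minimal Python sketch of both follows. `UNIT_SIZE`, `BloomFilter`, `space_used_exact` and `space_used_bloom` are hypothetical names. The fixed unit size, the 2^16-bit filter, the four salted SHA-256 hash functions, and multiplying the count by the unit size are all assumptions; the claims only say the space is calculated "based upon" the count or the filter.

```python
import hashlib

UNIT_SIZE = 4096  # hypothetical fixed storage-unit size in bytes


class BloomFilter:
    """Minimal Bloom filter; bit count and hashing scheme are illustrative only."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)


def space_used_exact(unit_ids) -> int:
    # Claims 4/10: count unique identifiers (assumed: times a fixed unit size).
    return len(set(unit_ids)) * UNIT_SIZE


def space_used_bloom(unit_ids, bloom=None) -> int:
    # Claims 5/11: add each identifier not yet in the filter and count it.
    bloom = bloom if bloom is not None else BloomFilter()
    unique = 0
    for unit_id in unit_ids:
        if unit_id not in bloom:
            bloom.add(unit_id)
            unique += 1
    return unique * UNIT_SIZE
```

The exact version needs memory proportional to the number of distinct identifiers, while the Bloom filter uses a fixed bit array. Because a false positive makes a new identifier look as if it had already been seen, the Bloom-filter figure can undercount but never exceeds the exact figure.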

Assignees

Netapp Inc

Inventors

Classifications

  • at area level, e.g. provisioning of virtual or logical volumes · CPC title

  • Physics · mapped topic

  • G06F3/0619 (primary)

    in relation to data integrity, e.g. data losses, bit errors · CPC title

  • Disk arrays, e.g. RAID, JBOD · CPC title

  • Distributed shared memory [DSM], e.g. remote direct memory access [RDMA] · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9507537B2 cover?
Methods, systems, and apparatuses, including computer programs encoded on computer-readable media, for receiving a write request that includes data and a client address at which to store the data. The data is segmented into the one or more storage units. A storage unit identifier for each of the one or more storage units is computed that uniquely identifies content of a storage unit. A mapping …
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0619. Mapped technology areas include Physics.
When was this patent published?
It was published on Nov 29, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).