Efficient memory footprint in deduplicated system storing with content based addressing

US11157372B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11157372-B2
Application numberUS-201916394357-A
CountryUS
Kind codeB2
Filing dateApr 25, 2019
Priority dateSep 21, 2018
Publication dateOct 26, 2021
Grant dateOct 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique is configured to reduce an amount of memory (i.e., memory footprint) usage by each storage node of a cluster needed to store metadata while providing fast and efficient servicing of data in accordance with storage requests issued by a client of the cluster. Illustratively, a block identifier (ID) is used to identify a block of data serviced by the storage node. Metadata embodied as mappings between block IDs and locations of data blocks in the cluster are illustratively maintained in map fragments. A map fragment may be embodied as “active” map fragment or a “frozen” map fragment. An active map fragment refers to a map fragment that has space available to store a mapping, whereas a frozen map fragment refers to a map fragment that is full, i.e., has no available space for storing a mapping. In order to reduce the memory footprint of each storage node, yet still provide fast and efficient servicing of data by the node, the active map fragments are preferably maintained in memory as “in-core” data structures, whereas the frozen map fragments are paged-out and stored on storage devices of the cluster as “on-disk” map fragment structures.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for storing data by a storage service having persistent storage, the method compromising: determining a block identifier (block ID) based on data of a block to be written to the persistent storage; determining a sublist based on a first portion of the block ID; writing the block of data to the persistent storage at a physical block address of the persistent storage; adding a mapping of the block identifier to a map fragment associated with the sublist, wherein the mapping maps the block identifier to the physical block address; updating a filter associated with the map fragment and the block ID, the filter and the map fragment indexed by the sublist into a search data structure; determining whether the map fragment is full; in response to determining that the map fragment is full, writing the map fragment and filter to the persistent storage; and in response to decreasing a persistent storage capacity of the storage service, using a second portion of the block ID smaller than the first portion to determine the sublist. 2. The method of claim 1 , wherein the filter is a Bloom filter; the data block is written to a segment of the persistent storage; and the mapping fragment and the Bloom filter are written to the segment of the persistent storage. 3. The method of claim 1 , further comprising: determining whether a predetermined number of data blocks associated with the sublist is exceeded; and in response to determining that the predetermined number of data blocks associated with the sublist is exceeded, using a second portion of the block ID larger than first portion to determine the sublist. 4. The method of claim 1 , further comprising: adding the sublist to the map fragment. 5. The method of claim 1 further comprising: associating a drive address of the map fragment written to the persistent storage to the filter of the sublist, wherein the filter is a Bloom filter. 6. The method of claim 1 wherein the search data structure is a binary retrieval tree. 7. The method of claim 3 further comprising: maintaining the written map fragment on the persistent storage for a period of time. 8. The method of claim 7 , wherein the period of time is until garbage collection or recycling. 9. The method of claim 3 further comprising: copying the mapping from the map fragment to a new map fragment associated with the sublist determined from the second portion of the block ID; writing the new map fragment to the persistent storage; and garbage collecting the written map fragment. 10. The method of claim 3 wherein the map fragment and filter are written to a circular log and wherein a storage space of map fragment on the persistent storage is reclaimed when the log is lapped. 11. A system comprising: a storage node coupled to one or more storage devices; a memory coupled to a processor of the storage node executing a storage service configured to: determine a block identifier (block ID) based on data of a block to be written to the persistent storage; determine a sublist based on a first portion of the block ID; write the block of data to the persistent storage at a physical block address of the persistent storage; add a mapping of the block identifier to a map fragment associated with the sublist, wherein the mapping maps the block identifier to the physical block address; update a filter associated with the map fragment and the block ID, the filter and the map fragment indexed by the sublist into a search data structure; determine whether the map fragment is full; in response to determining that the map fragment is full, write the map fragment and filter to the persistent storage; and in response to decreasing a persistent storage capacity of the storage service, using a second portion of the block ID smaller than the first portion to determine the sublist. 12. The system of claim 11 wherein the filter is a Bloom filter; the data block is written to a segment of the persistent storage; and the mapping fragment and the Bloom filter are written to the segment of the persistent storage. 13. The system of claim 11 wherein the storage service is further configured to: determine whether a predetermined number of data blocks associated with the sublist is exceeded; and in response to determining that the predetermined number of data blocks associated with the sublist is exceeded, use a second portion of the block ID larger than first portion to determine the sublist. 14. The system of claim 11 wherein the storage service is further configured to add the sublist to the map fragment. 15. The system of claim 11 wherein the storage service is further configured to: associate a drive address of the map fragment written to the persistent storage to the filter of the sublist, wherein the filter is a Bloom filter. 16. The system of claim 11 wherein the search data structure is a binary retrieval tree. 17. The system of claim 13 , wherein the storage service is further configured to maintain the written map fragment on the persistent storage for a period of time. 18. The system of claim 17 , wherein the period of time is until garbage collection or recycling. 19. The system of claim 13 wherein the storage service is further configured to: copy the mapping from the map fragment to a new map fragment associated with the sublist determined from the second portion of the block ID; write the new map fragment to the persistent storage; and garbage collecting the written map fragment. 20. A non-transitory computer readable medium containing executable program instructions for execution by a storage service having persistent storage, comprising: determining a block identifier (block ID) based on data of a block to be written to the persistent storage; determining a sublist based on a first portion of the block ID; writing the block of data to the persistent storage at a physical block address of the persistent storage; adding a mapping of the block identifier to a map fragment associated with the sublist, wherein the mapping maps the block identifier to the physical block address; updating a filter associated with the map fragment and the block ID, the filter and the map fragment indexed by the sublist into a search data structure; determining whether the map fragment is full; in response to determining that the map fragment is full, writing the map fragment and filter to the persistent storage; and in response to decreasing a persistent storage capacity of the storage service, using a second portion of the block ID smaller than the first portion to determine the sublist.

Assignees

Inventors

Classifications

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Garbage collection, i.e. reclamation of unreferenced memory · CPC title

  • Solving problems relating to consistency · CPC title

  • management of metadata or control data · CPC title

  • Space efficiency improvement · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11157372B2 cover?
A technique is configured to reduce an amount of memory (i.e., memory footprint) usage by each storage node of a cluster needed to store metadata while providing fast and efficient servicing of data in accordance with storage requests issued by a client of the cluster. Illustratively, a block identifier (ID) is used to identify a block of data serviced by the storage node. Metadata embodied as …
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/1471. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 26 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).