Storage node data placement utilizing similarity

US12216903B2 · US · B2

Patent metadata
Publication number: US-12216903-B2
Application number: US-202117407813-A
Country: US
Kind code: B2
Filing date: Aug 20, 2021
Priority date: Oct 31, 2016
Publication date: Feb 4, 2025
Grant date: Feb 4, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods of deduplication aware scalable content placement are described. A method may include receiving data to be stored on one or more nodes of a storage array and calculating a plurality of hashes corresponding to the data. The method further includes determining a first subset of the plurality of hashes, determining a second subset of the plurality of hashes of the first subset, and generating a node candidate placement list. The method may further include sending the first subset to one or more nodes represented on the node candidate placement list and receiving, from the nodes represented on the node candidate placement list, characteristics corresponding to the nodes represented on the candidate placement list. The method may further include identifying one of the one or more nodes represented on the candidate placement list in view of the characteristic and sending the data to the identified node.
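
The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the fixed-size chunking, the SHA-256 hash, the low-order-bit sampling rule, the `hints` table used to build the candidate list, and the names `chunk_hashes`, `sample`, `Node`, and `place` are all assumptions made for illustration. The "characteristic" returned by a node is assumed here to be a simple count of matching hashes.

```python
import hashlib

CHUNK = 8  # bytes per chunk (assumed; the abstract does not specify chunking)

def chunk_hashes(data: bytes) -> list[int]:
    """One 64-bit hash per fixed-size chunk of the incoming data."""
    return [
        int.from_bytes(hashlib.sha256(data[i:i + CHUNK]).digest()[:8], "big")
        for i in range(0, len(data), CHUNK)
    ]

def sample(hashes, low_bits: int) -> set[int]:
    """Keep only hashes whose `low_bits` lowest bits are all zero (assumed rule)."""
    mask = (1 << low_bits) - 1
    return {h for h in hashes if h & mask == 0}

class Node:
    """Hypothetical storage node that remembers sampled hashes of its data."""
    def __init__(self, name: str):
        self.name = name
        self.sampled: set[int] = set()
        self.blobs: list[bytes] = []

    def characteristic(self, first_subset: set[int]) -> int:
        """Reply to the controller: how many of the sent hashes this node holds."""
        return len(self.sampled & first_subset)

    def store(self, data: bytes, first_subset: set[int]) -> None:
        self.blobs.append(data)
        self.sampled |= first_subset

def place(data: bytes, nodes: dict[str, Node], hints: dict[int, str]) -> str:
    """Choose a node for `data`, store it there, and return the node name."""
    first = sample(chunk_hashes(data), low_bits=1)  # first subset of the hashes
    second = sample(first, low_bits=2)              # second subset, drawn from the first
    # Candidate list: nodes the (assumed) hint table links to a second-subset hash.
    candidates = sorted({hints[h] for h in second if h in hints})
    if not candidates:                              # nothing similar known: ask every node
        candidates = sorted(nodes)
    # Send the first subset to each candidate and collect its characteristic.
    replies = {name: nodes[name].characteristic(first) for name in candidates}
    best = max(candidates, key=lambda name: replies[name])
    nodes[best].store(data, first)
    for h in second:
        hints.setdefault(h, best)                   # remember where similar data lives
    return best
```

In this sketch only the small second subset is used to narrow the node list, and only the nodes on that list ever see the larger first subset, so no step compares the new hashes against every hash in the system.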

First claim

Opening claim text (preview).

What is claimed is:

1. A storage system comprising: a plurality of nodes comprising a plurality of solid state storage devices; and a storage array controller operatively coupled to the plurality of nodes, the storage array controller comprising a processing device configured to: calculate a plurality of hashes corresponding to data to be stored on the plurality of nodes of the storage system; identify a subset of the plurality of nodes storing data that is similar to the data to be stored while avoiding comparing the plurality of hashes to all hash values of the storage system; transmit the plurality of hashes to the subset of the plurality of nodes; receive, from the subset of nodes, results of a calculation to determine a similarity of the plurality of hashes with respective hashes representing data stored at one or more of the plurality of solid state storage devices on the subset of nodes; identify a node of the subset of nodes based on a result of the calculation for the node; and transmit the data to the identified node.

2. The system of claim 1, wherein the plurality of hashes is generated by a rolling hash algorithm.

3. The system of claim 1, wherein the calculation considers available capacity of the node and throughput of the node.

4. The system of claim 1, wherein the plurality of solid state storage devices have differing capacities.

5. The system of claim 1, wherein the calculation considers a load on the one of the plurality of solid state storage devices.

6. The system of claim 1, wherein the processing device is further configured to: determine a subset of the plurality of solid state storage devices to perform the calculation in view of a predetermined number of low order bits corresponding to the plurality of hashes.

7. A method comprising: calculating, by a processing device of a storage array controller, a plurality of hashes corresponding to data to be stored on a plurality of nodes of a storage system, the plurality of nodes comprising a plurality of solid state storage devices; identify a subset of the plurality of nodes storing data that is similar to the data to be stored while avoiding comparing the plurality of hashes to all hash values of the storage system; transmitting the plurality of hashes to the subset of the plurality of nodes; receiving, from the subset of nodes, results of a calculation to determine a similarity of the plurality of hashes with respective hashes representing data stored at one or more of the plurality of solid state storage devices on the subset of nodes; identifying, by the processing device, a node of the subset of nodes based on a result of the calculation for the node; and transmitting the data to the identified node.

8. The method of claim 7, wherein at least two of the solid state storage devices of a node are non-uniform with respect to capacity.

9. The method of claim 7, wherein to calculate the plurality of hashes corresponding to the data, the method further comprises utilizing a rolling hash algorithm on the data.

10. The method of claim 7, wherein the calculation considers available capacity of the node and throughput of the node.

11. The method of claim 7, the method further comprising: sending the data to a second node responsive to receiving the result of the calculation from a first solid state storage device.

12. The method of claim 7, wherein the plurality of solid state storage devices have differing capacities.

13. The method of claim 7, wherein the calculation considers a load on the of the plurality of nodes.

14. The method of claim 7, further comprising determining a subset of the plurality of solid state storage devices to perform the calculation in view of a predetermined number of low order bits corresponding to the plurality of hashes.

15. A non-transitory computer readable storage medium storing instructions, which when executed, cause a processing device of a storage array controller to: calculate, by the storage array controller, a plurality of hashes corresponding to data to be stored on a plurality of nodes of a storage system, the plurality of nodes comprising a plurality of solid state storage devices; identify a subset of the plurality of nodes storing data that is similar to the data to be stored while avoiding comparing the plurality of hashes to all hash values of the storage system; transmit the plurality of hashes to the subset of the plurality of nodes; receive, from the subset of nodes, results of a calculation to determine a similarity of the plurality of hashes with respective hashes representing data stored at one or more of the plurality of solid state storage devices on the subset of nodes; identify a node of the subset of nodes based on a result of the calculation for the node; and transmit the data to the identified node.

16. The non-transitory computer readable storage medium of claim 15, wherein the plurality of solid state storage devices comprise storage devices having differing capacities.

17. The non-transitory computer readable storage medium of claim 15, wherein to calculate the plurality of hashes corresponding to the data, the processing device is to compute a rolling hash of the data.

18. The non-transitory computer readable storage medium of claim 15, wherein the calculation considers available capacity of the node.

19. The non-transitory computer readable storage medium of claim 15, wherein the calculation considers throughput of the node.

20. The non-transitory computer readable storage medium of claim 15, wherein the processing device is further to: determine a subset of the plurality of solid state storage devices to perform the calculation in view of a predetermined number of low order bits corresponding to the plurality of hashes.
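
Three ideas from the dependent claims (a rolling hash, device selection by low-order hash bits, and a calculation that weighs capacity and throughput) can be sketched in Python as follows. This is only an illustration under stated assumptions: the claims do not name a specific rolling hash, so a polynomial (Rabin-Karp style) hash is swapped in; the modulo mapping from low-order bits to devices, the `NodeReport` fields, and the weights in `score` are hypothetical.

```python
from dataclasses import dataclass

BASE = 257
MOD = (1 << 61) - 1  # large prime modulus (an illustrative choice)

def window_hash(window: bytes) -> int:
    """Polynomial hash of one window, computed from scratch."""
    h = 0
    for byte in window:
        h = (h * BASE + byte) % MOD
    return h

def rolling_hashes(data: bytes, window: int = 4) -> list[int]:
    """Hash every `window`-byte slice, updating in O(1) per one-byte shift."""
    if len(data) < window:
        return []
    high = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h = window_hash(data[:window])
    out = [h]
    for i in range(window, len(data)):
        # Drop the oldest byte, shift the rest, append the newest byte.
        h = ((h - data[i - window] * high) * BASE + data[i]) % MOD
        out.append(h)
    return out

def devices_for(hashes, low_bits: int, num_devices: int) -> set[int]:
    """Pick devices from a fixed number of low-order hash bits (assumed mapping)."""
    mask = (1 << low_bits) - 1
    return {(h & mask) % num_devices for h in hashes}

@dataclass
class NodeReport:
    """Hypothetical result a node returns to the controller."""
    name: str
    matches: int                # hashes the node already stores
    free_fraction: float        # available capacity, 0.0 to 1.0
    throughput_headroom: float  # spare throughput, 0.0 to 1.0

def score(r: NodeReport, w_capacity: float = 2.0, w_throughput: float = 1.0) -> float:
    """Similarity first, nudged by capacity and throughput (weights are assumptions)."""
    return r.matches + w_capacity * r.free_fraction + w_throughput * r.throughput_headroom

def pick_node(reports: list[NodeReport]) -> str:
    """Return the name of the highest-scoring node."""
    return max(reports, key=score).name
```

With these weights, two nodes reporting the same number of matches are separated by how much free capacity and spare throughput each has, which is one plausible reading of claims 3, 10, 18, and 19.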

Assignees

Pure Storage Inc

Classifications

  • Non-volatile semiconductor memory arrays

  • Migration mechanisms

  • Disk arrays, e.g. RAID, JBOD

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

  • De-duplication techniques

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12216903B2 cover?
Systems and methods of deduplication aware scalable content placement are described. A method may include receiving data to be stored on one or more nodes of a storage array and calculating a plurality of hashes corresponding to the data. The method further includes determining a first subset of the plurality of hashes, determining a second subset of the plurality of hashes of the first subset,…
Who is the assignee on this patent?
Pure Storage Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 4, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).