Reducing data distribution inefficiencies

US11119656B2 · US · B2

Patent metadata
Publication number: US-11119656-B2
Application number: US-201916436482-A
Country: US
Kind code: B2
Filing date: Jun 10, 2019
Priority date: Oct 31, 2016
Publication date: Sep 14, 2021
Grant date: Sep 14, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods of deduplication aware scalable content placement are described. A method may include receiving data to be stored on one or more nodes of a storage array and calculating a plurality of hashes corresponding to the data. The method further includes determining a first subset of the plurality of hashes, determining a second subset of the plurality of hashes of the first subset, and generating a node candidate placement list. The method may further include sending the first subset to one or more nodes represented on the node candidate placement list and receiving, from the nodes represented on the node candidate placement list, characteristics corresponding to the nodes represented on the candidate placement list. The method may further include identifying one of the one or more nodes represented on the candidate placement list in view of the characteristic and sending the data to the identified node.
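The flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the fixed-size chunking, the truncated SHA-256 hash, the `Node` class, the `hints` index that maps sampled hashes to node names, and the constants `CHUNK`, `FIRST_BITS`, and `SECOND_BITS` are all hypothetical. The low-order-bit sampling rule is borrowed from the dependent claims; the abstract itself does not say how the subsets are chosen.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK = 8        # hypothetical chunk size in bytes
FIRST_BITS = 2   # assumed: first subset = hashes whose low 2 bits are zero
SECOND_BITS = 4  # assumed: second subset = hashes whose low 4 bits are zero


def chunk_hashes(data: bytes) -> list[int]:
    """Hash each fixed-size chunk (SHA-256 truncated to 64 bits, an assumption)."""
    return [
        int.from_bytes(hashlib.sha256(data[i:i + CHUNK]).digest()[:8], "big")
        for i in range(0, len(data), CHUNK)
    ]


def sample(hashes: list[int], bits: int) -> list[int]:
    """Keep only the hashes whose `bits` low-order bits are all zero."""
    mask = (1 << bits) - 1
    return [h for h in hashes if h & mask == 0]


@dataclass
class Node:
    name: str
    stored: set[int] = field(default_factory=set)  # hashes of data already held

    def characteristics(self, offered: list[int]) -> dict[str, int]:
        """Report how many of the offered hashes this node already holds."""
        return {"match": len(self.stored & set(offered))}

    def write(self, hashes: list[int]) -> None:
        self.stored.update(hashes)


def place(data: bytes, nodes: dict[str, Node], hints: dict[int, str]) -> str:
    hashes = chunk_hashes(data)
    first = sample(hashes, FIRST_BITS)    # sent to the candidate nodes
    second = sample(first, SECOND_BITS)   # used to build the candidate list
    # Candidate list: nodes the hint index associates with the second subset;
    # fall back to every node when nothing is known yet.
    candidates = sorted({hints[h] for h in second if h in hints}) or sorted(nodes)
    replies = {n: nodes[n].characteristics(first) for n in candidates}
    best = max(candidates, key=lambda n: replies[n]["match"])
    nodes[best].write(hashes)             # "send the data" to the chosen node
    for h in second:
        hints.setdefault(h, best)
    return best
```

Calling `place` with data that a node already holds, and whose sampled hashes appear in `hints`, routes the write back to that node, which is where deduplication can take effect. The two-level sampling keeps the lookup index small (second subset) while still giving candidate nodes enough hashes (first subset) to estimate similarity.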

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a storage array comprising a plurality of solid state drives; and a storage controller coupled to one of the plurality of solid state drives, the storage controller comprising a processing device, the processing device to: calculate a plurality of hashes corresponding to data to be stored on one or more nodes of a storage array utilizing a hash algorithm on the data to be stored; generate, based on the plurality of hashes, a candidate placement list for the data, wherein the candidate placement list comprises drives of the plurality of solid state drives having data similar to the data to be stored; identify, by the processing device, at least one of the plurality of solid state drives represented on the candidate placement list having data similar to the data to be stored based on drive characteristics being received from the at least one of the plurality of solid state drives represented on the candidate list; and send the data to the identified at least one of the plurality of solid state drives.

2. The system of claim 1, wherein the hash algorithm is a rolling hash algorithm.

3. The system of claim 1, wherein the characteristics comprise a matching score corresponding to: a first subset of the plurality of hashes corresponding to the data to be stored; and data stored on one of the one or more solid state drives represented on the candidate placement list.

4. The system of claim 3, wherein the characteristics further comprise at least one of a capacity score or a load score, associated with a corresponding one of the one or more solid state drives represented on the candidate placement list.

5. The system of claim 3, wherein the processing device is to determine the first subset in view of a predetermined number of low order bits corresponding to the plurality of hashes.

6. A method comprising: calculating a plurality of hashes corresponding to data to be stored on one or more nodes of a storage array utilizing a hash algorithm on the data to be stored; generating, based on the plurality of hashes, a candidate placement list for the data, wherein the candidate placement list comprises nodes having data similar to the data to be stored; identifying, by a processing device, one of the one or more nodes represented on the candidate placement list having data similar to the data to be stored based on node characteristics being received from the one or more nodes represented on the candidate placement list; and sending the data to the identified node.

7. The method of claim 6, wherein the nodes of the storage array are solid state drives.

8. The method of claim 6, wherein to calculate the plurality of hashes corresponding to the data, the method further comprises utilizing a rolling hash algorithm on the data.

9. The method of claim 6, the method further comprising: sending the data to a second node responsive to receiving the characteristics from a first solid state drive represented on the candidate placement list, wherein the characteristics correspond to the second node.

10. The method of claim 6, wherein the characteristics comprise a matching score corresponding to: a first subset of the plurality of hashes corresponding to the data to be stored and data stored on one of the one or more nodes represented on the candidate placement list.

11. The method of claim 10, wherein the characteristics further comprise at least one of a capacity score or a load score, associated with a corresponding one of the one or more nodes represented on the candidate placement list.

12. The method of claim 10, further comprising determining the first subset in view of a predetermined number of low order bits corresponding to the plurality of hashes.

13. A non-transitory computer readable storage medium storing instructions, which when executed, cause a processing device to: calculate a plurality of hashes corresponding to data to be stored on one or more nodes of a storage array utilizing a hash algorithm on the data to be stored; generate, based on the plurality of hashes, a candidate placement list for the data, wherein the candidate placement list comprises nodes having data similar to the data to be stored; identify, by the processing device, one of the one or more nodes represented on the candidate placement list having data similar to the data to be stored based on a node characteristic being received from one of the one or more nodes represented on the candidate placement list; and send the data to the identified node.

14. The non-transitory computer readable storage medium of claim 13, wherein the nodes of the storage array are solid state drives.

15. The non-transitory computer readable storage medium of claim 13, wherein to determine the plurality of hashes corresponding to the data, the processing device is to compute a rolling hash of the data.

16. The non-transitory computer readable storage medium of claim 13, wherein the characteristics comprise a matching score corresponding to: a first subset of the plurality of hashes corresponding to the data to be stored and data stored on one of the one or more nodes represented on the candidate placement list.

17. The non-transitory computer readable storage medium of claim 16, further comprising determining the first subset in view of a predetermined number of low order bits corresponding to the plurality of hashes.
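Two ideas from the dependent claims lend themselves to a short illustration: computing hashes with a rolling hash (claims 2, 8, 15) and ranking candidates by a matching score combined with capacity and load scores (claims 3-4, 10-11). The sketch below is an assumption-laden example rather than the claimed method: the claims do not name a specific rolling hash, so a polynomial (Rabin-Karp style) hash is swapped in, and the constants `BASE`, `MOD`, `WINDOW`, the `Characteristics` fields, and the weights in `placement_score` are hypothetical.

```python
from dataclasses import dataclass

BASE = 257            # hypothetical polynomial base
MOD = (1 << 61) - 1   # hypothetical modulus (a Mersenne prime)
WINDOW = 4            # hypothetical window size in bytes


def rolling_hashes(data: bytes, window: int = WINDOW) -> list[int]:
    """Polynomial rolling hash of every `window`-byte slice of `data`."""
    if len(data) < window:
        return []
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    out = [h]
    for i in range(window, len(data)):
        # Drop the oldest byte, shift, and append the newest byte.
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        out.append(h)
    return out


def matching_score(offered: list[int], stored: set[int]) -> float:
    """Fraction of the distinct offered hashes that a node already holds."""
    distinct = set(offered)
    return len(distinct & stored) / len(distinct) if distinct else 0.0


@dataclass
class Characteristics:
    match: float     # matching score, 0..1
    capacity: float  # capacity score, 0..1 (higher means more free space)
    load: float      # load score, 0..1 (higher means busier)


def placement_score(c: Characteristics, w_match: float = 0.6,
                    w_capacity: float = 0.3, w_load: float = 0.1) -> float:
    """Weighted blend of the three scores; the weights are illustrative only."""
    return w_match * c.match + w_capacity * c.capacity - w_load * c.load


def pick_node(replies: dict[str, Characteristics]) -> str:
    """Choose the candidate whose reported characteristics score highest."""
    return max(replies, key=lambda name: placement_score(replies[name]))
```

With these weights, a nearly full and busy drive can lose to a drive with a somewhat lower matching score but more free capacity, which is the kind of trade-off that reporting capacity and load alongside the matching score makes possible.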

Assignees

Pure Storage Inc

Inventors

Classifications

  • G06F3/0608 (Primary)

    Saving storage space on storage systems · CPC title

  • Migration mechanisms · CPC title

  • Disk arrays, e.g. RAID, JBOD · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • De-duplication techniques · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11119656B2 cover?
Systems and methods of deduplication aware scalable content placement are described. A method may include receiving data to be stored on one or more nodes of a storage array and calculating a plurality of hashes corresponding to the data. The method further includes determining a first subset of the plurality of hashes, determining a second subset of the plurality of hashes of the first subset, …
Who is the assignee on this patent?
Pure Storage Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 14, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).