Reconstructing deduplicated data

US11561949B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11561949-B1
Application number: US-202016936172-A
Country: US
Kind code: B1
Filing date: Jul 22, 2020
Priority date: Dec 12, 2014
Publication date: Jan 24, 2023
Grant date: Jan 24, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for efficiently storing data in a storage system. A data storage subsystem includes multiple data storage locations on multiple storage devices in addition to at least one mapping table. A data storage controller determines whether data to store in the storage subsystem has one or more patterns of data intermingled with non-pattern data within an allocated block. Rather than store the one or more patterns on the storage devices, the controller stores information in a header on the storage devices. The information includes at least an offset for the first instance of a pattern, a pattern length, and an identification of the pattern. The data may be reconstructed for a corresponding read request from the information stored in the header.
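The write-side idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the pattern table `PATTERNS`, the `PatternRun` record, the `encode_block` function, and the `min_repeats` threshold are all hypothetical names chosen for illustration, and byte-level patterns stand in for the bit patterns the patent describes.

```python
from dataclasses import dataclass

# Hypothetical table of recognised patterns; the ids and contents are assumptions.
PATTERNS = {0: b"\x00", 1: b"\xff"}


@dataclass(frozen=True)
class PatternRun:
    offset: int      # byte offset of the first instance of the pattern in the block
    length: int      # total length of the run in bytes
    pattern_id: int  # identification of the pattern (key into PATTERNS)


def encode_block(block: bytes, min_repeats: int = 8):
    """Split a block into header entries for long pattern runs and the
    remaining non-pattern payload (assumed layout, for illustration only)."""
    header = []
    payload = bytearray()
    i = 0
    while i < len(block):
        for pid, pat in PATTERNS.items():
            # Count contiguous instances of this pattern starting at i.
            n = 0
            while block.startswith(pat, i + n * len(pat)):
                n += 1
            if n >= min_repeats:
                # Long enough run: record it in the header instead of storing it.
                header.append(PatternRun(i, n * len(pat), pid))
                i += n * len(pat)
                break
        else:
            # No qualifying run starts here: keep the byte as non-pattern data.
            payload.append(block[i])
            i += 1
    return header, bytes(payload)


header, payload = encode_block(b"AB" + b"\x00" * 16 + b"CD" + b"\xff" * 4)
```

In this sketch the 16-byte zero run is replaced by a single header entry, while the 4-byte `0xff` run falls below the assumed threshold and stays in the payload with the other non-pattern bytes.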

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by a storage controller coupled to a data storage device, the method comprising: identifying, by the storage controller, that data within the data storage device that is associated with a target location of a read request includes one or more instances of patterned data and one or more instances of non-patterned data, wherein the one or more instances of patterned data comprises a number of contiguous instances of a bit pattern, and the number of contiguous instances of the bit pattern is above a predefined threshold; identifying information describing the patterned data and one or more locations of the patterned data within the data; accessing, in the data storage device, stored non-patterned data associated with the target location; and constructing, based on the stored non-patterned data and the information describing the patterned data, read data responsive to the read request.

2. The method of claim 1 wherein the information describing the patterned data includes fewer than the number of contiguous instances of the bit pattern in the data.

3. The method of claim 1 wherein the information describing the patterned data includes at least the bit pattern and a size of a single instance of the bit pattern.

4. The method of claim 1 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes respective offsets for at least one repeating pattern in one or more instances of the patterned data.

5. The method of claim 4 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes respective offsets for the non-patterned data.

6. The method of claim 1 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes a stride of offsets for one or more instances of the patterned data.

7. The method of claim 6 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes a stride of offsets for the non-patterned data.

8. A storage system that includes a plurality of storage devices, the storage system including a computer processor and a computer memory, the computer memory including computer program instructions that, when executed by the computer processor, cause the storage system to carry out the steps of: identifying, by a storage controller, that data within a data storage device that is associated with a target location of a read request includes one or more instances of patterned data and one or more instances of non-patterned data, wherein the one or more instances of patterned data comprises a number of contiguous instances of a bit pattern, and the number of contiguous instances of the bit pattern is above a predefined threshold; identifying information describing the patterned data and one or more locations of the patterned data within the data; accessing, in the data storage device, stored non-patterned data associated with the target location; and constructing, based on the stored non-patterned data and the information describing the patterned data, read data responsive to the read request.

9. The storage system of claim 8 wherein the information describing the patterned data includes fewer than the number of contiguous instances of the bit pattern in the data.

10. The storage system of claim 8 wherein the information describing the patterned data includes at least the bit pattern and a size of a single instance of the bit pattern.

11. The storage system of claim 8 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes respective offsets for at least one repeating pattern in one or more instances of the patterned data.

12. The storage system of claim 11 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes respective offsets for the non-patterned data.

13. The storage system of claim 8 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes a stride of offsets for one or more instances of the patterned data.

14. The storage system of claim 13 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes a stride of offsets for the non-patterned data.

15. An apparatus for use in a storage system that includes a plurality of storage devices, the apparatus including a computer processor and a computer memory, the computer memory including computer program instructions that, when executed by the computer processor, cause the storage system to carry out the steps of: identifying, by a storage controller, that data within a data storage device that is associated with a target location of a read request includes one or more instances of patterned data and one or more instances of non-patterned data, wherein the one or more instances of patterned data comprises a number of contiguous instances of a bit pattern, and the number of contiguous instances of the bit pattern is above a predefined threshold; identifying information describing the patterned data and one or more locations of the patterned data within the data; accessing, in the data storage device, stored non-patterned data associated with the target location; and constructing, based on the stored non-patterned data and the information describing the patterned data, read data responsive to the read request.

16. The apparatus of claim 15 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes respective offsets for at least one repeating pattern in one or more instances of the patterned data.

17. The apparatus of claim 15 wherein the information describing the patterned data and one or more locations of the patterned data within the data includes a stride of offsets for one or more instances of the patterned data.
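The read path recited in the independent claims (combine stored non-patterned data with information describing the patterned data) can be sketched as follows. This is an illustrative reading, not the claimed implementation: `RunInfo`, `reconstruct`, and the fields `repeats`, `stride`, and `runs` are hypothetical, and "stride" is interpreted here as a fixed distance between the starting offsets of successive runs, which is an assumption about the claim language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunInfo:
    offset: int      # offset of the first run within the reconstructed data
    pattern: bytes   # a single instance of the pattern
    repeats: int     # contiguous instances of the pattern in each run
    stride: int = 0  # assumed distance between starts of successive runs
    runs: int = 1    # number of runs laid out at that stride


def reconstruct(info, stored: bytes, total_len: int) -> bytes:
    """Interleave expanded pattern runs with stored non-patterned bytes."""
    # Expand each description into a map from start offset to run contents.
    run_at = {}
    for entry in info:
        for k in range(entry.runs):
            run_at[entry.offset + k * entry.stride] = entry.pattern * entry.repeats

    out = bytearray()
    src = 0  # read position in the stored non-patterned data
    while len(out) < total_len:
        run = run_at.get(len(out))
        if run is not None:
            out += run               # regenerate patterned data from its description
        else:
            out.append(stored[src])  # copy the next stored non-patterned byte
            src += 1
    return bytes(out)


info = [RunInfo(offset=2, pattern=b"\x00", repeats=4, stride=6, runs=2)]
data = reconstruct(info, stored=b"ABCDEF", total_len=14)
```

Here a single description covers two zero runs starting at offsets 2 and 8, so the 14-byte read result is rebuilt from six stored bytes plus one small record.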

Assignees

Inventors

Classifications

  • De-duplication techniques · CPC title

  • Vectors, bitmaps or matrices · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • G06F3/0608 · Primary

    Saving storage space on storage systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11561949B1 cover?
A system and method for efficiently storing data in a storage system. A data storage subsystem includes multiple data storage locations on multiple storage devices in addition to at least one mapping table. A data storage controller determines whether data to store in the storage subsystem has one or more patterns of data intermingled with non-pattern data within an allocated block. Rather than…
Who is the assignee on this patent?
Pure Storage Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2237. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 24, 2023 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).