Duplicate management

US9535624B1 · US · B1

Patent metadata
Publication number: US-9535624-B1
Application number: US-19712605-A
Country: US
Kind code: B1
Filing date: Aug 4, 2005
Priority date: Sep 13, 2004
Publication date: Jan 3, 2017
Grant date: Jan 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of managing duplicate segments from a segmented file storage system is disclosed. The method comprises indexing a segment according to a key for the segments wherein the index includes an identification of a first data location where the segment is stored and identifying a duplicate segment having the same key that is stored in a second location. The method further comprises determining that the duplicate segment is an undesired duplicate segment and eliminating the undesired duplicate segment.
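
The steps in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the `StoredSegment` type, the `manage_duplicates` function, the `last_op_time` field, and the `max_age` parameter are hypothetical names, and SHA-256 is an assumed choice of key (the claims on this page only say the key may be a hash of segment content). The test for an "undesired" duplicate follows the first claim on this page, which bases it on the time elapsed since an operation on the duplicate.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StoredSegment:
    container_id: int    # data location where this copy is stored
    data: bytes          # segment content
    last_op_time: float  # time of the last store/access, in seconds (hypothetical)


def segment_key(data: bytes) -> str:
    # Assumption: the key is a SHA-256 hash of the segment content.
    return hashlib.sha256(data).hexdigest()


def manage_duplicates(segments, now, max_age):
    """Index segments by key, then eliminate duplicates older than max_age.

    Returns (index, kept, eliminated), where index maps each key to the
    first data location at which a segment with that key was seen.
    """
    index = {}
    kept, eliminated = [], []
    for seg in segments:
        key = segment_key(seg.data)
        if key not in index:
            # First copy with this key: record its location and keep it.
            index[key] = seg.container_id
            kept.append(seg)
        elif now - seg.last_op_time >= max_age:
            # Duplicate key in a second location, and enough time has
            # passed since the last operation: undesired, so eliminate.
            eliminated.append(seg)
        else:
            # Duplicate, but recently stored or accessed: retain it.
            kept.append(seg)
    return index, kept, eliminated
```

In this sketch a segment whose key has not been seen before is always kept, however old it is, because the time test applies only to duplicates.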

First claim

Opening claim text (preview).

What is claimed is:

1. A method of managing duplicate segments from a segmented file storage system including: indexing each of a plurality of segments with a correspondingly associated key, the index including an identification of a first data location where a first segment is stored; identifying a duplicate segment associated with a duplicate key matching a first key associated with the first segment and stored in a second data location; determining whether the duplicate segment is an undesired duplicate segment based at least in part on an amount of time that has passed since an operation associated with the duplicate segment; eliminating the duplicate segment if the duplicate segment is determined to be an undesired duplicate segment, wherein the duplicate segment is an undesired duplicate segment in the event the amount of time has passed since the operation associated with the duplicate segment; and retaining the duplicate segment if the duplicate segment is not determined to be an undesired duplicate segment, wherein the duplicate segment is not an undesired duplicate segment in the event the amount of time has not passed since the operation associated with the duplicate segment.

2. A method as in claim 1, wherein the key is a hash function of content of the segment.

3. A method as in claim 1, wherein a duplicate segment is an undesired duplicate segment if the segment is not one of a predetermined number of most recently stored segments.

4. A method as in claim 1, wherein a duplicate segment is an undesired duplicate segment if the segment was not accessed more recently than a predetermined time.

5. A method as in claim 1, wherein a duplicate segment is an undesired duplicate segment if the segment was not stored more recently than a predetermined time.

6. A method as in claim 1, wherein the identification of the first data location is a container ID.

7. A method as in claim 1, wherein determining that the duplicate segment is an undesired duplicate segment includes noting segments to eliminate.

8. A method as in claim 1, wherein determining that the duplicate segment is an undesired duplicate segment includes noting segments to keep.

9. A method as in claim 1, wherein determining that the duplicate segment is an undesired duplicate segment includes creating a live instance summary vector of segments that are not determined undesired duplicate segments.

10. A method as in claim 1, wherein determining that the duplicate segment is an undesired duplicate segment includes creating a live instance summary vector of segments that are not identified undesired duplicates and wherein the live instance summary vector gives a probabilistic indication that segments are not undesired duplicate segments.

11. A method as in claim 9, wherein determining that the duplicate segment is an undesired duplicate segment includes creating a live summary vector of segments that are referenced and comparing segments to the live summary vector and the live instance summary vector.

12. A method as in claim 1, further including repacking data containers having eliminated segments.

13. A method as in claim 1, wherein the operation associated with the duplicate segment includes: storing the duplicate segment and accessing the duplicate segment.

14. A method of managing duplicate segments from a segmented file storage system including: identifying a first segment that is stored in a first data location; identifying a second segment as being a duplicate of the first segment that is stored in a second data location; determining whether the second segment is an undesired duplicate segment based at least in part on an amount of time that has passed since an operation associated with the second segment; eliminating the second segment if the second segment is determined to be an undesired duplicate segment, wherein the second segment is an undesired segment in the event the amount of time has passed since the operation associated with the second segment; and retaining the second segment if the second segment is not determined to be an undesired duplicate segment, wherein the second segment is not an undesired duplicate segment in the event the amount of time has not passed since the operation associated with the second segment, wherein identifying the second segment as being a duplicate of the first segment includes loading and analyzing indexes for data segments.

15. A method as in claim 14, wherein identifying a second segment as being a duplicate of the first segment uses a summary vector.

16. A method as in claim 14, wherein a second segment is an undesired duplicate segment if the segment is not one of a predetermined number of most recently stored segments.

17. A method as in claim 14, wherein a second segment is an undesired duplicate segment if the segment was not stored more recently than a predetermined time.

18. A method as in claim 14, wherein a second segment is an undesired duplicate segment if the segment was not accessed more recently than a predetermined time.

19. A method as in claim 14, wherein determining that the second segment is an undesired duplicate segment includes noting segments to eliminate.

20. A method as in claim 14, wherein determining that the second segment is an undesired duplicate segment includes noting segments to keep.

21. A method as in claim 14, wherein determining that the second segment is an undesired duplicate segment includes creating a live instance summary vector of segments that are not determined undesired duplicate segments.

22. A method as in claim 21, wherein determining that the second segment is an undesired duplicate segment includes creating a live summary vector of segments that are referenced and comparing segments to the live summary vector and the live instance summary vector.

23. A method as in claim 14, wherein determining that the second segment is an undesired duplicate segment includes creating a live instance summary vector of segments that are not identified undesired duplicates and wherein the live instance summary vector gives a probabilistic indication that segments are not undesired duplicate segments.

24. A method as in claim 14, further including repacking data containers having eliminated segments.

25. A computer program product for managing duplicate segments from a segmented file storage system, the computer program product being embodied in a non-transitory computer readable medium and comprising computer instructions for: indexing each of a plurality of segments with a correspondingly associated key, the index including an identification of a first data location where a first segment is; identifying a duplicate segment associated with a duplicate key matching a first key associated with the first segment and stored in a second data location; determining whether the duplicate segment is an undesired duplicate segment based at least in part on an amount of time that has passed since an operation associated with the duplicate segment; eliminating the duplicate segment if the duplicate segment is determined to be an undesired duplicate segment, wherein the duplicate segment is an undesired duplicate segment in the event the amount of time has passed since the operation associated with the duplicate segment; and retaining the duplicate segment if the duplicate segment is not determined to be an undesired duplicate segment, wherein the duplicat
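
Several dependent claims mention a "live instance summary vector" that gives a probabilistic indication that a segment is not an undesired duplicate, and a "live summary vector" of referenced segments. The claim text shown here does not say how these vectors are built. The sketch below assumes a Bloom-filter-style bit vector, which is one common structure with that probabilistic behavior. The `SummaryVector` class, the `instance_id` and `should_keep` helpers, and the rule that a copy is kept only when both vectors report it are all hypothetical choices made for illustration.

```python
import hashlib


class SummaryVector:
    """Bloom-filter-style bit vector (an assumed structure, see lead-in)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def instance_id(key: bytes, container_id: int) -> bytes:
    # Hypothetical encoding of "this key stored at this location".
    return key + b"@" + str(container_id).encode()


def should_keep(key, container_id, live, live_instances) -> bool:
    # Assumed rule: keep a copy only if the segment is referenced (live)
    # and this particular copy was marked as a live instance.
    return live.might_contain(key) and live_instances.might_contain(
        instance_id(key, container_id)
    )
```

With a structure like this, a copy that was marked live is never reported as absent, while a copy that was not marked can occasionally be reported as present. The cost of that error is retaining a duplicate that could have been eliminated, not losing data.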

Assignees

Inventors

Classifications

  • G06F3/0641 · Primary

    De-duplication techniques · CPC title

  • Saving storage space on storage systems · CPC title

  • Single storage device · CPC title

  • using de-duplication of the data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9535624B1 cover?
A method of managing duplicate segments from a segmented file storage system is disclosed. The method comprises indexing a segment according to a key for the segments wherein the index includes an identification of a first data location where the segment is stored and identifying a duplicate segment having the same key that is stored in a second location. The method further comprises determinin…
Who is the assignee on this patent?
Patterson R Hugo, Zhu Ming Benjamin, Lee Edward K, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 3, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).