Method for efficiently storing data
US-2024370165-A1 · Nov 7, 2024 · US
US9535624B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9535624-B1 |
| Application number | US-19712605-A |
| Country | US |
| Kind code | B1 |
| Filing date | Aug 4, 2005 |
| Priority date | Sep 13, 2004 |
| Publication date | Jan 3, 2017 |
| Grant date | Jan 3, 2017 |
Official abstract text for this publication.
A method of managing duplicate segments from a segmented file storage system is disclosed. The method comprises indexing a segment according to a key for the segments wherein the index includes an identification of a first data location where the segment is stored and identifying a duplicate segment having the same key that is stored in a second location. The method further comprises determining that the duplicate segment is an undesired duplicate segment and eliminating the undesired duplicate segment.
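The flow in the abstract, combined with the time-based test of claim 1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Segment` record, the function names, the `max_age` threshold, and the choice of SHA-1 as the content hash are all hypothetical. The source says only that the key may be a hash of the segment content (claim 2) and that the location may be a container ID (claim 6).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Segment:
    data: bytes
    location: str        # hypothetical container ID (cf. claim 6)
    last_op_time: float  # time of the last store or access, in seconds


def segment_key(data: bytes) -> str:
    # Assumption: SHA-1 of the content; the claims only require "a hash function".
    return hashlib.sha1(data).hexdigest()


def eliminate_undesired_duplicates(segments, now, max_age):
    """Index segments by key and split duplicates into kept and eliminated.

    A duplicate is treated as undesired once at least `max_age` seconds
    have passed since its last operation; otherwise it is retained.
    """
    index = {}       # key -> location of the first segment seen with that key
    kept = []
    eliminated = []
    for seg in segments:
        key = segment_key(seg.data)
        if key not in index:
            # First instance: record where it is stored and keep it.
            index[key] = seg.location
            kept.append(seg)
        elif now - seg.last_op_time >= max_age:
            # Duplicate whose time window has passed: undesired.
            eliminated.append(seg)
        else:
            # Recent duplicate: retained.
            kept.append(seg)
    return index, kept, eliminated
```

For example, calling `eliminate_undesired_duplicates(segments, now=100.0, max_age=50.0)` on three copies of the same content last touched at times 100, 10, and 95 would keep the first and third copies and mark the second for elimination.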
Opening claim text (preview).
What is claimed is:

1. A method of managing duplicate segments from a segmented file storage system including: indexing each of a plurality of segments with a correspondingly associated key, the index including an identification of a first data location where a first segment is stored; identifying a duplicate segment associated with a duplicate key matching a first key associated with the first segment and stored in a second data location; determining whether the duplicate segment is an undesired duplicate segment based at least in part on an amount of time that has passed since an operation associated with the duplicate segment; eliminating the duplicate segment if the duplicate segment is determined to be an undesired duplicate segment, wherein the duplicate segment is an undesired duplicate segment in the event the amount of time has passed since the operation associated with the duplicate segment; and retaining the duplicate segment if the duplicate segment is not determined to be an undesired duplicate segment, wherein the duplicate segment is not an undesired duplicate segment in the event the amount of time has not passed since the operation associated with the duplicate segment.

2. A method as in claim 1, wherein the key is a hash function of content of the segment.

3. A method as in claim 1, wherein a duplicate segment is an undesired duplicate segment if the segment is not one of a predetermined number of most recently stored segments.

4. A method as in claim 1, wherein a duplicate segment is an undesired duplicate segment if the segment was not accessed more recently than a predetermined time.

5. A method as in claim 1, wherein a duplicate segment is an undesired duplicate segment if the segment was not stored more recently than a predetermined time.

6. A method as in claim 1, wherein the identification of the first data location is a container ID.

7. A method as in claim 1, wherein determining that the duplicate segment is an undesired duplicate segment includes noting segments to eliminate.

8. A method as in claim 1, wherein determining that the duplicate segment is an undesired duplicate segment includes noting segments to keep.

9. A method as in claim 1, wherein determining that the duplicate segment is an undesired duplicate segment includes creating a live instance summary vector of segments that are not determined undesired duplicate segments.

10. A method as in claim 1, wherein determining that the duplicate segment is an undesired duplicate segment includes creating a live instance summary vector of segments that are not identified undesired duplicates and wherein the live instance summary vector gives a probabilistic indication that segments are not undesired duplicate segments.

11. A method as in claim 9, wherein determining that the duplicate segment is an undesired duplicate segment includes creating a live summary vector of segments that are referenced and comparing segments to the live summary vector and the live instance summary vector.

12. A method as in claim 1, further including repacking data containers having eliminated segments.

13. A method as in claim 1, wherein the operation associated with the duplicate segment includes: storing the duplicate segment and accessing the duplicate segment.

14. A method of managing duplicate segments from a segmented file storage system including: identifying a first segment that is stored in a first data location; identifying a second segment as being a duplicate of the first segment that is stored in a second data location; determining whether the second segment is an undesired duplicate segment based at least in part on an amount of time that has passed since an operation associated with the second segment; eliminating the second segment if the second segment is determined to be an undesired duplicate segment, wherein the second segment is an undesired segment in the event the amount of time has passed since the operation associated with the second segment; and retaining the second segment if the second segment is not determined to be an undesired duplicate segment, wherein the second segment is not an undesired duplicate segment in the event the amount of time has not passed since the operation associated with the second segment, wherein identifying the second segment as being a duplicate of the first segment includes loading and analyzing indexes for data segments.

15. A method as in claim 14, wherein identifying a second segment as being a duplicate of the first segment uses a summary vector.

16. A method as in claim 14, wherein a second segment is an undesired duplicate segment if the segment is not one of a predetermined number of most recently stored segments.

17. A method as in claim 14, wherein a second segment is an undesired duplicate segment if the segment was not stored more recently than a predetermined time.

18. A method as in claim 14, wherein a second segment is an undesired duplicate segment if the segment was not accessed more recently than a predetermined time.

19. A method as in claim 14, wherein determining that the second segment is an undesired duplicate segment includes noting segments to eliminate.

20. A method as in claim 14, wherein determining that the second segment is an undesired duplicate segment includes noting segments to keep.

21. A method as in claim 14, wherein determining that the second segment is an undesired duplicate segment includes creating a live instance summary vector of segments that are not determined undesired duplicate segments.

22. A method as in claim 21, wherein determining that the second segment is an undesired duplicate segment includes creating a live summary vector of segments that are referenced and comparing segments to the live summary vector and the live instance summary vector.

23. A method as in claim 14, wherein determining that the second segment is an undesired duplicate segment includes creating a live instance summary vector of segments that are not identified undesired duplicates and wherein the live instance summary vector gives a probabilistic indication that segments are not undesired duplicate segments.

24. A method as in claim 14, further including repacking data containers having eliminated segments.

25. A computer program product for managing duplicate segments from a segmented file storage system, the computer program product being embodied in a non-transitory computer readable medium and comprising computer instructions for: indexing each of a plurality of segments with a correspondingly associated key, the index including an identification of a first data location where a first segment is; identifying a duplicate segment associated with a duplicate key matching a first key associated with the first segment and stored in a second data location; determining whether the duplicate segment is an undesired duplicate segment based at least in part on an amount of time that has passed since an operation associated with the duplicate segment; eliminating the duplicate segment if the duplicate segment is determined to be an undesired duplicate segment, wherein the duplicate segment is an undesired duplicate segment in the event the amount of time has passed since the operation associated with the duplicate segment; and retaining the duplicate segment if the duplicate segment is not determined to be an undesired duplicate segment, wherein the duplicat
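
Several claims above refer to a "live summary vector" and a "live instance summary vector" that give a probabilistic indication of which segments to keep. One plausible reading, offered here only as an assumption, is a Bloom filter: a bit array that never misses an item that was added but may occasionally report an item that was not. The sketch below is illustrative only; the class `SummaryVector`, the helpers `instance_id` and `sweep`, the bit count, and the use of SHA-256 are hypothetical and do not come from the claim text.

```python
import hashlib


class SummaryVector:
    """Tiny Bloom filter: possible false positives, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def instance_id(key: str, container_id: str) -> bytes:
    # Hypothetical encoding of "this key as stored in this container".
    return f"{key}@{container_id}".encode()


def sweep(instances, live, live_instances):
    """Keep (key, container) pairs that are referenced AND marked as kept."""
    return [(key, cid) for key, cid in instances
            if live.might_contain(key.encode())
            and live_instances.might_contain(instance_id(key, cid))]
```

In this reading, a marking pass would add every referenced key to `live` and every instance chosen for retention to `live_instances`; the sweep then drops instances that fail either check. Because a Bloom filter can return false positives, such a sweep may occasionally retain an unwanted duplicate, but it would not discard an instance that was marked to keep.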
De-duplication techniques · CPC title
Saving storage space on storage systems · CPC title
Single storage device · CPC title
using de-duplication of the data · CPC title