US9400796B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9400796-B2 |
| Application number | US-40777409-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 19, 2009 |
| Priority date | Sep 15, 2004 |
| Publication date | Jul 26, 2016 |
| Grant date | Jul 26, 2016 |
Official abstract text for this publication.
Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
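The lookup described in the abstract can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not define how representation values are computed, so the sketch assumes they are the `K` largest hashes of fixed-size byte windows in each segment. The names `SEED`, `K`, `THRESHOLD`, `representation_values`, `build_index`, and `find_similar` are all hypothetical. The point it shows is that a query costs a fixed number of dictionary lookups per input segment, regardless of how many segments the repository holds.

```python
import hashlib
import heapq
from collections import Counter

SEED = 8        # window size in bytes (assumed, not from the patent)
K = 8           # representation values kept per segment (assumed)
THRESHOLD = 2   # matches needed to declare similarity (assumed)

def representation_values(segment: bytes, k: int = K, seed: int = SEED):
    """Return up to k (hash, offset) pairs: the k largest window hashes."""
    hashes = {}
    for off in range(len(segment) - seed + 1):
        digest = hashlib.blake2b(segment[off:off + seed], digest_size=8).digest()
        # Keep the first offset at which each hash value occurs.
        hashes.setdefault(int.from_bytes(digest, "big"), off)
    return heapq.nlargest(k, hashes.items())

def build_index(repository: bytes, segment_size: int):
    """Map each representation value to (segment start, absolute offset)."""
    index = {}
    for start in range(0, len(repository), segment_size):
        segment = repository[start:start + segment_size]
        for value, off in representation_values(segment):
            index.setdefault(value, (start, start + off))
    return index

def find_similar(index, input_segment: bytes, threshold: int = THRESHOLD):
    """Return (segment start, match count, anchors) or None.

    Performs at most K dictionary lookups, so the search cost does not
    grow with the number of segments in the repository.
    """
    votes = Counter()
    anchors = {}
    for value, off in representation_values(input_segment):
        hit = index.get(value)
        if hit is not None:
            seg_start, repo_off = hit
            votes[seg_start] += 1
            anchors.setdefault(seg_start, []).append((repo_off, off))
    if not votes:
        return None
    seg_start, count = votes.most_common(1)[0]
    if count < threshold:
        return None
    return seg_start, count, anchors[seg_start]
```

In this sketch the index holds only `K` small entries per segment, which is why it can be a small fraction of the repository size. The returned anchors pair a repository offset with an input offset where the same window was seen, and a later step can use them to align the two segments.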
Opening claim text (preview).
What is claimed is:

1. A method of searching a repository of binary uninterpretted data for a location of common data to an input data comprising: analyzing segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing step including searching an index of representation values of the repository data for matching representation values of the input in a time independent of a size of the repository and linear in a size of the input data; and analyzing the similar repository segment with respect to the input segment to determine their common data sections while utilizing at least some of the matching representation values for data alignment, in a time linear in a size of the input segment.
2. The method of claim 1, wherein the index is stored in a memory faster than a memory storing the repository itself.
3. The method of claim 2, wherein the searching involves only the faster memory.
4. The method of claim 1, wherein the searching time is independent of repository size.
5. The method of claim 1, wherein the representation values are such that the number of matches indicates a degree of similarity.
6. The method of claim 1, wherein the similarity searching includes a threshold number of matching representation values for a declared similarity.
7. The method of claim 6, wherein the threshold varies in response to a statistical analysis of prior results of the searching step.
8. The method of claim 6, including a step of verifying the declared similarity.
9. The method of claim 1, wherein the index includes a location within the repository of the similarity matched portion.
10. The method of claim 6, including a step of acting upon a declared similarity by matching the similar data repository data.
11. The method of claim 1, including a step of data compression.
12. The method of claim 1, including a step of updating the repository and the index.
13. The method of claim 1, further comprising specifying locations in the repository and input data of distinguishing characteristics corresponding to the matching representation values; defining data intervals in each of the repository and input data based on the specified locations; wherein the analyzing the similar repository segment further comprises performing a binary difference process on the defined intervals, wherein sliding windows of the difference process for each of the repository and input data are at least sometimes positioned in non-matching offsets, wherein the data intervals defined for the repository data are of a different size than the data interval defined for the input data, wherein the index is stored in a memory faster than a memory storing the repository itself, wherein the similarity searching includes a threshold number of matching representation values for a declared similarity, wherein the threshold varies in response to a statistical analysis of prior results of the searching step.
14. A method of searching a repository of binary uninterpretted data for a location of common data to an input data comprising: analyzing segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing step including searching an index of representation values of the repository data for matching representation values of the input data in a time independent of a size of the repository and linear in a size of the input data; specifying locations in the repository and input data of distinguishing characteristics corresponding to the matching representation values; and analyzing the similar repository segment with respect to the input segment to determine their common data sections while utilizing the specified locations for data alignment, in a time linear in a size of the input segment.
15. The method of claim 14, further comprising defining data intervals in each of the repository and input data based on the specified locations.
16. The method of claim 15, wherein the analyzing the similar repository segment further comprises performing a binary difference process on the defined intervals.
17. The method of claim 16, wherein sliding windows of the difference process for each of the repository and input data are at least sometimes positioned in non-matching offsets.
18. The method of claim 15, wherein the data intervals defined for the repository data are of a different size than the data interval defined for the input data.
19. The method of claim 18, wherein the data intervals defined for one of the repository data and the input data is one byte, while the data intervals for the other of the repository data and the input data is multiple bytes.
20. The method of claim 18, wherein the index is stored in a memory faster than a memory storing the repository itself, wherein the similarity searching includes a threshold number of matching representation values for a declared similarity, wherein the threshold varies in response to a statistical analysis of prior results of the searching step.
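The alignment step in claims 1 and 14 (using the locations of matching representation values to find common data sections) can be sketched as follows. This is a simplified illustration, not the binary difference process of claims 16 and 17: it assumes each anchor is a hypothetical `(repo_offset, input_offset)` pair at which `seed` bytes are believed to match, then grows the match byte by byte in both directions. The function name `common_sections` and the `seed` parameter are assumptions introduced here.

```python
def common_sections(repo: bytes, inp: bytes, anchors, seed: int = 8):
    """Grow each anchor into a maximal common section.

    anchors: iterable of (repo_offset, input_offset) pairs.
    Returns a list of (repo_start, input_start, length) tuples.
    """
    sections = []
    for r, i in sorted(anchors, key=lambda a: a[1]):
        # Skip an anchor already covered by the previous section
        # when it has the same relative alignment.
        if sections:
            pr, pi, plen = sections[-1]
            if pi <= i < pi + plen and r - i == pr - pi:
                continue
        # Verify the anchor itself; a hash match may be a collision.
        if r + seed > len(repo) or i + seed > len(inp):
            continue
        if repo[r:r + seed] != inp[i:i + seed]:
            continue
        # Extend backwards while the preceding bytes agree.
        lo = 0
        while r - lo > 0 and i - lo > 0 and repo[r - lo - 1] == inp[i - lo - 1]:
            lo += 1
        # Extend forwards while the following bytes agree.
        hi = seed
        while r + hi < len(repo) and i + hi < len(inp) and repo[r + hi] == inp[i + hi]:
            hi += 1
        sections.append((r - lo, i - lo, lo + hi))
    return sections
```

Because each anchor carries its own pair of offsets, a common section is found even when it sits at different positions in the repository and the input, which is the property the abstract describes as independence from order and position. Skipping anchors that an earlier section already covers keeps the work roughly proportional to the segment size in typical cases, though this sketch does not guarantee the strict linear bound stated in the claims.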
using compression, e.g. sparse files · CPC title
Query execution · CPC title
Recoverability · CPC title
Real-time · CPC title
Hash tables · CPC title