Techniques for managing deduplication of data

US9436697B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9436697-B1
Application number  US-201313736510-A
Country             US
Kind code           B1
Filing date         Jan 8, 2013
Priority date       Jan 8, 2013
Publication date    Sep 6, 2016
Grant date          Sep 6, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for detecting advanced security threats may be realized as a method for detecting a security threat including generating a resource at a client, implementing the resource on the client, monitoring system behavior of the client having the resource implemented thereon, determining whether a security event involving the implemented resource has occurred based on the monitored system behavior, and generating a report when it has been determined that the security event has occurred.

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for managing deduplication of data comprising:
    receiving, via a network, first data to be backed up;
    separating the first data to be backed up into segments;
    generating, using at least one computer processor, a fingerprint for each of the segments;
    sampling n-bits of the fingerprints;
    writing the sampled fingerprints to a plurality of hook tables arranged in a plurality of computing nodes, wherein the plurality of computing nodes respectively contain at least one of the plurality of hook tables, and wherein sizes of the n-bits of each of the sampled fingerprints are based on sizes of the hook tables to which the sampled fingerprints are written;
    receiving a lookup set of fingerprints corresponding to second data to be backed up;
    extracting a portion of the fingerprints corresponding to the second data;
    comparing the extracted portion of the fingerprints corresponding to the second data to entries of the plurality of hook tables to determine which of the plurality of hook tables has a highest number of matches;
    determining whether any of the fingerprints corresponding to the second data do not exist in memory based on the comparison using the respective extracted portions;
    filtering the fingerprints corresponding to the second data that are determined not to exist and transmitting remaining fingerprints corresponding to the second data to the computing node having the hook table with the highest number of matches so that a second comparison is made using the remaining fingerprints to determine which of the remaining fingerprints exist in the fingerprints generated from the first data; and
    backing up segments associated with the second data that do not exist in the first data.

2. The method for managing deduplication of data of claim 1, further comprising: storing the generated fingerprints in a buffer.

3. The method for managing deduplication of data of claim 2, further comprising: determining whether the buffer is full; and writing the fingerprints in the buffer to the memory when it is determined that the buffer is full.

4. The method for managing deduplication of data of claim 2, further comprising: performing the sampling using the fingerprints stored in the buffer.

5. The method for managing deduplication of data of claim 4, wherein each sampled fingerprint is a hook.

6. The method for managing deduplication of data of claim 4, wherein the fingerprints are sampled at a predetermined rate.

7. The method for managing deduplication of data of claim 6, wherein the predetermined rate is selected based on a size of the at least one of the plurality of hook tables.

8. The method for managing deduplication of data of claim 4, further comprising: determining which of the fingerprints to distribute to each of the plurality of hook tables.

9. The method for managing deduplication of data of claim 8, wherein each of the plurality of hook tables is arranged at a separate computing node within a clustered environment.

10. The method for managing deduplication of data of claim 9, wherein determining the distribution of the fingerprints to each of the plurality of hook tables is based on a number of the separate computing nodes.

11. The method for managing deduplication of data of claim 8, wherein determining the distribution of the fingerprints to each of the plurality of hook tables is based on a target deduplication capacity.

12. The method for managing deduplication of data of claim 8, wherein determining the distribution of the fingerprints to each of the plurality of hook tables is based on a rate at which the fingerprints are sampled.

13. The method for managing deduplication of data of claim 8, wherein determining the distribution of the fingerprints to each of the plurality of hook tables is based on a size of each of the plurality of hook tables.

14. The method for managing deduplication of data of claim 1, wherein each of the plurality of hook tables are arranged at different computing nodes and the received lookup set of fingerprints is compared to the entries of the hook table at each computing node sequentially.

15. An article of manufacture for managing deduplication of data, the article of manufacture comprising: at least one non-transitory processor readable storage medium; and instructions stored on the at least one medium; wherein the instructions are configured to be readable from the at least one medium by at least one processor and thereby cause the at least one processor to operate so as to:
    receive first data to be backed up;
    separate the first data to be backed up into segments;
    generate a fingerprint for each of the segments;
    sample n-bits of the fingerprints;
    write the sampled fingerprints to a plurality of hook tables arranged in a plurality of computing nodes, wherein the plurality of computing nodes respectively contain at least one of the plurality of hook tables, and wherein sizes of the n-bits of each of the sampled fingerprints are based on sizes of the hook tables to which the sampled fingerprints are written;
    receive a lookup set of fingerprints corresponding to second data to be backed up;
    extract a portion of the fingerprints corresponding to the second data;
    compare the extracted portion of the fingerprints corresponding to the second data to entries of the plurality of hook tables to determine which of the plurality of hook tables has a highest number of matches;
    determine whether any of the fingerprints corresponding to the second data do not exist in memory based on the comparison using the respective extracted portions;
    filter the fingerprints corresponding to the second data that are determined not to exist and transmit remaining fingerprints corresponding to the second data to the computing node having the hook table with the highest number of matches so that a second comparison is made using the remaining fingerprints to determine which of the remaining fingerprints exist in the fingerprints generated from the first data; and
    back up the segments associated with the second data that do not exist in the first data.

16. A system for managing deduplication of data comprising: one or more processors communicatively coupled to a network; wherein the one or more processors are configured to:
    receive first data to be backed up;
    separate the first data to be backed up into segments;
    generate a fingerprint for each of the segments;
    sample n-bits of the fingerprints;
    write the sampled fingerprints to a plurality of hook tables arranged in a plurality of computing nodes, wherein the plurality of computing nodes respectively contain at least one of the plurality of hook tables, and wherein sizes of the n-bits of each of the sampled fingerprints are based on sizes of the hook tables to which the sampled fingerprints are written;
    receive a lookup set of fingerprints corresponding to second data to be backed up;
    extract a portion of the fingerprints corresponding to the second data;
    compare the extracted portion of the fingerprints corresponding to the second data to entries of the plurality of hook tables to determine which of the plurality of hook tables has a highest number of matches;
    determine whether any of the fingerprints corresponding to the second data do not exist in memory based on the comparison using the respective extracted portions;
    filter the fingerprints corresponding to the second data that are determined not to exist and transmit remaining fingerprints corresponding to the second data to the computing node having the hook table with the highest number of matches so
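
The steps recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: fixed-size segmentation, SHA-256 fingerprints, taking the leading n bits as the sample, recording a hook for every stored fingerprint, and the names `segment`, `fingerprint`, `sample_bits`, `Node`, and `back_up` are all hypothetical choices that the claims do not specify. The nodes are simulated in one process, so "transmitting" is just a method call.

```python
import hashlib


def segment(data: bytes, size: int = 4) -> list[bytes]:
    """Split data into fixed-size segments (assumed segmentation scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(seg: bytes) -> bytes:
    """Fingerprint a segment; SHA-256 is an assumption, not from the claims."""
    return hashlib.sha256(seg).digest()


def sample_bits(fp: bytes, n_bits: int) -> int:
    """Sample the leading n bits of a fingerprint (assumed sampling rule)."""
    return int.from_bytes(fp, "big") >> (len(fp) * 8 - n_bits)


class Node:
    """One simulated computing node holding a hook table and a full index."""

    def __init__(self, table_slots: int):
        # n depends on the hook table size: 2**n slots -> n-bit hooks.
        self.n_bits = table_slots.bit_length() - 1
        self.hook_table: set[int] = set()
        self.fingerprints: set[bytes] = set()

    def store(self, fps) -> None:
        # Assumption: every stored fingerprint contributes a hook, so a
        # missing hook proves the fingerprint is not on this node.
        for fp in fps:
            self.fingerprints.add(fp)
            self.hook_table.add(sample_bits(fp, self.n_bits))

    def hook_hit(self, fp: bytes) -> bool:
        return sample_bits(fp, self.n_bits) in self.hook_table


def back_up(data: bytes, nodes: list[Node]) -> tuple[int, list[bytes]]:
    """Return (index of chosen node, segments that had to be backed up)."""
    segs = segment(data)
    fps = [fingerprint(s) for s in segs]

    # First comparison: count hook matches per node, pick the best node.
    matches = [sum(node.hook_hit(fp) for fp in fps) for node in nodes]
    best = matches.index(max(matches))
    node = nodes[best]

    # Filter out fingerprints whose hook is absent on the chosen node.
    remaining = [fp for fp in fps if node.hook_hit(fp)]

    # Second comparison: check the remaining full fingerprints on that node.
    existing = {fp for fp in remaining if fp in node.fingerprints}

    # Back up only segments not already stored (each new segment once).
    new_segments = []
    for seg, fp in zip(segs, fps):
        if fp not in existing:
            existing.add(fp)
            new_segments.append(seg)
    node.store(fingerprint(s) for s in new_segments)
    return best, new_segments
```

For example, after storing the segments of `b"aaaabbbbccccdddd"` on one node, calling `back_up(b"aaaabbbbeeeeffff", nodes)` routes the lookup to that node and backs up only the `eeee` and `ffff` segments. In this sketch the filter is applied against the chosen node only, which is a simplification; a segment held solely by another node would be stored again.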

Assignees

Veritas Technologies LLC

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9436697B1 cover?
Per its title and first claim, the patent covers managing deduplication of backup data. Data to be backed up is split into segments and each segment is fingerprinted; n-bit samples of the fingerprints are written to hook tables distributed across computing nodes. For later backups, portions of the new fingerprints are compared against the hook tables, fingerprints determined not to exist are filtered out, the rest are sent to the node whose hook table has the most matches for a second comparison, and only segments not already stored are backed up.
Who is the assignee on this patent?
Veritas Technologies LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 6, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).