Anomaly detection in deduplication pruning operations
US-2021073190-A1 · Mar 11, 2021 · US
US2022129426A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022129426-A1 |
| Application number | US-202017081700-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 27, 2020 |
| Priority date | Oct 27, 2020 |
| Publication date | Apr 28, 2022 |
| Grant date | — |
Official abstract text for this publication.
One example method includes collaborative deduplication. A deduplication engine implemented at a cloud level collaborates or coordinates with an extension engine of the deduplication at an edge node. This allows data ingested at a node to be collaboratively deduplicated prior to transfer to the cloud and after transfer to the cloud.
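The collaboration described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation. It assumes the catalogs are keyed by SHA-256 file hashes, and that the edge node records every hash the cloud has confirmed. The class and method names (`ExtensionEngine`, `DeduplicationEngine`, `select_needed`, `ingest`, `process`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def file_hash(content: bytes) -> str:
    """Hash whole-file content; SHA-256 is an assumption, not from the source."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class DeduplicationEngine:
    """Cloud-side engine. global_catalog maps file hash -> stored content."""
    global_catalog: dict = field(default_factory=dict)

    def select_needed(self, candidate_hashes):
        """Return the candidate hashes the cloud does not hold yet."""
        return [h for h in candidate_hashes if h not in self.global_catalog]

    def ingest(self, files):
        """Store transmitted files, skipping any hash already present."""
        for digest, content in files.items():
            self.global_catalog.setdefault(digest, content)


@dataclass
class ExtensionEngine:
    """Edge-side extension. local_catalog holds hashes known to be in the cloud."""
    local_catalog: set = field(default_factory=set)

    def process(self, files, cloud):
        """Deduplicate a batch (name -> bytes); return the names actually sent."""
        hashed = {file_hash(content): (name, content) for name, content in files.items()}
        # Files the local catalog has no record of sending.
        unsent = [h for h in hashed if h not in self.local_catalog]
        # Ask the cloud which of those it still needs, then send only those.
        needed = cloud.select_needed(unsent)
        cloud.ingest({h: hashed[h][1] for h in needed})
        # Every unsent hash is now known to exist in the cloud.
        self.local_catalog.update(unsent)
        return sorted(hashed[h][0] for h in needed)


cloud = DeduplicationEngine()
edge_a, edge_b = ExtensionEngine(), ExtensionEngine()
sent_a = edge_a.process({"a.txt": b"alpha", "b.txt": b"beta"}, cloud)
sent_b = edge_b.process({"c.txt": b"beta", "d.txt": b"gamma"}, cloud)
```

In this sketch the second edge node sends only `d.txt`, because the cloud already holds the content of `c.txt` from the first node. That cross-node saving is what a purely local catalog could not provide by itself.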
Opening claim text (preview).
What is claimed is:

1. A method for collaboratively deduplicating data, the method comprising: receiving data from an edge device at an extension engine operating on an edge node; checking the data using a local catalog to determine which files in the data have been transmitted to a deduplication engine operating in a datacenter, wherein the local catalog includes metadata configured to determine that first files in the data that have been previously sent to the deduplication engine and that second files in the data have not been sent to the deduplication engine based on the local catalog; collaborating, by the extension engine and the deduplication engine identify third files from the second files that have been deduplicated; transmitting the third files to the deduplication engine; deduplicating, by the deduplication engine, the third files; and updating the local catalog such that the local catalog reflects that the third files have been deduplicated by the deduplication engine.

2. The method of claim 1, further comprising identifying the third files based on a global catalog accessible to the deduplication engine, wherein the global catalog associates data from the source with hashes of deduplicated files.

3. The method of claim 2, further comprising generating a list of the second files and transmitting the list to the deduplication engine.

4. The method of claim 3, further comprising determining the third files from the list and the global catalog.

5. The method of claim 4, further comprising instructing the extension engine to transmit the third files to the deduplication engine.

6. The method of claim 1, further comprising deduplicating the third files by chunking the files, comparing hashes of the chunks with hashes stored in the global catalog, and storing new chunks in storage of the cloud.

7. The method of claim 1, wherein checking the data using a local catalog includes deduplicating based on chunks having a larger size than chunks used by the deduplication engine.

8. The method of claim 1, wherein the deduplication engine receives a list from multiple extension mechanisms at multiple edge nodes and each extension mechanism identifies third files, further comprising deduplicating all of the third files.

9. The method of claim 8, further comprising updating each of the extension engines based on their corresponding lists.

10. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: receiving data from an edge device at an extension engine operating on an edge node; checking the data using a local catalog to determine which files in the data have been transmitted to a deduplication engine operating in a datacenter, wherein the local catalog includes metadata configured to determine that first files in the data that have been previously sent to the deduplication engine and that second files in the data have not been sent to the deduplication engine based on the local catalog; collaborating, by the extension engine and the deduplication engine identify third files from the second files that have been deduplicated; transmitting the third files to the deduplication engine; deduplicating, by the deduplication engine, the third files; and updating the local catalog such that the local catalog reflects that the third files have been deduplicated by the deduplication engine.

11. The non-transitory storage medium of claim 1, further comprising identifying the third files based on a global catalog accessible to the deduplication engine, wherein the global catalog associates data from the source with hashes of deduplicated files.

12. The non-transitory storage medium of claim 2, further comprising generating a list of the second files and transmitting the list to the deduplication engine.

13. The non-transitory storage medium of claim 3, further comprising determining the third files from the list and the global catalog.

14. The non-transitory storage medium of claim 4, further comprising instructing the extension engine to transmit the third files to the deduplication engine.

15. The non-transitory storage medium of claim 1, further comprising deduplicating the third files by chunking the files, comparing hashes of the chunks with hashes stored in the global catalog, and storing new chunks in storage of the cloud.

16. The non-transitory storage medium of claim 1, further comprising providing the deduplication engine with pointers to the first files and the second files that are not transmitted to the deduplication engine.

17. The non-transitory storage medium of claim 1, wherein the deduplication engine receives a list from multiple extension mechanisms at multiple edge nodes and each extension mechanism identifies third files, further comprising deduplicating all of the third files.

18. The non-transitory storage medium of claim 8, further comprising updating each of the extension engines based on their corresponding lists.
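The chunk-level step recited in the claims (chunking files, comparing chunk hashes with the global catalog, and storing only new chunks) might look like the sketch below. The claims do not specify a chunking scheme or hash function, so fixed-size chunks and SHA-256 are assumptions here. The names `chunk`, `deduplicate`, and `restore` are hypothetical, and the 4-byte chunk size is chosen only to keep the example readable.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny for the demo; real systems use far larger chunks


def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes, global_catalog: dict, size: int = CHUNK_SIZE):
    """Store unseen chunks in global_catalog; return (recipe, new_chunk_count).

    The recipe is the ordered list of chunk hashes needed to rebuild the file.
    """
    recipe = []
    new_chunks = 0
    for piece in chunk(data, size):
        digest = hashlib.sha256(piece).hexdigest()
        if digest not in global_catalog:
            global_catalog[digest] = piece  # only new chunks consume storage
            new_chunks += 1
        recipe.append(digest)
    return recipe, new_chunks


def restore(recipe, global_catalog: dict) -> bytes:
    """Rebuild a file from its recipe of chunk hashes."""
    return b"".join(global_catalog[digest] for digest in recipe)


catalog = {}
recipe_1, new_1 = deduplicate(b"abcdabcdwxyz", catalog)
recipe_2, new_2 = deduplicate(b"wxyzabcd", catalog)
```

Here the first file contributes two distinct chunks, and the second file contributes none because both of its chunks are already in the catalog. The recipe is what lets the stored chunks be reassembled into the original file.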
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Management specifically adapted to replicated file systems · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Updates performed during online database operations; commit processing · CPC title