Versatile data reduction for internet of things

US2022129426A1 · US · A1

Patent metadata
Publication number: US-2022129426-A1
Application number: US-202017081700-A
Country: US
Kind code: A1
Filing date: Oct 27, 2020
Priority date: Oct 27, 2020
Publication date: Apr 28, 2022
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One example method includes collaborative deduplication. A deduplication engine implemented at a cloud level collaborates or coordinates with an extension engine of the deduplication at an edge node. This allows data ingested at a node to be collaboratively deduplicated prior to transfer to the cloud and after transfer to the cloud.
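The collaboration described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names `ExtensionEngine` and `DedupEngine`, the method names, the use of SHA-256 whole-file hashes, and the in-memory sets standing in for the local and global catalogs are all assumptions made for illustration. The sketch also assumes that the edge node only transmits files whose hashes neither its own catalog nor the cloud's catalog already holds.

```python
import hashlib
from dataclasses import dataclass, field


def file_hash(content: bytes) -> str:
    """Hash whole-file content (SHA-256 is an assumption, not from the patent)."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class DedupEngine:
    """Hypothetical cloud-side engine with a global catalog of file hashes."""
    global_catalog: set = field(default_factory=set)

    def select_needed(self, candidate_hashes):
        # Keep only the hashes the cloud has not stored yet.
        return [h for h in candidate_hashes if h not in self.global_catalog]

    def ingest(self, content: bytes) -> None:
        # Record the received file in the global catalog.
        self.global_catalog.add(file_hash(content))


@dataclass
class ExtensionEngine:
    """Hypothetical edge-side extension with a local catalog of file hashes."""
    local_catalog: set = field(default_factory=set)

    def process(self, files: dict, cloud: DedupEngine) -> list:
        hashes = {name: file_hash(data) for name, data in files.items()}
        # Step 1: the local catalog filters out files this node already handled.
        candidates = {n: h for n, h in hashes.items() if h not in self.local_catalog}
        # Step 2: the cloud narrows the candidate list using its global catalog.
        needed = set(cloud.select_needed(list(candidates.values())))
        sent = []
        for name, h in candidates.items():
            if h in needed:
                # Step 3: transmit only the files the cloud actually lacks.
                cloud.ingest(files[name])
                needed.discard(h)  # identical content in one batch is sent once
                sent.append(name)
            # Step 4: update the local catalog; the cloud now holds this content.
            self.local_catalog.add(h)
        return sent


cloud = DedupEngine()
edge_a, edge_b = ExtensionEngine(), ExtensionEngine()
first = edge_a.process({"a.log": b"x", "b.log": b"y"}, cloud)   # both are new
repeat = edge_a.process({"a.log": b"x"}, cloud)                 # local catalog hit
other = edge_b.process({"c.log": b"x", "d.log": b"z"}, cloud)   # "x" is a global hit
```

In this sketch the second call from `edge_a` transmits nothing because of the local catalog, and `edge_b` transmits only `d.log` because the cloud already holds the content of `c.log` from another node.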

First claim

Opening claim text (preview).

What is claimed is:

1. A method for collaboratively deduplicating data, the method comprising: receiving data from an edge device at an extension engine operating on an edge node; checking the data using a local catalog to determine which files in the data have been transmitted to a deduplication engine operating in a datacenter, wherein the local catalog includes metadata configured to determine that first files in the data that have been previously sent to the deduplication engine and that second files in the data have not been sent to the deduplication engine based on the local catalog; collaborating, by the extension engine and the deduplication engine identify third files from the second files that have been deduplicated; transmitting the third files to the deduplication engine; deduplicating, by the deduplication engine, the third files; and updating the local catalog such that the local catalog reflects that the third files have been deduplicated by the deduplication engine.

2. The method of claim 1, further comprising identifying the third files based on a global catalog accessible to the deduplication engine, wherein the global catalog associates data from the source with hashes of deduplicated files.

3. The method of claim 2, further comprising generating a list of the second files and transmitting the list to the deduplication engine.

4. The method of claim 3, further comprising determining the third files from the list and the global catalog.

5. The method of claim 4, further comprising instructing the extension engine to transmit the third files to the deduplication engine.

6. The method of claim 1, further comprising deduplicating the third files by chunking the files, comparing hashes of the chunks with hashes stored in the global catalog, and storing new chunks in storage of the cloud.

7. The method of claim 1, wherein checking the data using a local catalog includes deduplicating based on chunks having a larger size than chunks used by the deduplication engine.

8. The method of claim 1, wherein the deduplication engine receives a list from multiple extension mechanisms at multiple edge nodes and each extension mechanism identifies third files, further comprising deduplicating all of the third files.

9. The method of claim 8, further comprising updating each of the extension engines based on their corresponding lists.

10. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: receiving data from an edge device at an extension engine operating on an edge node; checking the data using a local catalog to determine which files in the data have been transmitted to a deduplication engine operating in a datacenter, wherein the local catalog includes metadata configured to determine that first files in the data that have been previously sent to the deduplication engine and that second files in the data have not been sent to the deduplication engine based on the local catalog; collaborating, by the extension engine and the deduplication engine identify third files from the second files that have been deduplicated; transmitting the third files to the deduplication engine; deduplicating, by the deduplication engine, the third files; and updating the local catalog such that the local catalog reflects that the third files have been deduplicated by the deduplication engine.

11. The non-transitory storage medium of claim 1, further comprising identifying the third files based on a global catalog accessible to the deduplication engine, wherein the global catalog associates data from the source with hashes of deduplicated files.

12. The non-transitory storage medium of claim 2, further comprising generating a list of the second files and transmitting the list to the deduplication engine.

13. The non-transitory storage medium of claim 3, further comprising determining the third files from the list and the global catalog.

14. The non-transitory storage medium of claim 4, further comprising instructing the extension engine to transmit the third files to the deduplication engine.

15. The non-transitory storage medium of claim 1, further comprising deduplicating the third files by chunking the files, comparing hashes of the chunks with hashes stored in the global catalog, and storing new chunks in storage of the cloud.

16. The non-transitory storage medium of claim 1, further comprising providing the deduplication engine with pointers to the first files and the second files that are not transmitted to the deduplication engine.

17. The non-transitory storage medium of claim 1, wherein the deduplication engine receives a list from multiple extension mechanisms at multiple edge nodes and each extension mechanism identifies third files, further comprising deduplicating all of the third files.

18. The non-transitory storage medium of claim 8, further comprising updating each of the extension engines based on their corresponding lists.
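The chunk-level step recited in claims 6 and 15 (chunk the files, compare chunk hashes with the global catalog, store only new chunks) might look roughly like the following sketch. The claims do not specify a chunking scheme or hash function, so the fixed-size chunks, the SHA-256 digest, the deliberately tiny 4-byte chunk size, and the `ChunkStore` class with its `deduplicate` and `restore` methods are all hypothetical choices made for illustration.

```python
import hashlib


def chunk(data: bytes, size: int):
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Hypothetical cloud storage keyed by chunk hash.

    The dict plays the role of both the global catalog (the keys)
    and the chunk storage (the values).
    """

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}  # hex digest -> chunk bytes

    def deduplicate(self, data: bytes):
        """Store unseen chunks only; return the file recipe and the new-chunk count."""
        recipe = []
        new_chunks = 0
        for piece in chunk(data, self.chunk_size):
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.chunks:
                # New chunk: store it and record its hash.
                self.chunks[digest] = piece
                new_chunks += 1
            # Known or new, the recipe references the chunk by hash.
            recipe.append(digest)
        return recipe, new_chunks

    def restore(self, recipe):
        """Rebuild the original bytes from a recipe of chunk hashes."""
        return b"".join(self.chunks[d] for d in recipe)


store = ChunkStore(chunk_size=4)
recipe1, new1 = store.deduplicate(b"AAAABBBBCCCC")  # three unseen chunks
recipe2, new2 = store.deduplicate(b"AAAAXXXXCCCC")  # only "XXXX" is unseen
```

Claim 7 adds that the edge-side check may work on larger chunks than the deduplication engine uses. In this sketch that would correspond to constructing the edge-side store with a larger `chunk_size` than the cloud-side one, which is again an assumption about how the two tiers could be configured.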

Assignees

Inventors

Classifications

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

  • Management specifically adapted to replicated file systems

  • G06F16/215 (Primary)

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

  • Updates performed during online database operations; commit processing

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022129426A1 cover?
One example method includes collaborative deduplication. A deduplication engine implemented at a cloud level collaborates or coordinates with an extension engine of the deduplication at an edge node. This allows data ingested at a node to be collaboratively deduplicated prior to transfer to the cloud and after transfer to the cloud.
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/1844. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 28, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).