Method, device, and program product for managing data pattern

US11907188B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11907188-B2
Application number  US-202117239950-A
Country             US
Kind code           B2
Filing date         Apr 26, 2021
Priority date       Jan 20, 2021
Publication date    Feb 20, 2024
Grant date          Feb 20, 2024

Abstract

Official abstract text for this publication.

Techniques for managing data patterns involve: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; dividing the multiple collection devices into multiple groups based on clusters of the multiple sets of data patterns; and determining, based on sets of data patterns associated with collection devices in a group in the multiple groups, a set of shared data patterns for sharing among the collection devices in the group. Accordingly, data patterns that can be shared among multiple collection devices can be determined in a more accurate and effective manner, thereby facilitating the removal of duplicate data from the multiple collection devices.
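As an illustration of the flow the abstract describes (per-device pattern sets, grouping of devices, then a shared set per group), here is a minimal sketch. It rests on assumptions that are not in the patent text: the device names and pattern labels are made up, and a greedy Jaccard-similarity grouping stands in for whatever clustering method an implementation would actually use. The shared set is taken as a plain intersection, which is one of the options named in the dependent claims.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two pattern sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_devices(patterns: Dict[str, Set[str]], threshold: float = 0.5) -> List[List[str]]:
    """Greedy stand-in for clustering (an assumption, not the patented method).

    A device joins the first group whose first member is similar enough;
    otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    for device in sorted(patterns):
        for group in groups:
            if jaccard(patterns[device], patterns[group[0]]) >= threshold:
                group.append(device)
                break
        else:
            groups.append([device])
    return groups


def shared_patterns(patterns: Dict[str, Set[str]], group: List[str]) -> Set[str]:
    """Patterns present on every device in the group (set intersection)."""
    return set.intersection(*(patterns[d] for d in group))


# Hypothetical collection devices and their duplicate-data patterns.
device_patterns = {
    "cam-1": {"p1", "p2", "p3"},
    "cam-2": {"p1", "p2", "p4"},
    "thermo-1": {"t1", "t2"},
    "thermo-2": {"t1", "t2", "t3"},
}

groups = group_devices(device_patterns)
shared = {tuple(g): shared_patterns(device_patterns, g) for g in groups}
```

With this toy input the two cameras end up in one group and the two thermometers in another, so each group receives only the patterns its own members have in common.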

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for managing data patterns, including: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein the multiple collection devices are located in an edge network in an application environment, and wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; generating, based on the multiple sets of data patterns, multiple pattern features, wherein each one of the pattern features is generated for a respective set of data patterns in the multiple sets of data patterns, and wherein each pattern feature includes a number of occurrences of each individual data pattern in the respective set of data patterns; dividing the multiple collection devices into multiple groups based on the pattern features; determining, based on the numbers of occurrences of data patterns included in sets of data patterns associated with collection devices in a group in the multiple groups, a set of shared data patterns for sharing among the collection devices in the group; distributing the set of shared data patterns to an edge computing device in the edge network, wherein the edge computing device is connected to a target collection device in the multiple collection devices in the group; instructing the edge computing device to generate de-duplicated data of target data from the target collection device based on the set of shared data patterns, wherein the de-duplicated data is smaller than the target data; instructing the edge computing device to transmit the de-duplicated data to a server device that is used to process the target data; and whereby transmission of the de-duplicated data to the server device reduces overhead of storage resources involved in data storage by the server device.

2. The method according to claim 1, wherein dividing the multiple collection devices into the multiple groups includes: converting respectively the multiple pattern features to multiple low-dimensional features, wherein dimensions of the multiple low-dimensional features are lower than those of the multiple pattern features; and determining the multiple groups based on clusters of the multiple low-dimensional features.

3. The method according to claim 1, wherein determining the set of shared data patterns further includes: determining the set of shared data patterns based on an intersection of the sets of data patterns.

4. The method according to claim 1, wherein acquiring the multiple sets of data patterns includes: acquiring a set of initial data patterns associated with the collection device; ranking the set of initial data patterns based on the numbers of occurrences of the set of initial data patterns in the data; and selecting the set of data patterns associated with the collection device based on the ranked set of initial data patterns.

5. The method according to claim 1, wherein the server device is located in a core network separate from the edge network; and whereby transmission of the de-duplicated data to the server device further reduces bandwidth requirements between the edge network and the core network.

6. An electronic device, including: at least one processor; a volatile memory; and a memory coupled to the at least one processor, wherein the memory has instructions stored therein that, when executed by the at least one processor, cause the device to perform a method for managing data patterns, the method including: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein the multiple collection devices are located in an edge network in an application environment, and wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; generating, based on the multiple sets of data patterns, multiple pattern features, wherein each one of the pattern features is generated for a respective set of data patterns in the multiple sets of data patterns, and wherein each pattern feature includes a number of occurrences of each individual data pattern in the respective set of data patterns; dividing the multiple collection devices into multiple groups based on the pattern features; determining, based on the numbers of occurrences of data patterns included in sets of data patterns associated with collection devices in a group in the multiple groups, a set of shared data patterns for sharing among the collection devices in the group; distributing the set of shared data patterns to an edge computing device in the edge network, wherein the edge computing device is connected to a target collection device in the multiple collection devices in the group; instructing the edge computing device to generate de-duplicated data of target data from the target collection device based on the set of shared data patterns, wherein the de-duplicated data is smaller than the target data; instructing the edge computing device to transmit the de-duplicated data to a server device that is used to process the target data; and whereby transmission of the de-duplicated data to the server device reduces overhead of storage resources involved in data storage by the server device.

7. The device according to claim 6, wherein dividing the multiple collection devices into the multiple groups includes: converting respectively the multiple pattern features to multiple low-dimensional features, wherein dimensions of the multiple low-dimensional features are lower than those of the multiple pattern features; and determining the multiple groups based on clusters of the multiple low-dimensional features.

8. The device according to claim 6, wherein determining the set of shared data patterns further includes: determining the set of shared data patterns based on an intersection of the sets of data patterns.

9. The device according to claim 6, wherein acquiring the multiple sets of data patterns includes: acquiring a set of initial data patterns associated with the collection device; ranking the set of initial data patterns based on the numbers of occurrences of the set of initial data patterns in the data; and selecting the set of data patterns associated with the collection device based on the ranked set of initial data patterns.

10. The electronic device according to claim 6, wherein the server device is located in a core network separate from the edge network; and whereby transmission of the de-duplicated data to the server device further reduces bandwidth requirements between the edge network and the core network.

11. A computer program product having a non-transitory computer readable medium which stores a set of instructions to manage data patterns; the set of instructions, when carried out by computerized circuitry, causing the computerized circuitry to perform a method of: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein the multiple collection devices are located in an edge network in an application environment, and wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; generating, based on the multiple sets of data patterns, multiple pattern features, wherein each one of the pattern features is generated for a respective set of data patterns in the multiple sets of data patterns, and wherein each pattern feature includes a number of occurrences of each individual data pattern in the respective set of data patterns; dividing the multipl…
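The claims describe a pattern feature that records how often each data pattern occurs, a ranking of patterns by occurrence count, and de-duplication at an edge computing device using a shared pattern set. The sketch below shows one way these steps could fit together for a single device. Everything concrete in it is an assumption made for illustration: fixed 4-byte chunks serve as "patterns", references to shared patterns are assumed to cost one byte each, and the function names are hypothetical rather than taken from the patent.

```python
from collections import Counter
from typing import List, Union

CHUNK = 4  # hypothetical fixed chunk size in bytes

Token = Union[int, bytes]  # int = reference to a shared pattern, bytes = raw chunk


def chunks(data: bytes, size: int = CHUNK) -> List[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def pattern_feature(data: bytes) -> Counter:
    """Occurrence count of every chunk that appears more than once."""
    counts = Counter(chunks(data))
    return Counter({p: n for p, n in counts.items() if n > 1})


def top_patterns(feature: Counter, k: int) -> List[bytes]:
    """Rank patterns by occurrence count and keep the k most frequent."""
    return [p for p, _ in feature.most_common(k)]


def deduplicate(data: bytes, shared: List[bytes]) -> List[Token]:
    """Replace each chunk found in the shared set with its index."""
    index = {p: i for i, p in enumerate(shared)}
    return [index.get(c, c) for c in chunks(data)]


def restore(tokens: List[Token], shared: List[bytes]) -> bytes:
    """Server-side inverse of deduplicate()."""
    return b"".join(shared[t] if isinstance(t, int) else t for t in tokens)


def encoded_size(tokens: List[Token]) -> int:
    """Assumed cost model: 1 byte per reference, raw length otherwise."""
    return sum(1 if isinstance(t, int) else len(t) for t in tokens)


data = b"AAAABBBBAAAACCCCAAAABBBB"
feature = pattern_feature(data)
shared = top_patterns(feature, k=2)
tokens = deduplicate(data, shared)
```

In this toy run the 24-byte input is encoded as five references plus one raw chunk, and `restore(tokens, shared)` returns the original bytes. A real deployment would also need to account for the cost of distributing the shared set and of distinguishing references from raw data, which this cost model ignores.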

Assignees

Inventors

Classifications

  • G06F16/215 (Primary)

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • G06F18/23 (Primary)

    Clustering techniques · CPC title

  • Approximate or statistical queries · CPC title

  • based on discrimination criteria, e.g. discriminant analysis · CPC title

  • based on approximation criteria, e.g. principal component analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11907188B2 cover?
Techniques for managing data patterns involve: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; dividing the multiple collection devices into multiple groups based on clusters of the multi…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 20, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).