Visual fog
US-2020250003-A1 · Aug 6, 2020 · US
US11907188B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11907188-B2 |
| Application number | US-202117239950-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 26, 2021 |
| Priority date | Jan 20, 2021 |
| Publication date | Feb 20, 2024 |
| Grant date | Feb 20, 2024 |
Abstract
Techniques for managing data patterns involve: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; dividing the multiple collection devices into multiple groups based on clusters of the multiple sets of data patterns; and determining, based on sets of data patterns associated with collection devices in a group in the multiple groups, a set of shared data patterns for sharing among the collection devices in the group. Accordingly, data patterns that can be shared among multiple collection devices can be determined in a more accurate and effective manner, thereby facilitating the removal of duplicate data from the multiple collection devices.
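The three steps in the abstract (collect per-device pattern sets, group devices by clustering those sets, derive one shared set per group) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. A "data pattern" is modeled here as a fixed-size byte chunk that recurs in one device's data. Devices are grouped with a simple leader-style clustering on Jaccard similarity, and the shared set is taken as the intersection of the group's pattern sets (claim 3 names intersection as one option). `CHUNK`, `pattern_set`, `jaccard`, `group_devices`, `shared_patterns`, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter

CHUNK = 4  # hypothetical: patterns are modeled as fixed-size byte chunks


def pattern_set(data: bytes, chunk: int = CHUNK) -> set[bytes]:
    """Patterns of duplicate data: chunks that occur more than once in one device's data."""
    counts = Counter(data[i:i + chunk] for i in range(0, len(data) - chunk + 1, chunk))
    return {p for p, n in counts.items() if n > 1}


def jaccard(a: set, b: set) -> float:
    """Overlap of two pattern sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_devices(sets: dict[str, set[bytes]], threshold: float = 0.5) -> list[list[str]]:
    """Leader-style clustering: a device joins the first group whose first member
    (the leader) has a similar enough pattern set; otherwise it starts a new group."""
    groups: list[list[str]] = []
    for device, patterns in sets.items():
        for group in groups:
            if jaccard(sets[group[0]], patterns) >= threshold:
                group.append(device)
                break
        else:
            groups.append([device])
    return groups


def shared_patterns(sets: dict[str, set[bytes]], group: list[str]) -> set[bytes]:
    """Shared set for one group: patterns present in every member's set."""
    return set.intersection(*(sets[d] for d in group))


if __name__ == "__main__":
    data = {
        "cam-1": b"AAAABBBBAAAABBBBCCCC",
        "cam-2": b"AAAABBBBAAAABBBBDDDD",
        "sensor-1": b"XXXXYYYYXXXXYYYYZZZZ",
    }
    sets = {dev: pattern_set(raw) for dev, raw in data.items()}
    for group in group_devices(sets):
        print(group, sorted(shared_patterns(sets, group)))
```

With the sample data, the two cameras fall into one group that shares `AAAA` and `BBBB`, and `sensor-1` forms its own group. The leader approach is chosen only for brevity: its result depends on device order, so a real system would more likely use a standard clustering algorithm over numeric features, as the claims suggest.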
Claims (preview, truncated)
The invention claimed is:
1. A method for managing data patterns, including: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein the multiple collection devices are located in an edge network in an application environment, and wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; generating, based on the multiple sets of data patterns, multiple pattern features, wherein each one of the pattern features is generated for a respective set of data patterns in the multiple sets of data patterns, and wherein each pattern feature includes a number of occurrences of each individual data pattern in the respective set of data patterns; dividing the multiple collection devices into multiple groups based on the pattern features; determining, based on the numbers of occurrences of data patterns included in sets of data patterns associated with collection devices in a group in the multiple groups, a set of shared data patterns for sharing among the collection devices in the group; distributing the set of shared data patterns to an edge computing device in the edge network, wherein the edge computing device is connected to a target collection device in the multiple collection devices in the group; instructing the edge computing device to generate de-duplicated data of target data from the target collection device based on the set of shared data patterns, wherein the de-duplicated data is smaller than the target data; instructing the edge computing device to transmit the de-duplicated data to a server device that is used to process the target data; and whereby transmission of the de-duplicated data to the server device reduces overhead of storage resources involved in data storage by the server device.
2. The method according to claim 1, wherein dividing the multiple collection devices into the multiple groups includes: converting respectively the multiple pattern features to multiple low-dimensional features, wherein dimensions of the multiple low-dimensional features are lower than those of the multiple pattern features; and determining the multiple groups based on clusters of the multiple low-dimensional features.
3. The method according to claim 1, wherein determining the set of shared data patterns further includes: determining the set of shared data patterns based on an intersection of the sets of data patterns.
4. The method according to claim 1, wherein acquiring the multiple sets of data patterns includes: acquiring a set of initial data patterns associated with the collection device; ranking the set of initial data patterns based on the numbers of occurrences of the set of initial data patterns in the data; and selecting the set of data patterns associated with the collection device based on the ranked set of initial data patterns.
5. The method according to claim 1, wherein the server device is located in a core network separate from the edge network; and whereby transmission of the de-duplicated data to the server device further reduces bandwidth requirements between the edge network and the core network.
6. An electronic device, including: at least one processor; a volatile memory; and a memory coupled to the at least one processor, wherein the memory has instructions stored therein that, when executed by the at least one processor, cause the device to perform a method for managing data patterns, the method including: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein the multiple collection devices are located in an edge network in an application environment, and wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; generating, based on the multiple sets of data patterns, multiple pattern features, wherein each one of the pattern features is generated for a respective set of data patterns in the multiple sets of data patterns, and wherein each pattern feature includes a number of occurrences of each individual data pattern in the respective set of data patterns; dividing the multiple collection devices into multiple groups based on the pattern features; determining, based on the numbers of occurrences of data patterns included in sets of data patterns associated with collection devices in a group in the multiple groups, a set of shared data patterns for sharing among the collection devices in the group; distributing the set of shared data patterns to an edge computing device in the edge network, wherein the edge computing device is connected to a target collection device in the multiple collection devices in the group; instructing the edge computing device to generate de-duplicated data of target data from the target collection device based on the set of shared data patterns, wherein the de-duplicated data is smaller than the target data; instructing the edge computing device to transmit the de-duplicated data to a server device that is used to process the target data; and whereby transmission of the de-duplicated data to the server device reduces overhead of storage resources involved in data storage by the server device.
7. The device according to claim 6, wherein dividing the multiple collection devices into the multiple groups includes: converting respectively the multiple pattern features to multiple low-dimensional features, wherein dimensions of the multiple low-dimensional features are lower than those of the multiple pattern features; and determining the multiple groups based on clusters of the multiple low-dimensional features.
8. The device according to claim 6, wherein determining the set of shared data patterns further includes: determining the set of shared data patterns based on an intersection of the sets of data patterns.
9. The device according to claim 6, wherein acquiring the multiple sets of data patterns includes: acquiring a set of initial data patterns associated with the collection device; ranking the set of initial data patterns based on the numbers of occurrences of the set of initial data patterns in the data; and selecting the set of data patterns associated with the collection device based on the ranked set of initial data patterns.
10. The electronic device according to claim 6, wherein the server device is located in a core network separate from the edge network; and whereby transmission of the de-duplicated data to the server device further reduces bandwidth requirements between the edge network and the core network.
11. A computer program product having a non-transitory computer readable medium which stores a set of instructions to manage data patterns; the set of instructions, when carried out by computerized circuitry, causing the computerized circuitry to perform a method of: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein the multiple collection devices are located in an edge network in an application environment, and wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices; generating, based on the multiple sets of data patterns, multiple pattern features, wherein each one of the pattern features is generated for a respective set of data patterns in the multiple sets of data patterns, and wherein each pattern feature includes a number of occurrences of each individual data pattern in the respective set of data patterns; dividing the multipl
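The claims make several steps more specific than the abstract does. Claim 4 ranks initial patterns by occurrence count and selects from the ranked list. Claim 1 builds each pattern feature from per-pattern occurrence counts and chooses the shared set based on those counts. Claim 3 allows that choice to rest on an intersection. Claim 1 then has an edge device produce de-duplicated data that is smaller than the original. The following is a minimal sketch of those steps, under stated assumptions. Patterns are fixed-size byte chunks. The shared set keeps patterns common to every group member, ordered by their summed counts. De-duplication replaces each matching chunk with a 2-byte reference. The claims do not specify any of these choices, and `CHUNK`, `LIT`, `REF`, and every function name here are hypothetical.

```python
from collections import Counter

CHUNK = 4        # hypothetical fixed chunk size (must be <= 255 to fit the length byte)
LIT, REF = 0, 1  # hypothetical token tags in the de-duplicated stream


def top_patterns(data: bytes, k: int, chunk: int = CHUNK) -> Counter:
    """Claim 4 style: rank repeated chunks by occurrence count and keep the top k."""
    counts = Counter(data[i:i + chunk] for i in range(0, len(data) - chunk + 1, chunk))
    initial = Counter({p: n for p, n in counts.items() if n > 1})
    return Counter(dict(initial.most_common(k)))


def pattern_features(sets: dict[str, Counter]) -> tuple[list[bytes], dict[str, list[int]]]:
    """Claim 1 style: one vector per device holding the count of each pattern."""
    vocab = sorted(set().union(*sets.values()))
    return vocab, {dev: [counts[p] for p in vocab] for dev, counts in sets.items()}


def shared_by_counts(sets: dict[str, Counter], group: list[str], m: int) -> list[bytes]:
    """Patterns common to every member (an intersection), ordered by summed counts."""
    total: Counter = Counter()
    for dev in group:
        total.update(sets[dev])  # Counter.update adds counts
    common = set.intersection(*(set(sets[dev]) for dev in group))
    return [p for p, _ in total.most_common() if p in common][:m]


def dedupe(data: bytes, shared: list[bytes], chunk: int = CHUNK) -> bytes:
    """Edge side: replace chunks found in the shared set with 2-byte references.
    Assumes at most 256 shared patterns so an index fits in one byte."""
    index = {p: i for i, p in enumerate(shared)}
    out = bytearray()
    for i in range(0, len(data), chunk):
        piece = data[i:i + chunk]
        if piece in index:
            out += bytes([REF, index[piece]])
        else:
            out += bytes([LIT, len(piece)]) + piece
    return bytes(out)


def restore(blob: bytes, shared: list[bytes]) -> bytes:
    """Server side: expand references back into the original bytes."""
    out, i = bytearray(), 0
    while i < len(blob):
        tag, arg = blob[i], blob[i + 1]
        if tag == REF:
            out += shared[arg]
            i += 2
        else:
            out += blob[i + 2:i + 2 + arg]
            i += 2 + arg
    return bytes(out)


if __name__ == "__main__":
    d1 = b"AAAA" * 3 + b"BBBB" * 2 + b"CCCC"
    d2 = b"BBBB" * 4 + b"AAAA" * 2 + b"DDDD"
    sets = {"cam-1": top_patterns(d1, 2), "cam-2": top_patterns(d2, 2)}
    print(pattern_features(sets))
    shared = shared_by_counts(sets, ["cam-1", "cam-2"], 2)
    blob = dedupe(d1, shared)
    print(len(d1), len(blob), restore(blob, shared) == d1)
```

In the sample, the 24-byte input from `cam-1` shrinks to 16 bytes and restores exactly. The count vectors returned by `pattern_features` are the inputs that claim 2 would convert to low-dimensional features before clustering. That reduction step is not shown.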
CPC classification titles:
- Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- Clustering techniques
- Approximate or statistical queries
- based on discrimination criteria, e.g. discriminant analysis
- based on approximation criteria, e.g. principal component analysis