Distributed fp-growth with node table for large-scale association rule mining
US-2018107695-A1 · Apr 19, 2018 · US
US10963519B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10963519-B2 |
| Application number | US-201916355996-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 18, 2019 |
| Priority date | Mar 18, 2019 |
| Publication date | Mar 30, 2021 |
| Grant date | Mar 30, 2021 |
Official abstract text for this publication.
A data processing server may receive a set of data objects for frequent pattern (FP) analysis. The set of data objects may be analyzed using an attribute diversity technique. For the set of data attributes of the set of data objects, the server may arrange the attributes in one or more dimensions. The server may initialize a set of centroids on data points and identify mean values of nearby data points. Based on an iteration of the mean value calculation, the server may identify a set of attributes corresponding to final mean values as being groups of similarly frequent attributes. These groups of similarly frequent attributes may be analyzed using an FP analysis procedure to identify frequent patterns of data attributes.
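The flow described in the abstract can be sketched as a one-dimensional mean shift over attribute occurrence counts, followed by a frequent-pattern pass on each resulting group. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the flat (unweighted) window, the `merge_dist` parameter, and the brute-force support counting used in place of a real FP-growth procedure are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def attribute_counts(data_objects):
    """Count how many data objects carry each attribute."""
    counts = Counter()
    for attrs in data_objects:
        counts.update(set(attrs))
    return counts


def mean_shift_1d(points, bandwidth, tol=1e-6, merge_dist=1e-3, max_iter=100):
    """Flat-window mean shift on 1-D points; returns the merged final means."""
    centroids = sorted(set(points))  # assumption: one centroid per distinct point
    for _ in range(max_iter):
        shifted = []
        for c in centroids:
            window = [p for p in points if abs(p - c) <= bandwidth]
            shifted.append(sum(window) / len(window) if window else c)
        # stop once no centroid moved more than tol since the previous pass
        moved = max(abs(a - b) for a, b in zip(shifted, centroids))
        centroids = shifted
        if moved <= tol:
            break
    finals = []
    for c in sorted(centroids):  # merge centroids that landed on the same mean
        if not finals or c - finals[-1] > merge_dist:
            finals.append(c)
    return finals


def frequent_itemsets(data_objects, attributes, min_support, max_len=3):
    """Brute-force support counting, a simple stand-in for an FP procedure."""
    allowed = set(attributes)
    counts = Counter()
    for attrs in data_objects:
        projected = sorted(allowed & set(attrs))
        for k in range(1, max_len + 1):
            counts.update(combinations(projected, k))
    return {items: n for items, n in counts.items() if n >= min_support}


def grouped_fp_analysis(data_objects, bandwidth, min_support):
    """Group similarly frequent attributes, then mine each group separately."""
    counts = attribute_counts(data_objects)
    finals = mean_shift_1d(list(counts.values()), bandwidth)
    results = {}
    for m in finals:
        group = [a for a, n in counts.items() if abs(n - m) <= bandwidth]
        results[m] = frequent_itemsets(data_objects, group, min_support)
    return results


# Hypothetical data: "a" and "b" are common, "x" and "y" are rare.
objects = [{"a", "b"}] * 8 + [{"a", "b", "x", "y"}] * 2
patterns = grouped_fp_analysis(objects, bandwidth=2, min_support=2)
```

Mining each group separately means that rare attributes such as `x` and `y` are evaluated against attributes of similar frequency rather than being drowned out by the common ones.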
Opening claim text (preview).
What is claimed is:

1. A method for data processing at a database system, comprising: receiving, at the database system, a plurality of data objects, each data object of the plurality of data objects associated with one or more data attributes; arranging the one or more data attributes along one or more dimensions; defining a plurality of data points for a set of the arranged one or more data attributes; initializing a plurality of centroids on a subset of the plurality of data points; identifying, for each centroid of the plurality of centroids, a mean value of one or more data points of the plurality of data points within a bandwidth of each centroid of the plurality of centroids to generate a set of mean values; iterating the identifying using the set of mean values as the plurality of centroids until satisfaction of a merging threshold by the set of mean values to generate a set of final mean values; identifying, for each final mean value of the set of final mean values, a set of data attributes corresponding to data points within a range of the final mean value; and performing a frequent pattern (FP) analysis procedure on each set of data attributes corresponding to each final mean value.

2. The method of claim 1, wherein arranging the one or more data attributes further comprises: sorting the one or more data attributes associated with the plurality of data objects based on the number of occurrences of each data attribute in the plurality of data objects, wherein each data point of the plurality of data points correspond to the number of occurrences for each attribute associated with the plurality of data objects.

3. The method of claim 2, wherein: selecting the subset of the plurality of data points for centroid initialization based on the bandwidth.

4. The method of claim 3, further comprising: selecting, for a bandwidth value n, every nth data point corresponding to the number occurrences of each data attribute in the set of data attribute patterns for initialization of a centroid of the plurality of centroids.

5. The method of claim 1, further comprising: initializing each centroid of the plurality of centroids on a data point of the plurality of data points.

6. The method of claim 1, further comprising: removing a set of data attributes for a final mean value if a number of data attributes in the set of data attributes is less than a threshold.

7. The method of claim 1, further comprising: calculating a real mean value for each mean value of the set of mean values based on each centroid, the bandwidth, and the one or more data points of the subset of the plurality of data points within the bandwidth of each centroid.

8. The method of claim 7, further comprising: calculating the real mean value using a kernelized weighted average process.

9. The method of claim 1, further comprising: selecting each mean value of the set of mean values as a nearest data point to a calculated real mean value based on each centroid, the bandwidth, and the one or more data points of the subset of the plurality of data points within the bandwidth of each centroid.

10. The method of claim 1, further comprising: identifying the one or more data points of the subset of the plurality of data points within the bandwidth of each centroid using a Euclidean distance calculation of a distance between each of the one or more data points and each centroid.

11. The method of claim 1, wherein the merging threshold is based on a delta between a previous mean value and a current mean value.

12. The method of claim 1, wherein the plurality of data objects comprises a plurality of users within the database system and the set of data attributes comprises activities performed by the plurality of users or characteristics associated with the plurality of users.

13. The method of claim 1, wherein the set data attribute patterns corresponds to frequently-occurring conjunctions of data attributes in a user population.

14. An apparatus for data processing at a database system, comprising: a processor, memory in electronic communication with the processor; and instructions stored in the memory and executable by the processor to cause the apparatus to: receive, at the database system, a plurality of data objects, each data object of the plurality of data objects associated with one or more data attributes; arrange the one or more data attributes along one or more dimensions; define a plurality of data points for a set of the arranged one or more data attributes; initialize a plurality of centroids on a subset of the plurality of data points; identify, for each centroid of the plurality of centroids, a mean value of one or more data points of the plurality of data points within a bandwidth of each centroid of the plurality of centroids to generate a set of mean values; iterate the identifying using the set of mean values as the plurality of centroids until satisfaction of a merging threshold by the set of mean values to generate a set of final mean values; identify, for each final mean value of the set of final mean values, a set of data attributes corresponding to data points within a range of the final mean value; and perform a frequent pattern (FP) analysis procedure on each set of data attributes corresponding to each final mean value.

15. The apparatus of claim 14, wherein the instructions to arrange the one or more data attributes further are executable by the processor to cause the apparatus to: sort the one or more data attributes associated with the plurality of data objects based on the number of occurrences of each data attribute in the plurality of data objects, wherein each data point of the plurality of data points correspond to the number of occurrences for each attribute associated with the plurality of data objects.

16. The apparatus of claim 15, wherein the instructions are further executable by the processor to cause the apparatus to: select, for a bandwidth value n, every nth data point corresponding to the number occurrences of each data attribute in the set of data attribute patterns for initialization of a centroid of the plurality of centroids.

17. The apparatus of claim 14, wherein the set data attribute patterns corresponds to frequently-occurring conjunctions of data attributes in a user population.

18. A non-transitory computer-readable medium storing code for data processing at a database system, the code comprising instructions executable by a processor to: receive, at the database system, a plurality of data objects, each data object of the plurality of data objects associated with one or more data attributes; arrange the one or more data attributes along one or more dimensions; define a plurality of data points for a set of the arranged one or more data attributes; initialize a plurality of centroids on a subset of the plurality of data points; identify, for each centroid of the plurality of centroids, a mean value of one or more data points of the plurality of data points within a bandwidth of each centroid of the plurality of centroids to generate a set of mean values; iterate the identifying using the set of mean values as the plurality of centroids until satisfaction of a merging threshold by the set of mean values to generate a set of final mean values; identify, for each final mean value of the set of final mean values, a set of data attributes corresponding to data points within a range of the final mean value; and perform a frequent pattern (FP) analysis procedure on each set of data
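
Several of the dependent claims describe small, separable steps: seeding centroids on every nth sorted data point (claim 4), a kernelized weighted average (claim 8), snapping a computed mean to the nearest data point (claim 9), and discarding undersized groups (claim 6). The helpers below are one possible reading of those steps, offered as a sketch only. The function names are hypothetical, the Gaussian kernel is an assumption (the claims do not name a kernel), and starting the every-nth selection at the first point is likewise an assumption.

```python
import math


def init_every_nth(sorted_points, n):
    """Claim 4 reading: for bandwidth value n, seed a centroid on every nth point.

    Assumption: selection starts at the first sorted point.
    """
    return sorted_points[::n]


def gaussian_weighted_mean(centroid, points, bandwidth):
    """Claim 8 reading: weighted average of the points inside the bandwidth.

    Assumption: Gaussian weights; in 1-D the Euclidean distance of claim 10
    reduces to an absolute difference.
    """
    window = [p for p in points if abs(p - centroid) <= bandwidth]
    if not window:
        return centroid  # nothing in range, leave the centroid where it is
    weights = [math.exp(-((p - centroid) ** 2) / (2 * bandwidth ** 2)) for p in window]
    return sum(w * p for w, p in zip(weights, window)) / sum(weights)


def snap_to_nearest(real_mean, points):
    """Claim 9 reading: use the data point nearest the real mean as the mean value."""
    return min(points, key=lambda p: abs(p - real_mean))


def drop_small_groups(groups, min_size):
    """Claim 6 reading: remove groups holding fewer attributes than a threshold."""
    return {m: attrs for m, attrs in groups.items() if len(attrs) >= min_size}


# Hypothetical occurrence counts, already sorted as in claim 2.
counts = [1, 2, 3, 4, 5, 6, 7]
seeds = init_every_nth(counts, 3)
real_mean = gaussian_weighted_mean(2, [1, 2, 3, 50], bandwidth=5)
snapped = snap_to_nearest(2.4, [1, 2, 3, 50])
kept = drop_small_groups({2.0: ["x"], 10.0: ["a", "b"]}, min_size=2)
```

Snapping to an actual data point keeps every centroid on an observed occurrence count, while the weighted mean lets nearer points pull harder than those at the edge of the bandwidth.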
Query processing support for facilitating data mining operations in structured databases · CPC title
Clustering; Classification · CPC title
using statistics or function optimisation, e.g. modelling of probability density functions · CPC title
Distances to cluster centroïds · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title