Mining patterns in a high-dimensional sparse feature space
US-2019272339-A1 · Sep 5, 2019 · US
US11275768B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11275768-B2 |
| Application number | US-201816120067-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 31, 2018 |
| Priority date | May 25, 2018 |
| Publication date | Mar 15, 2022 |
| Grant date | Mar 15, 2022 |
Official abstract text for this publication.
Methods, systems, and devices supporting differential support for frequent pattern (FP) analysis are described. Some database systems may analyze data sets to determine FPs of data attributes within the data sets. However, if data distributions for different types of data attributes vary greatly, more frequent data attribute types may skew the FPs away from the less frequent types. To reduce the noise of common attributes while maintaining sensitivity to the less common attributes, the database system may implement multiple minimum support (e.g., frequency) thresholds. For example, the database system may adaptively categorize the different data attribute types into data categories based on their distributions and may dynamically determine support thresholds for the categories. Using different minimum support thresholds for different data categories allows the system to filter out data attribute patterns based on the distributions of the data attribute types in the pattern.
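The idea of category-specific minimum support can be illustrated with a minimal sketch. Everything below is hypothetical: the toy data, the function names, the two-way "common"/"rare" split on mean attribute support, and the fixed threshold values are assumptions chosen for illustration, not the clustering or threshold-determination procedure of the patent. Each pattern is tested against the smallest threshold among the categories of its attributes, following the rule recited in claim 2.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Toy data set (assumed): each data object is a set of (attribute_type, value) pairs.
OBJECTS = [
    {("country", "US"), ("plan", "free")},
    {("country", "US"), ("plan", "free")},
    {("country", "US"), ("plan", "free"), ("action", "export")},
    {("country", "US"), ("plan", "paid"), ("action", "export")},
    {("country", "US"), ("plan", "free")},
    {("country", "DE"), ("plan", "free")},
]


def attribute_support(objects):
    """Count how many data objects contain each attribute."""
    return Counter(attr for obj in objects for attr in obj)


def categorize_types(support, split):
    """Assumed rule: a type is 'common' if its mean attribute support >= split."""
    by_type = {}
    for (attr_type, _value), count in support.items():
        by_type.setdefault(attr_type, []).append(count)
    return {
        t: ("common" if mean(counts) >= split else "rare")
        for t, counts in by_type.items()
    }


def frequent_patterns(objects, categories, thresholds, max_size=2):
    """Keep patterns whose count meets the smallest threshold of their categories."""
    counts = Counter()
    for obj in objects:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(obj), size):
                counts[combo] += 1
    kept = {}
    for pattern, count in counts.items():
        min_support = min(thresholds[categories[t]] for t, _value in pattern)
        if count >= min_support:
            kept[pattern] = count
    return kept


support = attribute_support(OBJECTS)
categories = categorize_types(support, split=3)
kept = frequent_patterns(OBJECTS, categories, {"common": 4, "rare": 2})
for pattern, count in sorted(kept.items()):
    print(count, pattern)
```

In this toy data a single global threshold of 4 would discard every pattern involving the infrequent `action` type, while the lower threshold assigned to its category keeps them without admitting one-off values of the common types.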
Opening claim text (preview).
What is claimed is:
1. A method for frequent pattern (FP) analysis at a database system, comprising: receiving, at the database system, a data set for FP analysis, the data set comprising a plurality of data objects, wherein each of the plurality of data objects comprises a set of data attributes; clustering data attribute types into a plurality of data categories according to one or more data distributions of each of the data attribute types, wherein the clustering comprises clustering the data attribute types satisfying a global minimum support threshold for the data set and refraining from clustering one or more data attribute types that occur less frequently in the data set than the global minimum support threshold; dynamically determining a plurality of minimum support thresholds for the plurality of data categories, wherein a first minimum support threshold of the plurality of minimum support thresholds is different from a second minimum support threshold of the plurality of minimum support thresholds; performing an FP analysis procedure on the received data set, wherein the FP analysis procedure comprises: determining a set of data attribute patterns for the plurality of data objects of the data set, wherein each data attribute pattern of the set of data attribute patterns comprises one or more data attributes and a number of occurrences of the data attribute pattern in the data set; removing a data attribute pattern from the set of data attribute patterns when the number of occurrences of the data attribute pattern is less than a dynamically determined minimum support threshold of the plurality of minimum support thresholds; and removing the data attribute pattern from the set of data attribute patterns when a number of occurrences of a data attribute sub-pattern of the data attribute pattern is less than the dynamically determined minimum support threshold, wherein the dynamically determined minimum support threshold corresponds to a data category associated with the data attribute pattern or the data attribute sub-pattern; receiving, at a user interface associated with the database system, a user input indicating a selection of a data attribute; and transmitting, in response to the user input, an indication of one or more data attribute patterns of the set of data attribute patterns that comprise the selected data attribute based at least in part on querying a local data structure resulting from the FP analysis procedure.
2. The method of claim 1, further comprising: identifying all possible sub-patterns associated with a second data attribute pattern of the set of data attribute patterns; and evaluating a support threshold for each of the possible sub-patterns, wherein the support threshold is a smallest dynamically determined minimum support threshold of a set of dynamically determined minimum support thresholds corresponding to data categories to which data attributes in the possible sub-patterns belong.
3. The method of claim 1, wherein one or more dynamically determined minimum support thresholds for the plurality of data categories is based at least in part on the user input, a data distribution of a data attribute type, or a combination thereof.
4. The method of claim 1, wherein one or more dynamically determined minimum support thresholds for the plurality of data categories is based at least in part on an average frequency of data attributes for a data category, a standard deviation within the data category, a minimum frequency value for an attribute in the data category, a maximum frequency value for an attribute in the data category, a number of data objects in the data set, one or more user input values, an importance metric of one or more data attributes grouped within the data category, or a combination thereof.
5. The method of claim 1, wherein the one or more data distributions of each of the data attribute types comprises a set of distributions characterizing frequency of data attributes per data object and frequency of data objects per data attribute.
6. The method of claim 1, wherein the plurality of data objects comprises a plurality of users within associated with a tenant of the database system, and wherein the set of data attributes comprises activities performed by the plurality of users and characteristics associated with the tenant.
7. The method of claim 1, wherein the FP analysis is performed by one or more data processing machines within the database system, the one or more data processing machines comprising database servers, application servers, server clusters, virtual machines, containers, or a combination thereof.
8. The method of claim 1, wherein dynamically determining the plurality of minimum support thresholds for the plurality of data categories further comprises: tuning the plurality of minimum support thresholds to manage a variance in the one or more data distributions for the plurality of data categories; and performing one or more threshold determination algorithms to reduce noise from data with a high data distribution.
9. The method of claim 8, wherein tuning the plurality of minimum support thresholds to manage the variance in the one or more data distributions for the plurality of data categories comprises: executing a hyper parameter tuning process.
10. The method of claim 1, wherein performing the FP analysis procedure on the received data set further comprises: executing a condensed data structure for an FP mining process, wherein the condensed data structure comprises an FP-tree and a linked list.
11. An apparatus for frequent pattern (FP) analysis at a database system, comprising: a processor; memory in electronic communication with the processor; and instructions stored in the memory and executable by the processor to cause the apparatus to: receive, at the database system, a data set for FP analysis, the data set comprising a plurality of data objects, wherein each of the plurality of data objects comprises a set of data attributes; cluster data attribute types into a plurality of data categories according to one or more data distributions of each of the data attribute types, wherein the instructions to cluster the data attribute types are further executable by the processor to cause the apparatus to cluster the data attribute types satisfying a global minimum support threshold for the data set and refrain from clustering one or more data attribute types that occur less frequently in the data set than the global minimum support threshold; dynamically determine a plurality of minimum support thresholds for the plurality of data categories, wherein a first minimum support threshold of the plurality of minimum support thresholds is different from a second minimum support threshold of the plurality of minimum support thresholds; perform an FP analysis procedure on the received data set, wherein the instructions to perform the FP analysis procedure are executable by the processor to cause the apparatus to: determine a set of data attribute patterns for the plurality of data objects of the data set, wherein each data attribute pattern of the set of data attribute patterns comprises one or more data attributes and a number of occurrences of the data attribute pattern in the data set; remove a data attribute pattern from the set of data attribute patterns when the number of occurrences of the data attribute pattern is less than a dynamically determined minimum support threshold of the plurality of minimum support thresholds; and remove the data attribute pattern from the set of data attribute patterns when a number of occurrences of a data attribute sub-pattern of the data attribute pattern is less than the dynam
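The two removal steps of claim 1, the sub-pattern threshold rule of claim 2, and the attribute lookup at the end of claim 1 can be sketched as follows. This is one possible reading under stated assumptions, not the claimed implementation: the pattern counts, the category assignments, the threshold values, and all function names are hypothetical, and a plain dictionary stands in for the "local data structure" that the claim leaves unspecified.

```python
from itertools import combinations


def sub_patterns(pattern):
    """Yield every non-empty proper subset of a pattern as a frozenset."""
    items = sorted(pattern)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield frozenset(combo)


def threshold_for(pattern, category_of, thresholds):
    """Smallest threshold among the categories of the pattern's attributes."""
    return min(thresholds[category_of[attr]] for attr in pattern)


def prune(counts, category_of, thresholds):
    """Drop a pattern if it, or any of its sub-patterns, misses its threshold."""
    kept = {}
    for pattern, count in counts.items():
        if count < threshold_for(pattern, category_of, thresholds):
            continue  # the pattern itself is too infrequent
        if any(
            counts.get(sub, 0) < threshold_for(sub, category_of, thresholds)
            for sub in sub_patterns(pattern)
        ):
            continue  # a sub-pattern is too infrequent for its own categories
        kept[pattern] = count
    return kept


def patterns_containing(kept, attribute):
    """Answer a user selection by scanning the locally stored pattern table."""
    return {p: c for p, c in kept.items() if attribute in p}


# Hypothetical inputs: occurrence counts, category per attribute, thresholds.
COUNTS = {
    frozenset({"login"}): 3,
    frozenset({"export"}): 3,
    frozenset({"api"}): 2,
    frozenset({"login", "export"}): 3,
    frozenset({"export", "api"}): 2,
}
CATEGORY_OF = {"login": "common", "export": "rare", "api": "rare"}
THRESHOLDS = {"common": 4, "rare": 2}

kept = prune(COUNTS, CATEGORY_OF, THRESHOLDS)
selected = patterns_containing(kept, "export")
for pattern, count in selected.items():
    print(sorted(pattern), count)
```

Here the pattern `{login, export}` passes its own test, because its smallest category threshold is 2, but it is still removed because the sub-pattern `{login}` falls short of the threshold for its common category.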
CPC titles:
Knowledge representation; Symbolic representation
Market modelling; Market analysis; Collecting market data
Clustering; Classification
Query processing support for facilitating data mining operations in structured databases
Clustering or classification