Epsilon-closure for frequent pattern analysis

US11366821B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11366821-B2
Application numberUS-201816119960-A
CountryUS
Kind codeB2
Filing dateAug 31, 2018
Priority dateMay 25, 2018
Publication dateJun 21, 2022
Grant dateJun 21, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and devices supporting epsilon (ε)-closure for frequent pattern (FP) analysis are described. Some database systems may analyze data sets to determine FPs. In some cases, the FP set may include a large number of semi-redundant patterns, resulting in significant memory or processing overhead. To reduce the redundancy of these patterns, the database system may implement pre-configured or dynamic threshold occurrence differences (e.g., ε values) to test against related patterns. For example, the database system may calculate the difference between the data objects covered by a sub-pattern and a super-pattern (e.g., where the super-pattern includes all the same data attributes of the sub-pattern, plus one additional attribute). This difference may be compared to a corresponding ε value, and if the difference is less than the ε value, the database system may remove one of the patterns (e.g., the sub-pattern) from the set of valid FPs to limit redundancy.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for reducing memory and processor resource overhead associated with frequent pattern (FP) analysis at a database system, comprising: receiving, at the database system, a data set for FP analysis, the data set comprising a plurality of data objects, wherein each of the plurality of data objects comprises a set of data attributes; performing, at the database system, an FP analysis procedure on the received data set, wherein the FP analysis procedure comprises: determining a set of data attribute patterns for the plurality of data objects of the data set, wherein each data attribute pattern of the set of data attribute patterns comprises one or more data attributes and a number of occurrences of the data attribute pattern in the data set; identifying a data attribute sub-pattern of a data attribute pattern from the set of data attribute patterns, the data attribute pattern comprising a first set of data attributes and the data attribute sub-pattern comprising a subset of the first set of data attributes; calculating a difference between a number of occurrences of the identified data attribute sub-pattern and the number of occurrences of the data attribute pattern; comparing the calculated difference to a threshold occurrence difference, the threshold occurrence difference defining a threshold value for a difference between numbers of occurrences for a sub-pattern and a pattern; and managing memory and processing resource usage at the database system by removing the identified data attribute sub-pattern from the set of data attribute patterns, the removing based at least in part on the comparing and the calculated difference being below the identified threshold occurrence difference; and transmitting an indication of the set of data attribute patterns resulting from the FP analysis procedure based at least in part on removing the identified data attribute sub-pattern from the set of data attribute patterns. 2. The method of claim 1 , further comprising: selecting a different threshold occurrence difference based at least in part on a number of data attributes in the data attribute pattern or the data attribute sub-pattern. 3. The method of claim 1 , further comprising: selecting a different threshold occurrence difference based at least in part on a category of data attributes within the data attribute pattern or based at least in part on a category of data attributes that differs between the data attribute pattern and the data attribute sub-pattern. 4. The method of claim 1 , further comprising: configuring the threshold occurrence difference. 5. The method of claim 1 , further comprising: constructing a condensed data structure based at least in part on the data set, wherein the FP analysis procedure is performed using the condensed data structure. 6. The method of claim 5 , wherein the condensed data structure includes a FP-tree including identified patterns and an attribute list including one or more data attributes contained in the data set and a support corresponding to each of the one or more data attributes in the attribute list. 7. The method of claim 1 , wherein each of the plurality of data objects corresponds to a user or a user device. 8. An apparatus for reducing memory and processor resource overhead associated with frequent pattern (FP) analysis at a database system, comprising: a processor, memory in electronic communication with the processor; and instructions stored in the memory and executable by the processor to cause the apparatus to: receive, at the database system, a data set for FP analysis, the data set comprising a plurality of data objects, wherein each of the plurality of data objects comprises a set of data attributes; perform, at the database system, an FP analysis procedure on the received data set, wherein the instructions executable by the processor to cause the apparatus to perform the FP analysis procedure comprise instructions executable by the processor to cause the apparatus to: determine a set of data attribute patterns for the plurality of data objects of the data set, wherein each data attribute pattern of the set of data attribute patterns comprises one or more data attributes and a number of occurrences of the data attribute pattern in the data set; identify a data attribute sub-pattern of a data attribute pattern from the set of data attribute patterns, the data attribute pattern comprising a first set of data attributes and the data attribute sub-pattern comprising a subset of the first set of data attributes; calculate a difference between a number of occurrences of the identified data attribute sub-pattern and the number of occurrences of the data attribute pattern; compare the calculated difference to a threshold occurrence difference, the threshold occurrence difference defining a threshold value for a difference between numbers of occurrences for a sub-pattern and a pattern; and manage memory and processing resource usage at the database system by removing the identified data attribute sub-pattern from the set of data attribute patterns, the removing based at least in part on the comparing and the calculated difference being below the identified threshold occurrence difference; and transmit an indication of the set of data attribute patterns resulting from the FP analysis procedure based at least in part on removing the identified data attribute sub-pattern from the set of data attribute patterns. 9. The apparatus of claim 8 , wherein the instructions are further executable by the processor to cause the apparatus to: select a different threshold occurrence difference based at least in part on a number of data attributes in the data attribute pattern or the data attribute sub-pattern. 10. The apparatus of claim 8 , wherein the instructions are further executable by the processor to cause the apparatus to: select a different threshold occurrence difference based at least in part on a category of data attributes within the data attribute pattern or based at least in part on a category of data attributes that differs between the data attribute pattern and the data attribute sub-pattern. 11. The apparatus of claim 8 , wherein the instructions are further executable by the processor to cause the apparatus to: configure the threshold occurrence difference. 12. The apparatus of claim 8 , wherein the instructions are further executable by the processor to cause the apparatus to: construct a condensed data structure based on the data set, wherein the FP analysis procedure is performed using the condensed data structure. 13. The apparatus of claim 12 , wherein the condensed data structure includes a FP-tree including identified patterns and an attribute list including one or more data attributes contained in the data set and a support corresponding to each of the one or more data attributes in the attribute list. 14. The apparatus of claim 8 , wherein each of the plurality of data objects corresponds to a user or a user device. 15. A non-transitory computer-readable medium storing code for reducing memory and processor resource overhead associated with frequent pattern (FP) analysis at a database system, the code comprising instructions executable by a processor to: receive, at the database system, a data set for FP analysis, the data set comprising a plurality of data objects, wherein each of the plurality of data objects comprises a set of data attributes; perform, at the database system, an FP analysis procedure on the received data set, wherein the instructions to perform the FP analysis procedure are executa

Assignees

Inventors

Classifications

  • Data mining · CPC title

  • Clustering or classification · CPC title

  • Trees · CPC title

  • Query processing support for facilitating data mining operations in structured databases · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11366821B2 cover?
Methods, systems, and devices supporting epsilon (ε)-closure for frequent pattern (FP) analysis are described. Some database systems may analyze data sets to determine FPs. In some cases, the FP set may include a large number of semi-redundant patterns, resulting in significant memory or processing overhead. To reduce the redundancy of these patterns, the database system may implement pre-confi…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2465. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 21 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).