Mining patterns in a dataset

US9858320B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9858320-B2
Application number: US-201414524240-A
Country: US
Kind code: B2
Filing date: Oct 27, 2014
Priority date: Nov 13, 2013
Publication date: Jan 2, 2018
Grant date: Jan 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Accessing data in a database includes receiving, from a first user, a first query for a dataset stored in a database. A first set of patterns is provided in the dataset. For each pattern in the first set of patterns, a significance value is provided in response to the received first query. A set of tags is provided for flagging a pattern of the first set of patterns, the set of tags indicating at least two data categories describing the pattern. Input information received from the first user indicates tags of at least a first subset of patterns of the first set of patterns, wherein each tag of the tags is selected from the set of tags. The significance values of the first subset of patterns are adjusted based on the tags.
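The flow in the abstract (patterns with significance values, a fixed tag set, user tags on a subset, adjusted values) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Pattern`, `TAGS`, `apply_feedback` and `fixed_offset` are hypothetical, the three category names are borrowed from claim 4, and the fixed-offset rule is an assumption made only so the example runs, since the abstract does not say how the adjustment is computed.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: three example categories. The abstract only requires
# that the tag set indicate at least two data categories.
TAGS = frozenset({"valid", "noise", "trivial"})


@dataclass(frozen=True)
class Pattern:
    pattern_id: str
    significance: float


def apply_feedback(
    patterns: list[Pattern],
    user_tags: dict[str, str],
    adjust: Callable[[float, str], float],
) -> list[Pattern]:
    """Return the patterns, adjusting significance only for tagged ones."""
    result = []
    for pattern in patterns:
        tag = user_tags.get(pattern.pattern_id)
        if tag is None:
            # Untagged patterns keep their original significance value.
            result.append(pattern)
            continue
        if tag not in TAGS:
            raise ValueError(f"tag {tag!r} is not in the provided tag set")
        result.append(
            Pattern(pattern.pattern_id, adjust(pattern.significance, tag))
        )
    return result


# Hypothetical adjustment rule, for illustration only.
OFFSETS = {"valid": 0.1, "noise": -0.1, "trivial": -0.05}


def fixed_offset(significance: float, tag: str) -> float:
    return significance + OFFSETS[tag]
```

Passing the adjustment rule in as a function keeps the sketch neutral about how significance is actually recomputed; the claims describe a counter-based rule.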

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving, from a first user, a first query for a dataset stored in a database; responsive to receiving the first query, providing a first set of patterns in the dataset with first respective significance values; providing a set of tags for flagging a first pattern in the first set of patterns, the set of tags indicating at least two data categories describing the pattern; receiving, from the first user, input information indicating tags of at least a first subset of patterns of the first set of patterns, wherein the tags are selected from the set of tags; and adjusting the first significance values of the first subset of patterns based on the tags; receiving, from a second user, a second query for the dataset; providing a second set of patterns in the dataset with second respective significance values in response to the received second query; receiving, from the second user, input information indicating tags of at least a second subset of patterns of the second set of patterns; determining a number of identical patterns flagged with a same tag between the first and second subset; in response to a determination that the number of identical patterns is higher than a predefined similarity threshold value, assigning, to each pattern of the identical patterns and to both the first and second user, a common set of tag counters corresponding to the set of tags respectively, wherein the common set of tag counters is a combination of respective set of tag counters assigned separately to the first and second user; and adjusting the significance values of the identical patterns using the common tag counters.

2. The method of claim 1, wherein the first pattern of the first subset of patterns is flagged with a first tag of the set of tags, wherein the adjusting of the first significance value of the first pattern comprises: assigning, to the first pattern and the first user, a set of tag counters corresponding to the set of tags respectively; initializing the values of the set of tag counters; incrementing the tag counter of the set of tag counters corresponding to the first tag; using the set of tag counters including the incremented tag counter to adjust the significance value of the first pattern.

3. The method of claim 2, further comprising using a weighted sum of the set of tag counters to adjust the significance value of the first pattern.

4. The method of claim 2, wherein the at least two categories comprise a noise, trivial and valid category patterns, wherein the adjusting of the significance value (s(r)) is performed using the formula: s′(r,u) = s(r) − w*(r(p,u) − 0.5), where r(p,u) = (n_n(p,u) + n_t(p,u)) / (n_v(p,u) + n_n(p,u) + n_t(p,u)), where n_v(p,u), n_n(p,u), n_t(p,u) are the tag counters corresponding to the valid, noise and trivial categories respectively, wherein w is a predefined weight value.

5. The method of claim 1, further comprising receiving, from a third user, a third query for the dataset; providing a third set of patterns in the dataset with respective significance values in response to the received third query; receiving, from the third user, input information indicating tags of at least a third subset of patterns of the third set of patterns; determining a second number of identical patterns flagged with the same tag between the first, second and third subset; in response to a determination that the second number of identical patterns is higher than the predefined similarity threshold value, assigning, to each pattern of the identical patterns and to the first, second and third user, a common set of tag counters corresponding to the set of tags respectively, wherein the common set of tag counters is a combination of respective set of tag counters assigned separately to the first second user and third user; and adjusting the significance values of the second number of identical patterns using the common tag counters.

6. The method of claim 5, further comprising: determining identical patterns between the first subset of patterns and non-flagged patterns of the second set of patterns; assigning the tag of each pattern of the first subset to its identical non-flagged pattern of the second set of patterns.

7. The method of claim 1, further comprising: receiving from one of the first user and the second user the first query for the dataset; providing an updated set of patterns with respective updated significance values.

8. The method of claim 1, further comprising: determining that a first pattern of the first subset of patterns is a parent pattern of a non-flagged second pattern of the first set of patterns; flagging the second pattern using the tag of the first pattern; adjusting the significance value of the second pattern based on the tag of the first pattern.

9. The method of claim 1, further comprising: ranking the first set of patterns using the respective significance values; storing the dataset based on the ranked patterns.

10. The method of claim 9, further comprising storing data corresponding to the highest ranked patterns in an in-memory database.

11. A computer program product, comprising: a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions readable by a processor to cause the processor to perform a method comprising: receiving a first user a first query for a dataset stored in a database; providing a first set of patterns in the dataset, and providing for each pattern in the first set of patterns a first significance value in response to the received first query; providing a set of tags for flagging a first pattern in the first set of patterns, the set of tags indicating at least two data categories describing the pattern; receiving from the first user, input information indicating tags of at least a first subset of patterns of the first set of patterns, wherein the tags are selected from the set of tags; adjusting the first significance values of the first subset of patterns based on the tags; receiving, from a second user, a second query for the dataset; providing a second set of patterns in the dataset with respective significance values in response to the received second query; receiving, from the second user, input information indicating tags of at least a second subset of patterns of the second set of patterns; determining a number of identical patterns flagged with a same tag between the first and second subset; in response to a determination that the number of identical patterns is higher than a predefined similarity threshold value, assigning, to each pattern of the identical patterns and to both the first and second user, a common set of tag counters corresponding to the set of tags respectively, wherein the common set of tag counters is a combination of respective set of tag counters assigned separately to the first and second user; and adjusting the significance values of the identical patterns using the common tag counters.

12. The computer program product of claim 11, wherein the first pattern of the first subset of patterns is flagged with a first tag of the set of tags, wherein the adjusting of the significance value of the first pattern comprises: assigning, to the first pattern and the first user, a set of tag counters corresponding to the set of tags respectively; initializing the values of the set of tag counters; incrementing the tag counter of the set of tag counters corresponding to the first tag; using the set of tag counters including the incremented tag counter to adjust the significance val
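As a rough illustration of the tag-counter adjustment in claims 2 and 4 and the common counters in claim 1, the sketch below implements the claim 4 formula s′ = s − w*(r − 0.5), with r = (noise + trivial) / (valid + noise + trivial). Several details are assumptions rather than claim language: the names `TagCounters`, `adjust_significance` and `common_counters` are hypothetical, counters are initialized to zero, an untagged pattern is treated as neutral (r = 0.5), and the "combination" of two users' counters is taken to be an element-wise sum.

```python
from dataclasses import dataclass


@dataclass
class TagCounters:
    """Counters for one pattern and one user (claim 2), initialized to zero."""
    valid: int = 0
    noise: int = 0
    trivial: int = 0

    def increment(self, tag: str) -> None:
        if tag not in ("valid", "noise", "trivial"):
            raise ValueError(f"unknown tag: {tag!r}")
        setattr(self, tag, getattr(self, tag) + 1)

    def combine(self, other: "TagCounters") -> "TagCounters":
        # Assumption: the claimed "combination" is an element-wise sum.
        return TagCounters(
            self.valid + other.valid,
            self.noise + other.noise,
            self.trivial + other.trivial,
        )


def rejection_ratio(c: TagCounters) -> float:
    """r(p,u): share of noise and trivial tags among all tags."""
    total = c.valid + c.noise + c.trivial
    if total == 0:
        return 0.5  # Assumption: no tags yet means no adjustment.
    return (c.noise + c.trivial) / total


def adjust_significance(s: float, c: TagCounters, w: float) -> float:
    """Claim 4: s' = s - w * (r - 0.5), with w a predefined weight."""
    return s - w * (rejection_ratio(c) - 0.5)


def common_counters(
    counters_a: dict[str, TagCounters],
    counters_b: dict[str, TagCounters],
    tags_a: dict[str, str],
    tags_b: dict[str, str],
    threshold: int,
) -> dict[str, TagCounters]:
    """Claim 1: merge counters for patterns both users flagged identically.

    Returns an empty dict unless the number of identically flagged
    patterns is higher than the similarity threshold.
    """
    shared = {p for p, tag in tags_a.items() if tags_b.get(p) == tag}
    if len(shared) <= threshold:
        return {}
    return {p: counters_a[p].combine(counters_b[p]) for p in shared}
```

With w = 0.2, a pattern tagged only as noise drops from 0.8 to about 0.7, while one tagged only as valid rises to about 0.9. The formula moves significance by at most w/2 in either direction.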

Assignees

IBM

Inventors

Classifications

  • Physics · mapped topic

  • Extracting rules from data · CPC title

  • G06F16/213 · Primary

    with details for schema evolution support · CPC title

  • Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9858320B2 cover?
Accessing data in a database includes receiving, from a first user, a first query for a dataset stored in a database. A first set of patterns is provided in the dataset. For each pattern in the first set of patterns, a significance value is provided in response to the received first query. A set of tags is provided for flagging a pattern of the first set of patterns, the set of tags indicating …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30539. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 2, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).