System and method for automatically suggesting rules for data stored in a table

US10332010B2 · US · B2

Patent metadata
Publication number: US-10332010-B2
Application number: US-201313770666-A
Country: US
Kind code: B2
Filing date: Feb 19, 2013
Priority date: Feb 19, 2013
Publication date: Jun 25, 2019
Grant date: Jun 25, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system are presented of automatically suggesting rules for data stored in a table, with the table comprising a plurality of columns. The table is profiled to identify a content type for each of one or more of the plurality of columns. A rule knowledge base is accessed to locate rules specified for identified content types. Then, one or more of the located rules specified for identified content types are presented as suggestions. Acceptance of one or more of the suggested rules is received from a user, and the received validations are stored in the rule knowledge base. The accepted rules are applied to data for quality detection and monitoring. Embodiments are also described where columns are suggested based on a given rule.
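The flow summarized in the abstract (profile each column, infer a content type, look up rules for that type, present them as suggestions) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `PATTERNS` table, the `RULE_KB` dictionary, the rule names, and the 0.8 match threshold. The "statistic" used is simply the share of non-empty values that match a format pattern; the patent does not say which function is applied.

```python
import re

# Hypothetical format patterns; the patent does not list specific ones.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
}

# Hypothetical rule knowledge base: content type -> names of specified rules.
RULE_KB = {
    "email": ["email_has_valid_format", "email_not_null"],
    "us_zip": ["zip_is_five_or_nine_digits"],
}


def infer_content_type(values, threshold=0.8):
    """Infer a content type from data format plus a simple statistic.

    The statistic is the fraction of non-empty values matching a pattern.
    Returns None when no pattern reaches the (assumed) threshold.
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    best_type, best_ratio = None, 0.0
    for content_type, pattern in PATTERNS.items():
        matches = sum(1 for v in non_empty if pattern.match(v))
        ratio = matches / len(non_empty)
        if ratio > best_ratio:
            best_type, best_ratio = content_type, ratio
    return best_type if best_ratio >= threshold else None


def suggest_rules(table):
    """Map each column with an inferred content type to its suggested rules.

    `table` is a dict of column name -> list of string values.
    """
    suggestions = {}
    for column, values in table.items():
        content_type = infer_content_type(values)
        if content_type is not None:
            suggestions[column] = (content_type, RULE_KB.get(content_type, []))
    return suggestions


# Columns carry no explicit content type, as in the claimed scenario.
table = {
    "COL_A": ["ann@example.com", "bob@example.org", "", "carol@example.net"],
    "COL_B": ["94105", "10001-1234", "60601", "30301"],
    "COL_C": ["red", "green", "blue", "red"],
}
suggestions = suggest_rules(table)
```

In this sketch `COL_C` receives no suggestion because no pattern matches it. In the patent's terms, that is the case where the user could be prompted to create a rule.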

First claim

Opening claim text (preview).

What is claimed is:

1. A method of automatically suggesting rules for data stored in a table, the table comprising a plurality of columns, the method comprising: for each of one or more of the columns that do not have a content type explicitly provided in the table, identifying, using at least one hardware processor, a content type for the column by inferring the content type for the column based on an analysis of a format of data in the column and statistics generated by applying a function to data in the column, the content type being a meaning of data values stored in a particular column; accessing a rule knowledge database to locate rules specified for identified content types; generating a graphical user interface, the graphical user interface containing a series of rows and columns, each row of the graphical user interface corresponding to a different content type identified by or inferred for the table; rendering a selectable button in a first column of the graphical user interface; in response to detection of a user selection of the selectable button, rendering the suggested rules for the identified content types in the first column; receiving acceptance of one or more of the suggested rules from a user via the graphical user interface; applying accepted rules for data quality detection and monitoring; storing the received acceptance in the rule knowledge database; and applying the user acceptance knowledge in future rule suggestion.

2. The method of claim 1, wherein the accessing includes examining content type and rule relationships stored in the rule knowledge database and using the user acceptance history for rule suggestions in the identification of appropriate rules.

3. The method of claim 1, wherein the rules are data validation and cleansing rules.

4. The method of claim 1, wherein the table is located in an Enterprise Resource Planning (ERP) system.

5. The method of claim 1, further comprising: for a column whose identified content type does not yield any specified rules during the accessing, prompting the user to create a rule for the column and storing the created rule in the rule knowledge database.

6. The method of claim 1, wherein the rule knowledge database includes a data structure containing a plurality of content types and, for one or more of the plurality of content types, one or more rules specified for the one or more of the plurality of content types.

7. The method of claim 1, wherein each rule comprises a rule data structure identifying the rule and one or more rule parameter data structures, the rule data structure having a link to the one or more rule parameter data structures.

8. The method of claim 1, wherein the accessing includes running an auto rule suggestion agent to search for rules having the same content types as the identified content types for the columns.

9. The method of claim 1, wherein the presenting includes displaying to the user columns with identified column types associated with suggested rules.

10. The method of claim 1, wherein the content type is an address content type.

11. A method of automatically suggesting columns within a table, the method comprising: for each of one or more of the columns that do not have a content type explicitly provided in the table, identifying a content type for the column by inferring the content type for the column based on an analysis of a format of data in the column and statistics generated by applying a function to data in the column, the content type being a meaning of data values stored in a particular column; accessing a rule knowledge database to locate content types specified for one or more identified rules; generating a graphical user interface, the graphical user interface containing a series of rows and columns, each row of the graphical user interface corresponding to a different content type identified by or inferred for the table; rendering a selectable button in a first column of the graphical user interface; in response to detection of a user selection of the selectable button, rendering the suggested rules for the identified content types in the first column; receiving acceptance of one or more of the suggested one or more columns from a user; applying accepted rules for data quality detection and monitoring; storing the received acceptance in the rule knowledge database; and applying the user acceptance knowledge in future rule suggestion.

12. The method of claim 11, wherein the rules are data validation and cleansing rules.

13. An apparatus comprising: a processor; a volatile memory including a staging area, the staging area storing data stored in a table, the table comprising a plurality of columns; a non-volatile memory including a rule knowledge database, the rule knowledge database including a data structure containing a plurality of content types and, for one or more of the plurality of content types, one or more rules specified for the one or more of the plurality of content types; a content type identification profiler configured to, for each of one or more of the columns that do not have a content type explicitly provided in the table, identify a content type for the column by inferring the content type for the column based on an analysis of a format of data in the column and generated by applying a function to statistics data in the column, the content type being a meaning of data values stored in a particular column; a rule suggestion engine configured to access the rule knowledge database to locate rules specified for identified content types; a graphical user interface configured to: render a series of rows and columns, each row of the graphical user interface corresponding to a different content type identified by or inferred for the table; render a selectable button in a first column of the graphical user interface; in response to detection of a user selection of the selectable button, render the suggested rules for the identified content types in the first column; receive acceptance of one or more of the suggested rules from a user; and store the received acceptance in the rule knowledge database.

14. The apparatus of claim 13, wherein one or more of the validated rules are data validation rules.

15. The apparatus of claim 14, wherein the user interface is further configured to send the data validation rules to a component that detects and monitors input data based on the data validation rules.

16. The apparatus of claim 13, wherein one or more of the validated rules are cleansing rules.

17. The apparatus of claim 16, wherein the user interface is further configured to send the cleansing rules to a component that modifies the data based on the corrective rule.

18. The apparatus of claim 13, wherein the staging area is included within an Enterprise Information Management (EIM) tool.

19. The apparatus of claim 13, wherein the staging area is outside of an Enterprise Resource Planning (ERP) system.

20. An apparatus comprising: a processor; a volatile memory including a staging area, the staging area storing data stored in a table, the table comprising a plurality of columns; a non-volatile memory including a rule knowledge database, the rule knowledge database including a data structure containing a plurality of content types and, for one or more of the plurality of content types, one or more rules specified for the one or more of the plurality of content types; a content type identification profiler configured to, for each of one or more of t
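The data structures recited in the claims (a knowledge database mapping content types to rules, a rule record linked to its parameter records, stored user acceptances that influence later suggestions, and the reverse lookup from a rule to candidate columns) might be modeled roughly as below. This is a sketch under stated assumptions, not the patented implementation: the class names, field names, and the acceptance-count ranking are all hypothetical, since the claims say only that acceptance knowledge is applied in future suggestions.

```python
from dataclasses import dataclass, field


@dataclass
class RuleParameter:
    """Hypothetical parameter record for a rule."""
    name: str
    value: str


@dataclass
class Rule:
    """Rule record holding a link (here, a list) to its parameter records."""
    rule_id: str
    description: str
    parameters: list = field(default_factory=list)


class RuleKnowledgeBase:
    """In-memory stand-in for the rule knowledge database (assumed design)."""

    def __init__(self):
        self.rules_by_type = {}  # content type -> list of Rule
        self.acceptances = {}    # (content type, rule_id) -> acceptance count

    def add_rule(self, content_type, rule):
        self.rules_by_type.setdefault(content_type, []).append(rule)

    def record_acceptance(self, content_type, rule_id):
        key = (content_type, rule_id)
        self.acceptances[key] = self.acceptances.get(key, 0) + 1

    def suggest(self, content_type):
        """Return rules for a content type, most frequently accepted first.

        Ranking by acceptance count is an assumption made for illustration.
        """
        rules = self.rules_by_type.get(content_type, [])
        return sorted(
            rules,
            key=lambda r: -self.acceptances.get((content_type, r.rule_id), 0),
        )

    def columns_for_rule(self, rule_id, column_types):
        """Reverse lookup: columns whose content type has the given rule.

        `column_types` is a dict of column name -> identified content type.
        """
        types = {
            content_type
            for content_type, rules in self.rules_by_type.items()
            if any(r.rule_id == rule_id for r in rules)
        }
        return [col for col, ct in column_types.items() if ct in types]


kb = RuleKnowledgeBase()
kb.add_rule("address", Rule("addr_not_empty", "Address must not be empty"))
kb.add_rule(
    "address",
    Rule("addr_max_len", "Address length limit",
         [RuleParameter("max_length", "120")]),
)

# A user accepts one suggestion; the acceptance is stored and reused.
kb.record_acceptance("address", "addr_max_len")
ranked = [r.rule_id for r in kb.suggest("address")]

# Column suggestion for a given rule, as in the column-suggesting method.
columns = kb.columns_for_rule(
    "addr_max_len", {"SHIP_TO": "address", "QTY": "quantity"}
)
```

Because Python's `sorted` is stable, rules with equal acceptance counts keep their insertion order, so an unused knowledge base suggests rules in the order they were added.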

Assignees

Inventors

Classifications

  • G06N5/025 (Primary)

    Extracting rules from data (CPC title)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10332010B2 cover?
A method and system are presented of automatically suggesting rules for data stored in a table, with the table comprising a plurality of columns. The table is profiled to identify a content type for each of one or more of the plurality of columns. A rule knowledge base is accessed to locate rules specified for identified content types. Then, one or more of the located rules specified for identi…
Who is the assignee on this patent?
Yan Nancy, He Min, Kung David, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06N5/025. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 25, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).