Discovering transformations applied to a source table to generate a target table

US9720971B2 · US · B2

Patent metadata
Publication number: US-9720971-B2
Application number: US-16554908-A
Country: US
Kind code: B2
Filing date: Jun 30, 2008
Priority date: Jun 30, 2008
Publication date: Aug 1, 2017
Grant date: Aug 1, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a method, system, and article of manufacture for discovering transformations applied to a source table to generate a target table. Selection is made of a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table. A first pre-processing method is applied with respect to columns in the source and target tables to produce first category pre-processing output. The first category pre-processing output is used to determine first category transformation rules with respect to at least one source table column and at least one target table column. For each unpredicted target column in the target table not predicted by the determined first category transformation rules, a second pre-processing method is applied to columns in the source table and unpredicted target columns to produce second category pre-processing output. The second category pre-processing output is used to determine second category transformation rules with respect to at least one source table column and at least one target table column.
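The abstract describes a staged search: first-category rules are derived for every target column, and a second method is applied only to the target columns those rules leave unpredicted. The Python below is a minimal sketch of that control flow under stated assumptions, not the patented implementation. The tables are lists of dicts joined on a hypothetical key column `cust`. The first category is limited to two illustrative value tests (equality and containment). The second category is limited to per-key aggregates (sum, min, max, average), which is a subset of the aggregation functions listed in claim 11. All table, column, and function names are made up for illustration.

```python
import math
from collections import defaultdict
from statistics import mean

KEY = "cust"  # hypothetical join key present in both tables

# Hypothetical toy tables: each target row summarizes the source rows with its key.
SOURCE = [
    {"cust": "c1", "name": "Ann Lee", "amount": 10.0},
    {"cust": "c1", "name": "Ann Lee", "amount": 5.0},
    {"cust": "c2", "name": "Bo Chan", "amount": 7.0},
]
TARGET = [
    {"cust": "c1", "full_name": "Ann Lee", "last": "Lee", "total": 15.0},
    {"cust": "c2", "full_name": "Bo Chan", "last": "Chan", "total": 7.0},
]

# First category (illustrative): value-based tests on one source/target value pair.
VALUE_TESTS = {
    "equals": lambda s, t: str(s) == str(t),
    "contains": lambda s, t: str(t) in str(s),
}
# Second category (illustrative): aggregates over the source rows sharing a key.
AGGREGATES = {"sum": sum, "min": min, "max": max, "avg": mean}


def first_category_rules(source, target, key=KEY):
    """Map each target column to the first (test, source column) that holds on every joined row."""
    by_key = {t[key]: t for t in target}
    joined = [(s, by_key[s[key]]) for s in source if s[key] in by_key]
    rules = {}
    for tc in (c for c in target[0] if c != key):
        for sc in (c for c in source[0] if c != key):
            hit = next((n for n, f in VALUE_TESTS.items()
                        if all(f(s[sc], t[tc]) for s, t in joined)), None)
            if hit:
                rules[tc] = (hit, sc)
                break
    return rules


def second_category_rules(source, target, unpredicted, key=KEY, min_match=1.0):
    """Match unpredicted numeric target columns against per-key aggregates of numeric source columns."""
    groups = defaultdict(list)
    for s in source:
        groups[s[key]].append(s)
    num_src = [c for c, v in source[0].items() if c != key and isinstance(v, (int, float))]
    rules = {}
    for tc in unpredicted:
        if not isinstance(target[0][tc], (int, float)):
            continue  # aggregates are only compared against numeric target columns
        for sc in num_src:
            for name, agg in AGGREGATES.items():
                # Count target rows whose value equals the aggregate of its key's source rows.
                matches = sum(
                    1 for t in target
                    if t[key] in groups
                    and math.isclose(agg([s[sc] for s in groups[t[key]]]), t[tc])
                )
                if matches / len(target) >= min_match:
                    rules[tc] = (name, sc)
                    break
            if tc in rules:
                break
    return rules


def discover(source, target, key=KEY):
    """Run the first stage, then the second stage on whatever the first left unpredicted."""
    first = first_category_rules(source, target, key)
    unpredicted = [c for c in target[0] if c != key and c not in first]
    second = second_category_rules(source, target, unpredicted, key)
    leftover = [c for c in unpredicted if c not in second]
    return first, second, leftover


first, second, leftover = discover(SOURCE, TARGET)
print(first)     # {'full_name': ('equals', 'name'), 'last': ('contains', 'name')}
print(second)    # {'total': ('sum', 'amount')}
print(leftover)  # []
```

On the toy tables, `full_name` is explained by equality with `name` and `last` by containment in `name`. The `total` column passes no value test, so it reaches the second stage, where it matches `sum(amount)` for every key. Requiring every key to match (`min_match=1.0`) is a simplification. Claim 12 instead speaks of a minimum specified percentage, which the `min_match` parameter could be lowered to approximate. Comparing values through `str()` in the first stage is likewise only a shortcut for the sketch.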

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: selecting a source table in a computer readable storage medium comprising a plurality of rows and a target table in the computer readable storage medium resulting from a transformation applied to the rows of the source table; applying a first pre-processing method with respect to columns in the source and target tables to produce first category pre-processing output; using the first category pre-processing output to determine first category transformation rules with respect to at least one source table column and at least one target table column to predict values in the target table from values in the source table; determining at least one unpredicted target column in the target table comprising at least one target column determined not to be predicted by the first category transformation rules; for the at least one unpredicted target column, applying a second pre-processing method to the at least one unpredicted target column and the columns in the source table to produce second category pre-processing output; and using the second category pre-processing output to determine second category transformation rules with respect to the at least one unpredicted target column in the target table.

2. The method of claim 1, further comprising: for the at least one unpredicted target column in the target table not predicted by any determined first and second category transformation rules, applying a third processing method to columns in the source table and the at least one unpredicted target column to produce third category pre-processing output; and using the third category pre-processing output to determine third category transformation rules with respect to at least one source table column and the at least one unpredicted target column in the target table.

3. The method of claim 2, wherein the first category transformation rules comprise value based transformations, wherein the second category transformation rules comprise aggregate transformations, and wherein the third category transformation rules comprise arithmetic transformations.

4. The method of claim 1, wherein applying the first pre-processing method comprises applying first category tests to the source and target table columns to produce the first category pre-processing output, wherein the first category pre-processing output comprises first category test output.

5. The method of claim 4, wherein using the first category test output comprises: processing, by a data mining engine, the first category pre-processing output to produce a data mining model defining patterns in source and target columns that occur together; and processing, by a rules post-processor, the data mining model to determine first transformation rules that produce the patterns in the data mining model.

6. The method of claim 4, wherein the applying of the first pre-processing method comprises: joining the rows of the source and target tables to produce a joined table, wherein each row of the joined table includes the columns of the source and target tables; for rows of the joined table, outputting one row for columns in the joined table having an identifier of the row of the joined table and a name and value of the column in the joined table; performing the first category tests on the rows in the joined table; and for instances where one of the output rows passes one of the first category tests, generating one test output row identifying the row identifier for which a first category test passed and information identifying the passed first category test.

7. The method of claim 6, further comprising: maintaining a counter for each performed first category test indicating a number of times the performed first category test failed with respect to the rows of the joined table to which the first category test is applied; and stopping application of the first category test whose counter exceeds a threshold value.

8. The method of claim 6, further comprising: indicating a plurality of first category tests, wherein each first category test is performed on the rows of the joined table; initiating a counter for each of the first category tests indicating a number of times the performed first category test failed with respect to the rows of the joined table; and removing indication of the first category test whose counter exceeds a threshold value, wherein first category transformation rules are determined from the first category tests whose counters do not exceed the threshold values for the first category tests.

9. The method of claim 8, wherein the first category tests are members of a set of tests to test equality between a source and target; test if the value of a source column contains the value of a target column; test if the value of a target column contains the value of a target column; and test if the value of a target column is equal to a result of a scalar function applied on one or more source columns.

10. The method of claim 1, wherein the second pre-processing method comprises: joining the rows of the source and target tables to produce a joined table having the at least one unpredicted target column comprising at least one unpredicted target numerical column from the target table, wherein applying the second pre-processing method comprises performing at least one function on rows in each column of the source table grouped by a key to produce at least one result column; applying functions to the source table columns to produce result columns; and determining whether each result column for the key matches one of at least one unpredicted target numerical column value for the key, wherein the second category pre-processing output comprises information on result columns from the source table columns that match the at least one unpredicted target numerical column.

11. The method of claim 10, wherein the functions applied to the source table columns are a member of a set of aggregation functions comprising summing, minimum, maximum, average, variance, and standard deviation.

12. The method of claim 10, wherein the second category pre-processing output indicates a minimum specified percentage of time the result columns match the at least one unpredicted target numerical column.

13. The method of claim 10, wherein the second pre-processing method is performed in response to determining that there is at least one unpredicted numerical target column and there is one row in the target table corresponding to a plurality of rows in the source table.

14. The method of claim 10, wherein the second pre-processing method comprises joining the rows of the source and target tables to produce a joined table having the at least one unpredicted target numerical column from the target table, further comprising: performing a regression analysis on numerical source columns to determine a regression equation predicting one of the at least one unpredicted target numerical column, wherein regression analysis output indicates regression equations and their confidence levels.

15. The method of claim 1, further comprising: presenting the determined first and second category transformation rules for user review; and storing the determined first and second category transformation rules in a repository.

16. A system, comprising: a computer readable storage media including a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table; an analysis engine executed to perform operations, the operations comprising: selecting the source table i…
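Claims 6 to 8 describe the bookkeeping around first-category tests. The source and target rows are joined, each joined row is emitted as one (row identifier, column name, value) record per column, and the tests are evaluated. A per-test failure counter then retires any test whose count exceeds a threshold, and the surviving tests become candidate rules. The Python below is a minimal sketch of that loop, assuming the join has already been done. The `src.`/`tgt.` column prefixes, the two predicates (a subset of the tests named in claim 9), and the `max_failures` parameter are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical predicates standing in for two of claim 9's first category tests.
PREDICATES = {
    "equals": lambda s, t: s == t,
    "contains": lambda s, t: isinstance(s, str) and isinstance(t, str) and t in s,
}


def unpivot(joined_rows):
    """Emit one (row_id, column_name, value) record per column of each joined row."""
    return [
        (row_id, col, val)
        for row_id, row in enumerate(joined_rows)
        for col, val in row.items()
    ]


def candidate_tests(src_cols, tgt_cols):
    """Every (test name, source column, target column) combination, mapped to its predicate."""
    return {
        (name, sc, tc): PREDICATES[name]
        for name, sc, tc in product(PREDICATES, src_cols, tgt_cols)
    }


def run_first_category_tests(joined_rows, tests, max_failures=0):
    """Apply tests row by row, retiring any test whose failure count exceeds max_failures.

    Returns the surviving test ids (candidate rules) and the (row_id, test_id)
    pairs for every test that passed on a row.
    """
    # Regroup the unpivoted records by row id so one test can compare a
    # source value with a target value from the same joined row.
    rows = defaultdict(dict)
    for row_id, col, val in unpivot(joined_rows):
        rows[row_id][col] = val

    active = dict(tests)
    failures = Counter()
    test_output = []
    for row_id in sorted(rows):
        row = rows[row_id]
        for test_id in list(active):  # copy: tests may be removed mid-loop
            _, sc, tc = test_id
            if active[test_id](row[sc], row[tc]):
                test_output.append((row_id, test_id))
            else:
                failures[test_id] += 1
                if failures[test_id] > max_failures:
                    del active[test_id]  # stop applying this test
    return sorted(active), test_output


JOINED = [  # hypothetical pre-joined rows
    {"src.email": "ann@x.org", "src.city": "Oslo", "tgt.mail": "ann@x.org", "tgt.domain": "x.org"},
    {"src.email": "bo@y.com", "src.city": "Rome", "tgt.mail": "bo@y.com", "tgt.domain": "y.com"},
]

tests = candidate_tests(["src.email", "src.city"], ["tgt.mail", "tgt.domain"])
rules, output = run_first_category_tests(JOINED, tests)
print(rules)
```

With `max_failures=0`, a test is retired on its first failure, so only tests that hold on every joined row survive. Here those are equality and containment between `src.email` and `tgt.mail`, and containment of `tgt.domain` in `src.email`. Raising the threshold tolerates a few dirty rows at the cost of admitting more candidate rules. Retiring a failed test early also means later rows no longer pay for evaluating it.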


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9720971B2 cover?
Provided are a method, system, and article of manufacture for discovering transformations applied to a source table to generate a target table. Selection is made of a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table. A first pre-processing method is applied with respect to columns in the source and target tabl…
Who is the assignee on this patent?
Bittner Torsten, Kache Holger, Roth Mary Ann, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F16/24564. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 1, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).