Method and system for accessing a set of data tables in a source database

US9996558B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9996558-B2
Application number  US-201414475670-A
Country             US
Kind code           B2
Filing date         Sep 3, 2014
Priority date       Sep 3, 2013
Publication date    Jun 12, 2018
Grant date          Jun 12, 2018

Abstract

Official abstract text for this publication.

Embodiments relate to accessing a set of data tables in a source database. A set of table categories is provided for tables in the source database and a set of metrics is provided. For each table of the set of the data tables: the set of metrics is evaluated, the evaluated set of metrics is analyzed, and the table is categorized into one of the set of table categories using the result of the analysis. Information indicative of the table category of each table of the set of tables is output, and in response, a request to select data tables of the set of data tables is received according to a part of the table categories for data processing. A subset of data tables of the set of data tables is selected using the table categories for performing the data processing on the subset of data tables.
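The flow described in the abstract (evaluate metrics per table, categorize, report the categories, then select tables by requested category) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the category names (`transactional`, `reference`, `archive`), the three metrics, the threshold values, and the sample tables are all hypothetical placeholders that do not appear in the patent text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TableStats:
    """Per-table statistics (field choice is an assumption for this sketch)."""
    name: str
    rows: int
    columns: int
    rows_read: int
    rows_updated: int


# Hypothetical metrics: each maps a table's statistics to a number.
METRICS: Dict[str, Callable[[TableStats], float]] = {
    "read_ratio": lambda t: t.rows_read / max(t.rows, 1),
    "update_ratio": lambda t: t.rows_updated / max(t.rows, 1),
    "width": lambda t: float(t.columns),
}


def categorize(table: TableStats) -> str:
    """Evaluate every metric, analyze the values, and return one category.

    The thresholds 0.5 and 1.0 are illustrative only.
    """
    values = {name: metric(table) for name, metric in METRICS.items()}
    if values["update_ratio"] >= 0.5:
        return "transactional"
    if values["read_ratio"] >= 1.0:
        return "reference"
    return "archive"


def select_tables(categories: Dict[str, str], requested: List[str]) -> List[str]:
    """Return the names of tables whose category was requested."""
    return sorted(name for name, cat in categories.items() if cat in requested)


tables = [
    TableStats("orders", rows=1000, columns=12, rows_read=400, rows_updated=800),
    TableStats("countries", rows=200, columns=4, rows_read=5000, rows_updated=0),
    TableStats("audit_2009", rows=50000, columns=30, rows_read=10, rows_updated=0),
]

# Output step: category information for each table.
categories = {t.name: categorize(t) for t in tables}

# Request step: a user asks for only part of the categories.
selected = select_tables(categories, ["transactional", "reference"])
print(categories)
print(selected)
```

In this sketch the selected subset would then be handed to whatever data processing job follows; the archive table is left out because its category was not requested.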

First claim

Opening claim text (preview).

What is claimed is:

1. A method for accessing a set of data tables in a source database, comprising: providing, by a processor of a computing system, a set of table categories for tables in the source database; providing, by the processor, a set of metrics, each metric comprising a respective characteristic metric for each table category, wherein the providing of the set of metrics includes: generating, by the processor, from the set of data tables, multiple probe tables having each of the table categories, identifying, by the processor, characteristics for every table category after analyzing the multiple probe tables, and determining, by the processor, metrics associated with threshold values for each table category based on the identified characteristics of the multiple probe tables; for each table of the set of the data tables: evaluating, by the processor, the set of metrics, analyzing, by the processor, the evaluated set of metrics, and categorizing, by the processor, each table into one of the set of table categories using a result of the analysis; outputting, by the processor, information indicative of each table category of each table of the set of data tables; in response to the outputting of the information indicative of each table category of each table of the set of data tables, receiving, by the processor, a request to select data tables of the set of data tables according to a part of the set of table categories for extraction, transformation, and loading (ETL) processing; selecting, by the processor, a subset of data tables of the set of data tables using the set of table categories for performing the ETL processing on the subset of data tables; and performing, by the processor, the ETL processing on the source database using the selected subset of the data tables in the source database as input; wherein evaluating the set of metrics includes evaluating the set of metrics using a predetermined first set of statistics, the predetermined first set of statistics comprising a first part and a second part, wherein determining the predetermined first set of statistics comprises: calculating the first part of the predetermined first set of statistics using the set of tables; requesting monitoring data of the source database; receiving the monitoring data; and generating the second part of the predetermined first set of statistics using the monitoring data.

2. The method of claim 1, wherein each metric is determined using at least one of the following characteristics of a table: read access rate, insert, delete and update rates, number of records, number of columns, number of primary key and foreign key relationships, volume throughput for an extraction, transformation, and loading (ETL) process, timestamp value, and assigned trigger type.

3. The method of claim 1, wherein the predetermined first set of statistics used in the evaluation of the set of metrics comprises: a number of rows in a table; a number of columns in the table; a number of rows read; a number of rows inserted; a number of rows updated; a number of rows deleted; a median of a number of columns across the set of tables; an average number of columns across the set of tables; a partition number; and a table type.

4. The method of claim 1, further comprising sorting tables in the set of data tables to generate a sorted list, the tables being sorted based on a second set of statistics determined for each table of the set of data tables, wherein the selecting of the subset of data tables is performed using the sorted list.

5. The method of claim 4, wherein the second set of statistics comprises: percentage of used data in a table of the set of data tables over time; workload data type; and an access count for the tables of the set of data tables in a cache associated with the source database.

6. The method of claim 1, further comprising: in response to the outputting of the information, receiving data indicating categories of the set of data tables; re-categorizing a table into one of the set of table categories using the data indicating categories of the set of data tables; and performing operations of outputting the information indicative of each table category of each table of the set of data tables, receiving the request to select the data tables, and selecting the subset of the data tables.

7. The method of claim 1, further comprising: storing the information indicative of each table category of each table of the set of data tables in association with the ETL processing; and using said information for categorizing tables being processed by a subsequent data processing, wherein the tables comprise at least part of the set of data tables.

8. The method of claim 1, further comprising identifying in each table of the subset of the data tables columns and rows using characteristics of the columns and rows, wherein the ETL processing is performed on the identified columns and rows of the subset of data tables.

9. The method of claim 8, wherein the characteristics of the columns that are identified in each table of the subset of the data tables comprise: key columns, default values columns, range partition keys, empty columns, frequency of occurrence of a most frequent value that equals a number of rows in table, columns with informational constraints, and string columns.

10. The method of claim 8, wherein the characteristics of the rows comprise predefined time ranges of data of the subset of data tables to be processed.

11. A computer program product for accessing a set of data tables in a source database, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to perform a method, comprising: providing, by the processor, a set of table categories for tables in the source database; providing, by the processor, a set of metrics, each metric comprising a respective characteristic metric for each table category, wherein the providing of the set of metrics includes: generating, by the processor, from the set of data tables, multiple probe tables having each of the table categories, identifying, by the processor, characteristics for every table category after analyzing the multiple probe tables, and determining, by the processor, metrics associated with threshold values for each table category based on the identified characteristics of the multiple probe tables; for each table of the set of the data tables: evaluating, by the processor, the set of metrics, analyzing, by the processor, the evaluated set of metrics, and categorizing, by the processor, each table into one of the set of table categories using a result of the analysis; outputting, by the processor, information indicative of each table category of each table of the set of data tables; in response to the outputting of the information indicative of each table category of each table of the set of data tables, receiving, by the processor, a request to select data tables of the set of data tables according to a part of the set of table categories for extraction, transformation, and loading (ETL) processing; selecting, by the processor, a subset of data tables of the set of data tables using the set of table categories for performing the ETL processing on the subset of data tables; and performing, by the processor, the ETL processing on the source database using the selected subset of the data tables in the source database as input; wherein evaluating the set of metrics includes evaluating the set of…
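
Three steps recited in the claims can be illustrated with a short sketch: deriving a threshold from probe tables (claim 1), assembling the first set of statistics from a computed part and a monitoring part (claim 1), and sorting tables by a second set of statistics before selection (claim 4). Everything concrete here is an assumption made for illustration: the two category names, the use of an update ratio as the metric, the midpoint-of-means rule for the threshold, the dictionary keys, and the sample numbers. The claims do not prescribe any of these.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Probe tables as (category label, observed update ratio). Values are made up.
PROBES: List[Tuple[str, float]] = [
    ("transactional", 0.875), ("transactional", 0.625),
    ("static", 0.125), ("static", 0.375),
]


def derive_threshold(probes: List[Tuple[str, float]], high: str, low: str) -> float:
    """Midpoint between the per-category means of one metric (an assumed rule)."""
    high_mean = mean(v for c, v in probes if c == high)
    low_mean = mean(v for c, v in probes if c == low)
    return (high_mean + low_mean) / 2


def build_statistics(computed: Dict[str, int], monitored: Dict[str, int]) -> Dict[str, int]:
    """Combine the part calculated from the tables with the part from monitoring data."""
    return {**computed, **monitored}


def categorize(stats: Dict[str, int], threshold: float) -> str:
    """Compare the table's update ratio with the derived threshold."""
    update_ratio = stats["rows_updated"] / max(stats["rows"], 1)
    return "transactional" if update_ratio >= threshold else "static"


def sort_by_access(access_counts: Dict[str, int]) -> List[str]:
    """Order table names by cache access count, most accessed first."""
    return sorted(access_counts, key=lambda name: access_counts[name], reverse=True)


threshold = derive_threshold(PROBES, "transactional", "static")

stats = {
    "orders": build_statistics({"rows": 1000, "columns": 12},
                               {"rows_read": 400, "rows_updated": 800}),
    "countries": build_statistics({"rows": 200, "columns": 4},
                                  {"rows_read": 5000, "rows_updated": 0}),
}
categories = {name: categorize(s, threshold) for name, s in stats.items()}

# Second set of statistics (hypothetical cache access counts) drives the order.
ranked = sort_by_access({"orders": 40, "countries": 900})
selected = [name for name in ranked if categories[name] == "static"]
print(threshold, categories, ranked, selected)
```

A real system would obtain the monitoring part from the database's own monitoring facilities rather than from literal dictionaries, and would likely use more than one metric per category.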

Assignees

IBM

Inventors

Classifications

  • G06F16/21 · Primary

    Design, administration or maintenance of databases · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/21. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 12, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).