Generating composite key relationships between database objects based on sampling

US9336246B2 · US · B2

Patent metadata
Publication number: US-9336246-B2
Application number: US-201213406974-A
Country: US
Kind code: B2
Filing date: Feb 28, 2012
Priority date: Feb 28, 2012
Publication date: May 10, 2016
Grant date: May 10, 2016

Abstract

According to one embodiment of the present invention, a system determines key relationships between database tables and includes a computer system including at least one processor. The system determines a sampling range for one or more matching columns between first and second database tables. The matching columns satisfy one or more matching criteria and the sampling range is based on quantities of distinct values within the matching columns. Data is sampled from the first and second database tables in accordance with the sampling ranges to determine a sample set. Keys between the first and second database tables are determined based on matching between columns within the sample set. Embodiments of the present invention further include a method and computer program product for determining key relationships between database tables in substantially the same manner described above.
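
The sampling and matching steps summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names `sample_rows` and `hit_rate`, the toy tables, and the definition of hit rate as the fraction of left-column values that also occur in the right column are all hypothetical, since this page does not define them. The sampling range is taken as given here.

```python
def sample_rows(rows, column, lo, hi):
    """Keep only rows whose value in `column` lies inside [lo, hi]."""
    return [row for row in rows if lo <= row[column] <= hi]


def hit_rate(left_rows, left_col, right_rows, right_col):
    """Assumed definition: share of left values that appear in the right column."""
    right_values = {row[right_col] for row in right_rows}
    left_values = [row[left_col] for row in left_rows]
    if not left_values:
        return 0.0
    hits = sum(1 for value in left_values if value in right_values)
    return hits / len(left_values)


# Hypothetical tables: orders.cust_id may reference customers.id.
orders = [{"cust_id": c} for c in (1, 2, 2, 3, 9)]
customers = [{"id": c} for c in (1, 2, 3, 4, 5)]

# Hit rate over the full data set.
full_rate = hit_rate(orders, "cust_id", customers, "id")

# Hit rate over a sample restricted to the value range [2, 3] in both tables.
orders_sample = sample_rows(orders, "cust_id", 2, 3)
customers_sample = sample_rows(customers, "id", 2, 3)
sample_rate = hit_rate(orders_sample, "cust_id", customers_sample, "id")
```

Because both tables are filtered by the same value range, matching values survive together in the sample, so the sample hit rate can be compared against the full-data hit rate when judging whether a column pair is a plausible key.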

First claim

What is claimed is:

1. A computer-implemented method of determining key relationships between database tables comprising: determining a sampling range for one or more matching columns between first and second database tables, wherein the matching columns satisfy one or more matching criteria and the sampling range is defined by minimum and maximum column values and is based on quantities of distinct values within the matching columns, and wherein determining a sampling range includes: identifying a median value within a set of ordered column values for the matching columns; and assigning consecutive column values from the ordered set less than the median value to the minimum column value and assigning consecutive column values from the ordered set greater than the median value to the maximum column value until the minimum and maximum column values of the sampling range produce a desired size for a sample set; sampling data from the first and second database tables with values complying with the minimum and maximum column values of the sampling range for the one or more matching columns to determine the sample set; and determining keys between the first and second database tables based on a comparison of matching between columns within the sample set and matching between columns of a full data set of the first and second database tables.

2. The computer-implemented method of claim 1, wherein the matching criteria include at least a hit rate.

3. The computer-implemented method of claim 1, wherein determining keys further includes: determining a candidate set of columns from the matching columns based on a comparison of hit rates within a full data set and the sample set; and determining a key from the candidate set based on at least a selectivity and hit rate for the candidate set within the sample set.

4. The computer-implemented method of claim 1, further comprising: filtering the matching columns based on at least one of selectivity and hit rate.

5. A system for determining key relationships between database tables comprising: a computer system including at least one processor configured to: determine a sampling range for one or more matching columns between first and second database tables, wherein the matching columns satisfy one or more matching criteria and the sampling range is defined by minimum and maximum column values and is based on quantities of distinct values within the matching columns, and wherein determining a sampling range includes: identifying a median value within a set of ordered column values for the matching columns; and assigning consecutive column values from the ordered set less than the median value to the minimum column value and assigning consecutive column values from the ordered set greater than the median value to the maximum column value until the minimum and maximum column values of the sampling range produce a desired size for a sample set; sample data from the first and second database tables with values complying with the minimum and maximum column values of the sampling range for the one or more matching columns to determine the sample set; and determine keys between the first and second database tables based on a comparison of matching between columns within the sample set and matching between columns of a full data set of the first and second database tables.

6. The system of claim 5, wherein the matching criteria include at least a hit rate.

7. The system of claim 5, wherein determining keys further includes: determining a candidate set of columns from the matching columns based on a comparison of hit rates within a full data set and the sample set; and determining a key from the candidate set based on at least a selectivity and hit rate for the candidate set within the sample set.

8. The system of claim 5, wherein the at least one processor is further configured to: filter the matching columns based on at least one of selectivity and hit rate.

9. A computer program product for determining key relationships between database tables comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to: determine a sampling range for one or more matching columns between first and second database tables, wherein the matching columns satisfy one or more matching criteria and the sampling range is defined by minimum and maximum column values and is based on quantities of distinct values within the matching columns, and wherein determining a sampling range includes: identifying a median value within a set of ordered column values for the matching columns; and assigning consecutive column values from the ordered set less than the median value to the minimum column value and assigning consecutive column values from the ordered set greater than the median value to the maximum column value until the minimum and maximum column values of the sampling range produce a desired size for a sample set; sample data from the first and second database tables with values complying with the minimum and maximum column values of the sampling range for the one or more matching columns to determine the sample set; and determine keys between the first and second database tables based on a comparison of matching between columns within the sample set and matching between columns of a full data set of the first and second database tables.

10. The computer program product of claim 9, wherein the matching criteria include at least a hit rate.

11. The computer program product of claim 9, wherein determining keys further includes: determining a candidate set of columns from the matching columns based on a comparison of hit rates within the full data set and the sample set; and determining a key from the candidate set based on at least a selectivity and hit rate for the candidate set within the sample set.

12. The computer program product of claim 9, wherein the computer readable program code further comprises computer readable program code configured to: filter the matching columns based on at least one of selectivity and hit rate.
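
The range-determination step recited in the independent claims (start at the median of the ordered column values, then widen the minimum and maximum outward until the sample reaches a desired size) can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name `sampling_range`, taking the median over distinct values, measuring sample size in rows, and stepping the lower bound before the upper bound are all assumptions made for the example.

```python
from collections import Counter


def sampling_range(values, desired_size):
    """Widen [min, max] outward from the median until enough rows are covered.

    Returns (min_value, max_value, rows_covered). If the column holds fewer
    rows than desired_size, the range grows to cover every value.
    """
    counts = Counter(values)      # rows per distinct value
    ordered = sorted(counts)      # distinct values in ascending order
    if not ordered:
        raise ValueError("column has no values")

    lo = hi = len(ordered) // 2   # index of the (assumed) median distinct value
    covered = counts[ordered[lo]]

    while covered < desired_size and (lo > 0 or hi < len(ordered) - 1):
        if lo > 0:
            # Assign the next lower value to the range minimum.
            lo -= 1
            covered += counts[ordered[lo]]
        if covered < desired_size and hi < len(ordered) - 1:
            # Assign the next higher value to the range maximum.
            hi += 1
            covered += counts[ordered[hi]]

    return ordered[lo], ordered[hi], covered


# Hypothetical column with repeated values.
column = [1, 2, 2, 3, 4, 5, 5, 5, 6, 7]
bounds = sampling_range(column, desired_size=6)    # (2, 5, 7)
everything = sampling_range(column, desired_size=100)
```

The returned minimum and maximum could then serve as the filter applied to both tables when drawing the sample set, so that only rows whose matching-column values fall inside the range are compared.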

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Gorelik Alexander, Santhanam Sharad, Tsentsiper Lev M, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/211. Mapped technology areas include Physics.
When was this patent published?
Publication date May 10, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).