Detecting data mapping relationship within database system and optimizing data operation

US11144518B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11144518-B2
Application number: US-201816202140-A
Country: US
Kind code: B2
Filing date: Nov 28, 2018
Priority date: Nov 28, 2018
Publication date: Oct 12, 2021
Grant date: Oct 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present invention provide systems and methods for detecting data redundancy within a database and optimizing data access operations. The embodiments identify a candidate column to determine the relationship between the candidate column against the remaining columns. The system calculates the vector angles of rows based on candidate column pairs and determines the difference in the angle between the candidate column and the corresponding rows of data in the comparison columns. If the candidate column has a greater angle than the compared column, then the compared column is identified as redundant and is marked for the decluttering process.
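To make the idea of a vector angle between columns concrete, here is a minimal Python sketch. It is a simplified stand-in, not the patented procedure: it treats each column's sampled values as one vector, computes the angle between two such vectors with cosine similarity, and flags near-collinear columns as possibly redundant. It does not reproduce the row-by-row angle comparison described in the abstract. The function names, the 1-degree threshold, and the sample data are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no defined angle")
    # Clamp to [-1, 1] so floating-point rounding cannot break acos().
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def vector_angle_degrees(u, v):
    """Angle between two column vectors, in degrees."""
    return math.degrees(math.acos(cosine_similarity(u, v)))


def looks_redundant(candidate, comparison, max_angle_degrees=1.0):
    """Hypothetical rule: near-zero angle means the comparison column is
    (almost) a fixed multiple of the candidate column."""
    return vector_angle_degrees(candidate, comparison) <= max_angle_degrees


# Illustrative sample rows (invented data, not from the patent).
price_usd = [10.0, 20.0, 35.0, 50.0]
price_cents = [1000.0, 2000.0, 3500.0, 5000.0]  # price_usd * 100
quantity = [7.0, 1.0, 3.0, 2.0]                 # unrelated to price

print(looks_redundant(price_usd, price_cents))
print(looks_redundant(price_usd, quantity))
```

A column that is a constant multiple of another sits at an angle of zero to it, which is why `price_cents` is flagged while `quantity` is not. A real system would also need to handle non-numeric types, nulls, and offsets, none of which this sketch attempts.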

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting data redundancy within a database and optimizing data access operations, the method comprising:
    identifying a candidate group based on an equation of principal component analysis (PCA) wherein the candidate group comprises of a candidate column, a first comparison column, associated with a table of a database and the equation further comprises: R_s |_a^b = Σ p1, p2, ps, po wherein p1 is to scan catalog table information; p2 is to monitor database traffic of the table, ps define similarity model between each potential data type; po is a sampling method to go through each object tables to get an approximate group with vector method; a is a starting table and b is an ending table;
    identifying one or more sample rows based on the candidate column, the first comparison column and based on a predetermined parameter, wherein the predetermined parameter indicates identifying all columns against the candidate column or identifying the one more sample rows against the candidate column;
    calculating a first candidate vector angle based on a first row of the one or more sample rows, a first comparison vector angle on the first row of the one or more sample rows associated with the candidate column and the first comparison column, respectively;
    calculating a second candidate vector angle based on a second row of the one or more sample rows and a second comparison vector angle on the second row of the one or more sample rows associated with the candidate column and the first comparison column, respectively;
    determining if the first candidate vector angle and the first comparison vector angle are greater than the second candidate vector angle and the second comparison vector angle;
    responsive to determining the first candidate vector angle and the first comparison vector angle are greater than the second candidate vector angle and the second comparison vector angle, decluttering the first comparison columns;
    restructuring the database based on the decluttered first comparison column and any subsequent columns associated with the table of the database, wherein restructuring the database comprises of reduction in size of the database, reduction in size of a cache of the database, and improved performance of the database.

2. The method of claim 1, further comprising: monitoring the database traffic from the candidate group; determining whether the database traffic is reduced based the candidate group; and responsive to determining that the database traffic is reduced, declutter the candidate group.

3. The method of claim 1, wherein declutter the first comparison column further comprises: removing the data associated with the first comparison column from the database.

4. The method of claim 1, wherein calculating the first candidate vector angle further comprises: determining the first candidate vector angle by using cosine similarity.

5. The method of claim 2, wherein monitoring the database traffic further comprises: parsing out SQL statement between an application and the database.

6. The method of claim 5, wherein parsing out SQL statement between the application and the database further comprises: determining a data column associated with the parsed-out SQL statement; and assigning the data column to the candidate group.

7. A computer program product for detecting data redundancy within a database and optimizing data access operations, the computer program product comprising: one or more computer readable non-transitory storage devices and program instructions stored on the one or more computer readable non-transitory storage devices, the stored program instructions comprising:
    program instructions to identify a candidate group is based on an equation of principal component analysis (PCA) wherein the candidate group comprises of a candidate column, a first comparison column, associated with a table of a database and the equation further comprises: R_s |_a^b = Σ p1, p2, ps, po wherein p1 is to scan catalog table information; p2 is to monitor database traffic of the table, ps define similarity model between each potential data type; po is a sampling method to go through each object tables to get an approximate group with vector method; a is a starting table and b is an ending table;
    program instructions to identify one or more sample rows based on the candidate column, the first comparison column and based on a predetermined parameter, wherein the predetermined parameter indicates identifying all columns against the candidate column or identifying the one more sample rows against the candidate column;
    program instructions to calculate a first candidate vector angle based on a first row of the one or more sample rows, a first comparison vector angle on the first row of the one or more sample rows associated with the candidate column and the first comparison column, respectively;
    program instructions to calculate a second candidate vector angle based on a second row of the one or more sample rows and a second comparison vector angle on the second row of the one or more sample rows associated with the candidate column and the first comparison column, respectively;
    program instructions to determine if the first candidate vector angle and the first comparison vector angle are greater than the second candidate vector angle and the second comparison vector angle; and
    program instructions to responsive to determine the first candidate vector angle and the first comparison vector angle are greater than the second candidate vector angle and the second comparison vector angle, program instructions to declutter the first comparison column; and
    program instructions to restructure the database based on the decluttered first comparison column and any subsequent columns associated with the table of the database, wherein restructuring the database comprises of reduction in size of the database, reduction in size of a cache of the database, and improved performance of the database.

8. The computer program product of claim 7, further comprises: program instructions to the monitor database traffic from the candidate group; program instructions to determine whether the database traffic is reduced based the candidate group; and responsive to program instructions to determine that the database traffic is reduced, program instructions to declutter the candidate group.

9. The computer program product of claim 7, wherein program instructions to declutter the first comparison column further comprises: program instructions to remove the data associated with the first comparison column from the database.

10. The computer program product of claim 7, wherein program instructions to calculate the first candidate vector angle further comprises: program instructions to determine the first candidate vector angle by using cosine similarity.

11. The computer program product of claim 8, wherein program instructions to monitor the database traffic further comprises: program instructions to parse out SQL statement between an application and the database.

12. The computer program product of claim 11, wherein program instructions to parse out SQL statement between the application and the database further comprises: program instructions to parse out SQL statement between the application and the database; program instructions to determine a data column associated with the parsed-out SQL statement; and program instructions to assign the data column to the candidate group.

13. A computer system for detecting data redundancy within a database and optimizing data access operations, the computer s
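Claims 5, 6, 11 and 12 describe parsing SQL statements exchanged between an application and the database, determining the data columns they reference, and assigning those columns to the candidate group. The Python sketch below illustrates that flow under strong simplifying assumptions: it handles only plain single-table `SELECT col, col FROM table` statements with a regular expression, whereas a production system would use a real SQL parser. The names `columns_in_select` and `build_candidate_group`, and the sample statements, are hypothetical and do not come from the patent.

```python
import re
from collections import Counter

# Matches only simple "SELECT <columns> FROM <table>" statements.
SELECT_RE = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def columns_in_select(statement):
    """Return (table, [columns]) for a simple SELECT, or None otherwise."""
    match = SELECT_RE.match(statement)
    if match is None:
        return None
    table = match.group("table").lower()
    columns = []
    for raw in match.group("cols").split(","):
        tokens = raw.strip().split()
        if not tokens:
            continue
        name = tokens[0].lower()  # first token, so "col AS alias" -> "col"
        # Keep bare identifiers only; skip "*", functions, expressions.
        if re.fullmatch(r"\w+", name):
            columns.append(name)
    return table, columns


def build_candidate_group(statements, table):
    """Columns of `table` seen in traffic, most frequently used first."""
    usage = Counter()
    for statement in statements:
        parsed = columns_in_select(statement)
        if parsed is not None and parsed[0] == table.lower():
            usage.update(parsed[1])
    return sorted(usage, key=lambda col: (-usage[col], col))


# Invented traffic sample for illustration.
traffic = [
    "SELECT price_usd, price_cents FROM orders WHERE id = 1",
    "select price_usd AS p from orders",
    "SELECT name FROM customers",
    "UPDATE orders SET price_usd = 5 WHERE id = 2",
]
print(build_candidate_group(traffic, "orders"))
```

Ranking columns by how often they appear in traffic is a design choice of this sketch, made so that frequently read columns are compared first. The claims only require that a column found in a parsed statement be assigned to the candidate group.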

Assignees

IBM

Inventors

Classifications

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • Ensuring data consistency and integrity · CPC title

  • Complex mathematical operations {(function generation by table look-up G06F1/03; evaluation of elementary functions by calculation G06F7/544)} · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11144518B2 cover?
Embodiments of the present invention provide systems and methods for detecting data redundancy within a database and optimizing data access operations. The embodiments identify a candidate column to determine the relationship between the candidate column against the remaining columns. The system calculates the vector angles of rows based on candidate column pairs and determines the difference i…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 12, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).