Detection and quantifying of data redundancy in column-oriented in-memory databases

US9785660B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9785660-B2
Application number  US-201414496715-A
Country             US
Kind code           B2
Filing date         Sep 25, 2014
Priority date       Sep 25, 2014
Publication date    Oct 10, 2017
Grant date          Oct 10, 2017

Abstract

Official abstract text for this publication.

Methods, systems, and computer-readable storage media for quantifying a redundancy of data stored in tables of a database. In some implementations, actions include, for each primary key and table pair in a set of primary key and table pairs, determining an aggregate severity sub-score based on one or more values of the primary key in the table, the primary key being included in a set of primary keys and the table being included in a set of tables, determining an aggregate severity score for each primary key in the set of primary keys based on aggregate severity sub-scores associated with the primary key to provide a plurality of aggregate severity scores, each aggregate severity score indicating a relative redundancy of values of the primary key across all tables in the set of tables, and providing a list of aggregate severity scores and corresponding primary keys for display to a user.
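The flow in the abstract (a sub-score per primary key and table pair, an aggregate score per primary key, and a ranked list for the user) can be illustrated with a minimal Python sketch. The abstract does not give the scoring formulas, so everything concrete here is an assumption made for illustration: the function names `sub_score`, `aggregate_scores`, and `ranked` are hypothetical, the sub-score is taken to be the share of rows whose key value repeats within a table, and the aggregate score is taken to be a plain sum of sub-scores.

```python
from collections import Counter


def sub_score(values):
    """Hypothetical sub-score for one (primary key, table) pair.

    Assumption: the share of rows whose key value is a repeat of an
    earlier row. The patent abstract does not specify this formula.
    """
    if not values:
        return 0.0
    counts = Counter(values)
    redundant = sum(n - 1 for n in counts.values())
    return redundant / len(values)


def aggregate_scores(tables, primary_keys):
    """Sum the sub-scores of each primary key across all tables.

    tables: {table_name: [row_dict, ...]} (illustrative layout only).
    """
    scores = {key: 0.0 for key in primary_keys}
    for key in primary_keys:
        for rows in tables.values():
            values = [row[key] for row in rows if key in row]
            scores[key] += sub_score(values)
    return scores


def ranked(scores):
    """Return (primary key, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, not taken from the patent.
    tables = {
        "orders": [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}],
        "invoices": [
            {"customer_id": 1, "product_id": 7},
            {"customer_id": 1, "product_id": 8},
        ],
    }
    for key, score in ranked(aggregate_scores(tables, ["product_id", "customer_id"])):
        print(key, round(score, 3))
```

In this sample, `customer_id` repeats in both tables and therefore ranks above `product_id`, which never repeats. A real implementation would follow the formulas in the full description rather than this simplified ratio.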

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for quantifying a redundancy of data stored in tables of a column-oriented in-memory database, the method being executed using one or more processors and comprising: determining, by the one or more processors, data structures for a plurality of data objects stored in the column-oriented in-memory database through an integration of distributed enterprise systems, each data structure defining one or more attributes, one attribute being a primary key of a respective data object; for each primary key and table pair in a set of primary key and table pairs, determining, by the one or more processors, an aggregate severity sub-score based on one or more values of the primary key in the table, the primary key being included in a set of primary keys and the table being included in a set of tables; determining, by the one or more processors, an aggregate severity score for each primary key in the set of primary keys based on aggregate severity sub-scores associated with the primary key to provide a plurality of aggregate severity scores, each aggregate severity score indicating a relative redundancy of values of the primary key across all tables in the set of tables and each severity sub-score being determined based on a number of occurrences of the values of the primary key across all the tables and an attribute weight that is inversely proportional to an association grade for the primary key relative to the primary table; providing, by the one or more processors, a list of aggregate severity scores and corresponding primary keys for display to a user; and performing, by the one or more processors, at least one operation to reduce the relative redundancy associated with the primary key of the set of primary keys based on the list of aggregate severity scores.

2. The method of claim 1, wherein determining an aggregate severity sub-score based on one or more values of the primary key in the table for a primary key and table pair comprises: determining a value severity score based on the plurality of value severity sub-scores; and determining the aggregate severity sub-score based on a plurality of value severity scores, the value severity score being included in the plurality of value severity scores.

3. The method of claim 2, wherein the aggregate severity sub-score is further based on a number of entries of the table of the primary key and table pair.

4. The method of claim 1, further comprising obtaining a set of associated attributes for each primary key in the set of primary keys based on a chain of tables in the set of tables, the chain of tables comprising two or more tables.

5. The method of claim 4, further comprising, for each associated attribute in the set of associated attributes, determining an attribute weight based on a degree of indirection in the chain of tables.

6. The method of claim 1, wherein the list of aggregate severity scores is provided as a ranked list of scores based on respective values of the plurality of aggregate severity scores.

7. The method of claim 1, wherein the primary key of the set of primary keys is associated with the highest aggregate severity score within the set of primary keys.

8. A non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for quantifying a redundancy of data stored in tables of a column-oriented in-memory database, the operations comprising: determining data structures for a plurality of data objects stored in the column-oriented in-memory database through an integration of distributed enterprise systems, each data structure defining one or more attributes, one attribute being a primary key of a respective data object; for each primary key and table pair in a set of primary key and table pairs, determining an aggregate severity sub-score based on one or more values of the primary key in the table, the primary key being included in a set of primary keys and the table being included in a set of tables; determining an aggregate severity score for each primary key in the set of primary keys based on aggregate severity sub-scores associated with the primary key to provide a plurality of aggregate severity scores, each aggregate severity score indicating a relative redundancy of values of the primary key across all tables in the set of tables and each severity sub-score being determined based on a number of occurrences of the values of the primary key across all the tables and an attribute weight that is inversely proportional to an association grade for the primary key relative to the primary table; providing a list of aggregate severity scores and corresponding primary keys for display to a user; and performing at least one operation to reduce the relative redundancy associated with the primary key of the set of primary keys based on the list of aggregate severity scores.

9. The computer-readable storage medium of claim 8, wherein determining an aggregate severity sub-score based on one or more values of the primary key in the table for a primary key and table pair comprises: determining a value severity score based on the plurality of value severity sub-scores; and determining the aggregate severity sub-score based on a plurality of value severity scores, the value severity score being included in the plurality of value severity scores.

10. The computer-readable storage medium of claim 9, wherein the aggregate severity sub-score is further based on a number of entries of the table of the primary key and table pair.

11. The computer-readable storage medium of claim 8, wherein operations further comprise obtaining a set of associated attributes for each primary key in the set of primary keys based on a chain of tables in the set of tables, the chain of tables comprising two or more tables.

12. The computer-readable storage medium of claim 11, wherein operations further comprise, for each associated attribute in the set of associated attributes, determining an attribute weight based on a degree of indirection in the chain of tables.

13. The computer-readable storage medium of claim 8, wherein the list of aggregate severity scores is provided as a ranked list of scores based on respective values of the plurality of aggregate severity scores.

14. The computer-readable storage medium of claim 8, wherein operations further comprise: receiving user input based on the list of aggregate severity scores, the user input indicating a command to execute an operation to reduce redundancy associated with a primary key of the set of primary keys; and performing the operation.

15. A system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations for quantifying a redundancy of data stored in tables of a column-oriented in-memory database, the operations comprising: determining data structures for a plurality of data objects stored in the column-oriented in-memory database through an integration of distributed enterprise systems, each data structure defining one or more attributes, one attribute being a primary key of a respective data object; for each primary key and table pair in a set of primary key and table pairs, determining an aggregate severity sub-score based on one or more values of the primary key in the table, the primary key being included in a set of pr
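The claims tie each severity sub-score to the number of occurrences of a key value and to an attribute weight that is inversely proportional to an association grade, with the dependent claims adding a degree of indirection along a chain of tables and a dependence on the number of table entries. The sketch below is one possible reading of those relationships, not the patent's formulas, which the claim text does not state. It assumes the association grade is a positive integer (1 for a direct reference, 2 for one hop through an intermediate table, and so on), that only occurrences beyond the first count as redundant, and that the sub-score is normalised by the table's entry count. All function names are hypothetical.

```python
def attribute_weight(association_grade):
    """Weight inversely proportional to the association grade.

    Assumption: grade 1 is a direct association and each further hop
    in the chain of tables increases the grade by one.
    """
    if association_grade < 1:
        raise ValueError("association grade must be at least 1")
    return 1.0 / association_grade


def value_severity(occurrences, weight):
    """Hypothetical severity of one key value.

    Assumption: every occurrence after the first is redundant.
    """
    return max(occurrences - 1, 0) * weight


def aggregate_sub_score(occurrence_counts, association_grade, table_entries):
    """Combine value severities for one (primary key, table) pair.

    occurrence_counts: {key_value: number_of_occurrences}
    table_entries: number of rows in the table (used to normalise).
    """
    if table_entries <= 0:
        return 0.0
    weight = attribute_weight(association_grade)
    total = sum(value_severity(n, weight) for n in occurrence_counts.values())
    return total / table_entries


if __name__ == "__main__":
    # Made-up counts: value "C1" appears three times, "C2" once.
    counts = {"C1": 3, "C2": 1}
    print(aggregate_sub_score(counts, association_grade=1, table_entries=4))
    print(aggregate_sub_score(counts, association_grade=2, table_entries=4))
```

Under these assumptions the same occurrence counts yield a smaller sub-score as the association becomes more indirect, which matches the inverse relationship stated in the claims.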

Assignees

Inventors

Classifications

  • G06F16/215 (Primary)

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

  • Grouping and aggregation

  • Approximate or statistical queries

  • Selection or weighting of terms for indexing

  • Natural language query formulation

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9785660B2 cover?
Methods, systems, and computer-readable storage media for quantifying a redundancy of data stored in tables of a database. In some implementations, actions include, for each primary key and table pair in a set of primary key and table pairs, determining an aggregate severity sub-score based on one or more values of the primary key in the table, the primary key being included in a set of primary…
Who is the assignee on this patent?
Said Bare, Jentsch Frank, SAP SE
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 10, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).