Systems and methods for analyzing existing data models

US9582553B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9582553-B2
Application number: US-201213533683-A
Country: US
Kind code: B2
Filing date: Jun 26, 2012
Priority date: Jun 26, 2012
Publication date: Feb 28, 2017
Grant date: Feb 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method comprising receiving a user request. The method analyzes the data in a plurality of data sets to find inconsistent mappings. Data of data sets, such as columns formed by a join condition, are compared to determine matching or non-matching distinct characteristic values. A composite data set is generated based on the comparison. Another data set is compared with the composite data set, and the composite data set is enhanced. Each data set is compared in sequence, if the composite data set is not empty, until all data sets are analyzed. A result set is generated based on the matching or non-matching distinct characteristic values. The method may also determine if a join operates as a data filter. The operations that are used for analysis may include ‘count distinct’, ‘intersection’ and ‘Boolean operators’.
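The sequential comparison described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the function names (`distinct_values`, `analyze_chain`), the row-as-dictionary layout, and the choice to narrow the composite to the intersection at each step are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]


def distinct_values(rows: Iterable[Row], field: str) -> Set[str]:
    """Return the distinct values of one field ('count distinct' is len() of this)."""
    return {row[field] for row in rows if row.get(field) is not None}


def analyze_chain(data_sets: List[Tuple[str, List[Row], str]]) -> List[dict]:
    """Compare data sets in sequence on their join fields.

    data_sets is a list of (name, rows, join_field) tuples. The composite
    starts as the distinct values of the first data set and, by assumption,
    is narrowed to the intersection with each following data set. The loop
    stops early once the composite is empty.
    """
    _, first_rows, first_field = data_sets[0]
    composite = distinct_values(first_rows, first_field)
    result = []
    for name, rows, field in data_sets[1:]:
        if not composite:  # nothing left to match against
            break
        values = distinct_values(rows, field)
        matching = composite & values
        result.append({
            "data_set": name,
            "distinct_in_composite": len(composite),
            "distinct_in_data_set": len(values),
            "matching": sorted(matching),
            "only_in_composite": sorted(composite - values),
            "only_in_data_set": sorted(values - composite),
        })
        composite = matching  # carry the matches forward to the next comparison
    return result
```

Calling `analyze_chain` with three small data sets that share a `country` field yields one report entry per comparison, listing the values that match and the values found on only one side.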

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by a processor, a user request, wherein a data store stores data as a plurality of data sets, each data set comprising a plurality of fields and a plurality of data elements, and wherein each field is associated with a portion of the data elements, and wherein the user request specifies a first field of a first data set in the plurality of data sets and a second field of a second data set in the plurality of data sets; comparing, by the processor, data elements associated with the first field of the first data set of the plurality of data sets; determining, by the processor, a first set of distinct values of data elements associated with the first field of the first data set based on the comparison of the data elements associated with the first field of the first data set; comparing, by the processor, data elements associated with the second field of the second data set of the plurality of data sets; determining, by the processor, a second set of distinct values of data elements associated with the second field of the second data set based on the comparison of the data elements associated with the second field of the second data set; presenting, by the processor, through a user interface, a first list comprising the first set of distinct values and a second list comprising the second set of distinct values in response to the user request; receiving, by the processor, an instruction through the user interface to correct a typographical error associated with a first distinct value of a data element from the first data set; modifying the first distinct value of the data element in the first data set to match a second distinct value of a data element from the second data set in response to the instruction; determining, by the processor, a set of intersections of the first set of distinct values of data elements of the first data set that includes the modified first distinct value of the data element and the second set of distinct values of data elements of the second data set; and generating, by the processor, a result set based on the determined set of intersections.

2. The computer-implemented method of claim 1, wherein the user request includes a user-defined joined operation, wherein the first data set is a composite result set of a union of a third data set in the plurality of data sets and a fourth data set in plurality of data sets, wherein the second data set is a data set to be joined based on the user request.

3. The computer-implemented method of claim 1, further comprising generating, by the processor, data model display information based on the result set.

4. The computer-implemented method of claim 1, wherein the set of intersections is a first set of intersections, the method further comprising: generating, by the processor, a composite data set based on the first set of determined intersections of the first set of distinct values of data elements of the first data set and the second set of distinct values of data elements of the second data set; performing, by the processor, the following upon determining that the composite data set is not empty: comparing another data set in the plurality of data sets and the composite data set based on a third field of said another data set and a fourth field of the composite data set; determining a third set of distinct values of data elements in the third field of said another data set and a fourth set of distinct values of data elements in the fourth field of the composite data set; determining a second set of intersections of the third set of distinct values of data elements of the first data set and the fourth set of distinct values of data elements of the fourth data set; modifying the result set based on the second set of determined intersections; and modifying the composite data set based on the second set of determined intersections of data elements of said another data set and the composite data set.

5. The computer-implemented method of claim 4, further comprising generating, by the processor, data model display information based on the result set.

6. The computer-implemented method of claim 4, further comprising repeating the following for each data set of the group of data sets, upon determining that the composite data set is not empty: said comparing, said determining distinct values, said determining intersections, said modifying the result set and said modifying the composite data set.

7. The computer-implemented method of claim 4, wherein generating the result set further comprises determining, by the processor, distinct values of a join operation of said another data set and the composite data set.

8. The computer-implemented method of claim 4, wherein generating the result set further comprises: determining, by the processor, a total of the distinct values of each data set, determining, by the processor, a total of the matching values of the set of intersections and determining, by the processor, a values in a join between compared data sets.

9. The computer-implemented method of claim 4 wherein the second set of intersections of said another data set and the composite data set are based on join operations.

10. The computer-implemented method of claim 9, further comprising determining, by the processor, whether the join operations operate as a filter.

11. A non-transitory computer readable medium embodying a computer program for performing a method, said method comprising: receiving a user request in a controller, wherein a data store stores data as a plurality of data sets, each data set comprising a plurality of fields and a plurality of data elements, and wherein each field is associated with a portion of the data elements, and wherein the user request specifies a first field of a first data set in the plurality of data sets and a second field of a second data set in the plurality of data sets; comparing data elements associated with the first field of the first data set of the plurality of data sets; determining a first set of distinct values of data elements associated with the first field of the first data set based on the comparison of the data elements associated with the first field of the first data set; comparing data elements associated with the second field of the second data set of the plurality of data sets; determining a second set of distinct values of data elements associated with the second field of the second data set based on the comparison of the data elements associated with the second field of the second data set; presenting, through the user interface, a first list comprising the first set of distinct values and a second list comprising the second set of distinct values in response to the user request; receiving an instruction through the user interface to correct a typographical error associated with a first distinct value of a data element from the first data set; modifying the first distinct value of the data element in the first data set to match a second distinct value of a data element from the second data set in response to the instruction; determining a set of intersections of the first set of distinct values of data elements of the first data set that includes the modified first distinct value of the data element and the second set of distinct values of data elements of the second data set; and generating a result set based on the set of determined intersections.

12. The non-transitory computer readable medium of claim 11, wherein the method further comprises generating data model display information based on the result set.

13. T
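As a rough illustration of the flow in claim 1 (list distinct values, correct a typographical error, then intersect), the following minimal sketch may help. The helper names (`distinct`, `correct_value`, `result_set`) and the row layout are hypothetical, and the `join_filters_first` flag is only one assumed interpretation of a join that "operates as a filter" (claim 10): it is set when some value in the first field has no partner in the second.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def distinct(rows: List[Row], field: str) -> Set[str]:
    """Distinct values of one field in a data set."""
    return {row[field] for row in rows}


def correct_value(rows: List[Row], field: str, wrong: str, right: str) -> int:
    """Rewrite every occurrence of `wrong` in `field` to `right`.

    Returns the number of data elements that were modified in place.
    """
    changed = 0
    for row in rows:
        if row[field] == wrong:
            row[field] = right
            changed += 1
    return changed


def result_set(first: List[Row], first_field: str,
               second: List[Row], second_field: str) -> dict:
    """Summarize how the two fields overlap."""
    a = distinct(first, first_field)
    b = distinct(second, second_field)
    return {
        "distinct_first": len(a),
        "distinct_second": len(b),
        "matching": sorted(a & b),
        # Assumption: an inner join drops rows of the first data set
        # whenever some of its values are missing from the second.
        "join_filters_first": bool(a - b),
    }
```

In use, one would call `result_set` once to see the mismatch, apply `correct_value` for the value the user flagged as a typo, and call `result_set` again to confirm that the corrected value now appears in the intersection.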

Assignees

Inventors

Classifications

  • G06F16/25 · Primary

    Integrating or interfacing systems involving database management systems · CPC title

  • Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9582553B2 cover?
A computer-implemented method comprising receiving a user request. The method analyzes the data in a plurality of data sets to find inconsistent mappings. Data of data sets, such as columns formed by a join condition, are compared to determine matching or non-matching distinct characteristic values. A composite data set is generated based on the comparison. Another data set is compared with the…
Who is the assignee on this patent?
Bratz Silvia, Nagel Klaus, Rueger Christel, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/25. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).