Composite data creation with refinement suggestions

US11227104B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11227104-B2
Application number  US-201514708146-A
Country             US
Kind code           B2
Filing date         May 8, 2015
Priority date       May 11, 2014
Publication date    Jan 18, 2022
Grant date          Jan 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data profiling module receives user selection of spreadsheets, and the data from the selected spreadsheets is profiled. At least one matching column is identified among the spreadsheets selected. The data profiling module calculates a match metric for the at least one matching column, and unifies the spreadsheets into a single composite spreadsheet using the at least one identified matching column. A preview view of a composite spreadsheet is generated, visually indicating the at least one matching column, any non-matching columns between the spreadsheets, and the match metric for the matching columns. An action history module identifies spreadsheets for use in the procedure, and stores any action applied to the spreadsheets as a procedure template that can be applied to a plurality of other spreadsheets.
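The profiling, column-matching, and match-metric steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not define how types are inferred or how the metric is computed, so the sketch uses a crude parse-based type check and a value-overlap percentage (the claims do describe presenting the metric as a percentage overlap). The names `infer_type`, `match_metric`, `find_matching_columns`, the 50% threshold, and the sample tables are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical table representation: column name -> list of cell values.
Table = Dict[str, List[str]]


def infer_type(values: List[str]) -> str:
    """Rough profiling step: classify a column as 'integer', 'number', or 'text'."""
    def all_parse(cast) -> bool:
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "number"
    return "text"


def match_metric(left: List[str], right: List[str]) -> float:
    """Assumed metric: percentage of distinct left values also present in right."""
    left_set, right_set = set(left), set(right)
    if not left_set:
        return 0.0
    return 100.0 * len(left_set & right_set) / len(left_set)


def find_matching_columns(
    a: Table, b: Table, threshold: float = 50.0
) -> List[Tuple[str, str, float]]:
    """Return (column_in_a, column_in_b, metric) for same-typed pairs at or above threshold."""
    matches = []
    for name_a, vals_a in a.items():
        for name_b, vals_b in b.items():
            if infer_type(vals_a) != infer_type(vals_b):
                continue  # only compare columns whose profiled types agree
            metric = match_metric(vals_a, vals_b)
            if metric >= threshold:
                matches.append((name_a, name_b, metric))
    # Best candidates first, so a preview could highlight them.
    return sorted(matches, key=lambda m: -m[2])


customers = {"id": ["1", "2", "3", "4"], "name": ["Ann", "Bo", "Cy", "Di"]}
orders = {"customer_id": ["2", "3", "4", "9"], "total": ["10.5", "20.0", "7.25", "3.0"]}
matches = find_matching_columns(customers, orders)
print(matches)  # [('id', 'customer_id', 75.0)]
```

In this sample, three of the four distinct `id` values also occur in `customer_id`, so the pair is reported with a 75% overlap; the other column pairs are skipped because their inferred types differ.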

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-executed method for previewing a composite data set, comprising: retrieving from one or more sources first and second data sets; formatting the first and second data sets into a plurality of columns of data values; profiling the data values from the first and second data sets to identify data types and data domains for the plurality of columns of data values; identifying at least one column of the first data set matching at least one column of the second data set based on the profiling; calculating a match metric for the at least one column matched between the first and second data sets; unifying the first and second data sets using the at least one column and a unifying action, wherein the unifying action is determined based on an overlap between the first data set and the second data set; generating a preview view of the composite spreadsheet showing the first and second data sets unified using the at least one column prior to committing the unifying of the first and second data sets into the composite spreadsheet, the preview view visually indicating the at least one column matched between the first and second data sets, a plurality of non-matching columns among the first and second data sets, and the match metric; and in response to receiving an indication from a user, committing the unifying of the first and second data sets into the composite spreadsheet.

2. The computer-executed method of claim 1, wherein the unifying action comprises a join function, wherein the data sets are appended side by side, and wherein the at least one matching column is at least one key column joining the data sets.

3. The computer-executed method of claim 2, wherein identifying at least one column of the first data set matching at least one column of the second data set further comprises: identifying at least one key column for joining the data sets; and receiving user selection of a key column; wherein the data sets are joined using the selected key column.

4. The computer-executed method of claim 3, further comprising: presenting the identified at least one key column for display to the user; and presenting the match metric as the percentage overlap for the at one least key column between the data sets.

5. The computer-executed method of claim 1, wherein the unifying action comprises a merge function and wherein the data sets are appended with top and bottom.

6. The computer-executed method of claim 5, further comprising: presenting the identified at least one matching column for display to the user; and presenting the match metric as the percentage overlap for the at least one matching column between the data sets.

7. The computer-executed method of claim 1, wherein the unifying action comprises a lookup function and wherein a column of data from the first data set is added to the second data set.

8. The computer-executed method of claim 1, wherein the one or more sources are selected from the group consisting of databases, applications, and local files.

9. The computer-executed method of claim 1, wherein the unifying of the first and second data sets into the single composite spreadsheet is in response to a user selection of a composite data control.

10. The computer-executed method of claim 1, further comprising: profiling the composite spreadsheet; providing one or more data refinement suggestions for at least one column of the composite spreadsheet based on the profiling of the composite spreadsheet, wherein the one or more data refinement suggestions comprise one or more of: validating a known data type, identifying data inconsistencies, standardizing data formats, or enriching the data from additional sources.

11. The computer-executed method of claim 1, further comprising: storing the unifying and any actions taken on the composite spreadsheet as a procedure template configured to be applied to other data sets.

12. An apparatus for previewing a composite data set, the apparatus comprising: one or more processors; and one or more memories operatively coupled to at least one of the one or more processors and having instructions stored thereon that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to: retrieve from one or more sources first and second data sets; format the first and second data sets into a plurality of columns of data values; profile the data values from the first and second data sets to identify data types and data domains for the plurality of columns of data values; identify at least one column of the first data set matching at least one column of the second data set based on the profiling; calculate a match metric for the at least one column matched between the first and second data sets; unify the first and second data sets using the at least one column and a unifying action, wherein the unifying action is determined based on an overlap between the first data set and the second data set; generate a preview view of the composite spreadsheet showing the first and second data sets unified using the at least one column prior to committing the unifying of the first and second data sets into the composite spreadsheet, the preview view visually indicating the at least one column matched between the first and second data sets, a plurality of non-matching columns among the first and second data sets, and the match metric; and in response to receiving an indication from a user, commit the unifying of the first and second data sets into the composite spreadsheet.

13. The apparatus of claim 12, wherein the unifying action comprises a join function, wherein the data sets are appended side by side, and wherein the at least one matching column is at least one key column joining the data sets.

14. The apparatus of claim 13, wherein the instructions that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to identify at least one column of the first data set matching at least one column of the second data set further cause at least one of the one or more processors to: identify at least one key column for joining the data sets; and receive user selection of a key column; wherein the data sets are joined using the selected key column.

15. The apparatus of claim 12, wherein the unifying action comprises a merge function and wherein the data sets are appended with top and bottom.

16. The apparatus of claim 12, wherein the unifying action comprises a lookup function and wherein a column of data from the first data set is added to the second data set.

17. The apparatus of claim 12, wherein the one or more sources are selected from the group consisting of databases, applications, and local files.

18. The apparatus of claim 12, wherein the unifying of the first and second data sets into the single composite spreadsheet is in response to a user selection of a composite data control.

19. The apparatus of claim 12, wherein at least one of the one or more memories has further instructions stored thereon that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to: profile the composite spreadsheet; provide one or more data refinement suggestions for at least one column of the composite spreadsheet based on the profiling of the composite spreadsheet, wherein the one or more data refinement suggestions comprise one or more of: validating a known data type, identifying data inconsistenc
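The three unifying actions named in the claims (join, merge, lookup) can be pictured with a short sketch. This is only one plausible reading, not the patented implementation: the claims say a join appends data sets side by side on a key column, a merge appends them top and bottom, and a lookup adds a column from the first data set to the second, but they do not specify join semantics, handling of missing columns, or tie-breaking. The inner-join behavior, `None` fill, first-match rule, and all function and variable names below are assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical row representation: column name -> cell value (None when absent).
Row = Dict[str, Optional[str]]


def join(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """Join: place columns side by side for rows whose key values are equal.

    Assumed to be an inner join; on a column-name clash the right value wins.
    """
    index: Dict[Optional[str], List[Row]] = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def merge(top: List[Row], bottom: List[Row]) -> List[Row]:
    """Merge: stack rows top and bottom; columns a row lacks are filled with None."""
    columns: List[str] = []
    for row in top + bottom:
        for c in row:
            if c not in columns:
                columns.append(c)
    return [{c: row.get(c) for c in columns} for row in top + bottom]


def lookup(target: List[Row], source: List[Row],
           target_key: str, source_key: str, column: str) -> List[Row]:
    """Lookup: add one column from source to target (first matching source row wins)."""
    first: Dict[Optional[str], Optional[str]] = {}
    for s in source:
        first.setdefault(s[source_key], s[column])
    return [{**t, column: first.get(t[target_key])} for t in target]


def commit_if_confirmed(preview: List[Row], user_confirms: bool) -> Optional[List[Row]]:
    """Preview-before-commit: the composite is only kept once the user confirms."""
    return preview if user_confirms else None


customers = [{"id": "1", "name": "Ann"}, {"id": "2", "name": "Bo"}]
orders = [{"customer_id": "2", "total": "10.5"}, {"customer_id": "3", "total": "7.0"}]

joined = join(customers, orders, "id", "customer_id")
stacked = merge(customers, [{"id": "3", "city": "Oslo"}])
looked_up = lookup(orders, customers, "customer_id", "id", "name")
committed = commit_if_confirmed(joined, user_confirms=True)
```

The sketch does not attempt the claimed step of choosing the unifying action from the overlap between the data sets, since the claim text gives no rule for that choice.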

Assignees

Informatica Llc

Inventors

Classifications

  • G06F40/18 · Primary

    of spreadsheets (form-filling G06F40/174) · CPC title

  • G06F16/25 · Primary

    Integrating or interfacing systems involving database management systems · CPC title

  • Join operations · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11227104B2 cover?
A data profiling module receives user selection of spreadsheets, and the data from the selected spreadsheets is profiled. At least one matching column is identified among the spreadsheets selected. The data profiling module calculates a match metric for the at least one matching column, and unifies the spreadsheets into a single composite spreadsheet using the at least one identified matching c…
Who is the assignee on this patent?
Informatica Llc
What technology area does this patent fall under?
Primary CPC classification G06F40/18. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 18, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).