Semantic discovery and mapping between data sources

US9336253B2 · US · B2

Patent metadata
Publication number: US-9336253-B2
Application number: US-201414507805-A
Country: US
Kind code: B2
Filing date: Oct 6, 2014
Priority date: Sep 10, 2003
Publication date: May 10, 2016
Grant date: May 10, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method are described for the discovery of semantics, relationships and mappings between data in different software applications, databases, files, reports, messages, or systems. In one aspect, semantics and relationships and mappings are identified between a first and a second data source. A binding condition is discovered between portions of data in the first and the second data source. The binding condition is used to discover correlations between portions of data in the first and the second data source. The binding condition and the correlations are used to discover a transformation function between portions of data in the first and the second data source.

Claims

Full claim text, claims 1 to 21. Claims 1, 8, and 15 are independent; the others depend on them.

What is claimed is:

1. A method for determining a functional correlation between columns of a first data source and a second data source, the method comprising: combining the first data source and the second data source into a combined data source based on a binding condition; identifying a correlation count for columns of the first data source and the second data source in the combined data source; comparing the correlation count to a threshold value; and determining the functional correlation between columns of the first data source and the second data source, when the correlation count is more than the threshold value.

2. The method of claim 1, wherein the binding condition comprises an expression on attributes of data objects in the first and second data sources that identifies which instances of the data objects in the first data source map to instances of the data objects in the second data source.

3. The method of claim 1, wherein the combining the first and second data source based on a binding condition comprises an outer join between the first and the second data sources.

4. The method of claim 1, wherein the correlation count identifies different data objects in the first and second data sources that correspond to each other in the first and second data sources.

5. The method of claim 1, wherein the functional correlation is defined by a stateless transformation function that transforms data objects in the first data source to different data objects in the second data source.

6. The method of claim 1, further comprising: determining correlation counts for all binding conditions; determining whether a highest correlation count of the binding conditions is greater than the threshold value; and indicating that there is no binding between the first and second data sources in response to determining that the highest correlation count is less than the threshold value.

7. The method of claim 6, wherein the binding condition having the highest correlation count is selected as a primary binding condition to use to determine the functional correlation in response determining that the highest correlation count is greater than the threshold value.

8. A system in communication with a first data source and a second data source, comprising: a processor; and a computer storage device having program instructions executed by the processor to determine a functional correlation between columns of the first data source and the second data source by perform operations comprising: combining the first data source and the second data source into a combined data source based on a binding condition; identifying a correlation count for columns of the first data source and the second data source in the combined data source; comparing the correlation count to a threshold value; and determining the functional correlation between columns of the first data source and the second data source, when the correlation count is more than the threshold value.

9. The system of claim 8, wherein the binding condition comprises an expression on attributes of data objects in the first and second data sources that identifies which instances of the data objects in the first data source map to instances of the data objects in the second data source.

10. The system of claim 8, wherein the combining the first and second data source based on a binding condition comprises an outer join between the first and the second data sources.

11. The system of claim 8, wherein the correlation count identifies different data objects in the first and second data sources that correspond to each other in the first and second data sources.

12. The system of claim 8, wherein the functional correlation is defined by a stateless transformation function that transforms data objects in the first data source to different data objects in the second data source.

13. The system of claim 8, wherein the operations further comprise: determining correlation counts for all binding conditions; determining whether a highest correlation count of the binding conditions is greater than the threshold value; and indicating that there is no binding between the first and second data sources in response to determining that the highest correlation count is less than the threshold value.

14. The system of claim 13, wherein the binding condition having the highest correlation count is selected as a primary binding condition to use to determine the functional correlation in response determining that the highest correlation count is greater than the threshold value.

15. A computer readable storage device comprising executable program instructions executed by a processor to determine a functional correlation between columns of a first data source and a second data source by performing operations, the operations comprising: combining the first data source and the second data source into a combined data source based on a binding condition; identifying a correlation count for columns of the first data source and the second data source in the combined data source; comparing the correlation count to a threshold value; and determining the functional correlation between columns of the first data source and the second data source, when the correlation count is more than the threshold value.

16. The computer readable storage device of claim 15, wherein the binding condition comprises an expression on attributes of data objects in the first and second data sources that identifies which instances of the data objects in the first data source map to instances of the data objects in the second data source.

17. The computer readable storage device of claim 15, wherein the combining the first and second data sources based on a binding condition comprises an outer join between the first and the second data sources.

18. The computer readable storage device of claim 15, wherein the correlation count identifies different data objects in the first and second data sources that correspond to each other in the first and second data sources.

19. The computer readable storage device of claim 15, wherein the functional correlation is defined by a stateless transformation function that transforms data objects in the first data source to different data objects in the second data source.

20. The computer readable storage device of claim 15, wherein the operations further comprise: determining correlation counts for all binding conditions; determining whether a highest correlation count of the binding conditions is greater than the threshold value; and indicating that there is no binding between the first and second data sources in response to determining that the highest correlation count is less than the threshold value.

21. The computer readable storage device of claim 20, wherein the binding condition having the highest correlation count is selected as a primary binding condition to use to determine the functional correlation in response determining that the highest correlation count is greater than the threshold value.
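Claim 1 reduces to four steps: combine the sources on a binding condition, count correlations per column pair, compare each count with a threshold, and report the pairs that exceed it. Claims 3, 6 and 7 add an outer join and the choice of a primary binding condition. The Python sketch below illustrates that flow. It is not the patented implementation and rests on these assumptions:

- Each source is a list of dicts.
- A binding condition is equality between one column of each source. Claim 2 allows any expression over attributes.
- A binding's score is the number of rows the outer join matches.
- A column pair's correlation count is the number of matched rows whose two values are equal.

All function names are hypothetical.

```python
from __future__ import annotations

from typing import Optional

Row = dict[str, str]
Pair = tuple[Optional[Row], Optional[Row]]


def outer_join(src1: list[Row], src2: list[Row], key1: str, key2: str) -> list[Pair]:
    """Full outer join on src1[key1] == src2[key2]; unmatched rows are paired with None."""
    index: dict[str, list[int]] = {}
    for j, r2 in enumerate(src2):
        index.setdefault(r2[key2], []).append(j)
    joined: list[Pair] = []
    used: set[int] = set()
    for r1 in src1:
        hits = index.get(r1[key1], [])
        joined.extend((r1, src2[j]) for j in hits)
        used.update(hits)
        if not hits:
            joined.append((r1, None))  # row exists only in the first source
    # rows that exist only in the second source
    joined.extend((None, r2) for j, r2 in enumerate(src2) if j not in used)
    return joined


def matched(joined: list[Pair]) -> list[tuple[Row, Row]]:
    """Keep only joined rows that have a partner on both sides."""
    return [(r1, r2) for r1, r2 in joined if r1 is not None and r2 is not None]


def correlation_count(joined: list[Pair], col1: str, col2: str) -> int:
    """Hypothetical count: matched rows whose col1 and col2 values are equal."""
    return sum(1 for r1, r2 in matched(joined) if r1[col1] == r2[col2])


def select_binding(src1: list[Row], src2: list[Row],
                   bindings: list[tuple[str, str]], threshold: int) -> Optional[tuple[str, str]]:
    """Score every binding by matched rows; return the best only if it beats the threshold."""
    if not bindings:
        return None
    scores = [(len(matched(outer_join(src1, src2, k1, k2))), (k1, k2)) for k1, k2 in bindings]
    best_count, best = max(scores, key=lambda s: s[0])
    # Claim 6 reports "no binding" below the threshold; a tie is also treated as no binding here.
    return best if best_count > threshold else None


def functional_correlations(src1: list[Row], src2: list[Row], binding: tuple[str, str],
                            col_pairs: list[tuple[str, str]], threshold: int) -> list[tuple[str, str]]:
    """Report the column pairs whose correlation count exceeds the threshold."""
    joined = outer_join(src1, src2, *binding)
    return [(c1, c2) for c1, c2 in col_pairs if correlation_count(joined, c1, c2) > threshold]
```

Testing for exact equality keeps the sketch short. Matching through a non-identity stateless transformation, as in claim 5, would replace the `==` in `correlation_count` with that function.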

Assignees

IBM

Inventors

Classifications

  • Generating database or data structure, e.g. via user interface · CPC title

  • Manipulating data structure, e.g. compression, compaction, compilation · CPC title

  • Physics · mapped topic

  • Object-oriented database structure · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9336253B2 cover?
An apparatus and method are described for the discovery of semantics, relationships and mappings between data in different software applications, databases, files, reports, messages, or systems. In one aspect, semantics and relationships and mappings are identified between a first and a second data source. A binding condition is discovered between portions of data in the first and the second da…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30315. Mapped technology areas include Physics.
When was this patent published?
Publication date May 10, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).