Transforming data for a target schema

US11249960B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11249960-B2
Application number	US-201816004863-A
Country	US
Kind code	B2
Filing date	Jun 11, 2018
Priority date	Jun 11, 2018
Publication date	Feb 15, 2022
Grant date	Feb 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments generally relate transforming data for a target schema. In some embodiments, a method includes receiving input data, where the input data includes a plurality of segments, and where the segments include a plurality of source fields containing target data. The method further includes characterizing the input data based at least in part on a plurality of predetermined metrics, where the predetermined metrics determine a structure of the input data. The method further includes mapping the target data in the source fields of the segments to a plurality of target fields of a target schema based at least in part on the characterizing. The method further includes populating the target fields of the target schema with the target data from the source fields based at least in part on the mapping.
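The flow described in the abstract (receive segments, characterize their fields with metrics, map fields to a target schema, populate the schema) can be illustrated with a minimal Python sketch. It is not the patented implementation: the schema `TARGET_SCHEMA`, the regex-based metrics, the comma delimiter, and the function names `characterize` and `transform` are all assumptions made for illustration.

```python
import re

# Hypothetical target schema: field name -> metric (a predicate over a raw
# source field). These metrics are illustrative assumptions, not patent text.
TARGET_SCHEMA = {
    "email": lambda s: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", s) is not None,
    "zip_code": lambda s: re.fullmatch(r"\d{5}", s) is not None,
    "name": lambda s: re.fullmatch(r"[A-Za-z]+(?: [A-Za-z]+)+", s) is not None,
}

def characterize(field):
    """Return the target fields whose metric the source field satisfies."""
    return [name for name, metric in TARGET_SCHEMA.items() if metric(field)]

def transform(segments, delimiter=","):
    """Map each segment's source fields onto the target schema and populate it."""
    records = []
    for segment in segments:
        record = dict.fromkeys(TARGET_SCHEMA)  # every target field starts empty
        for field in (f.strip() for f in segment.split(delimiter)):
            for target in characterize(field):
                if record[target] is None:  # first matching source field wins
                    record[target] = field
                    break
        records.append(record)
    return records

# Two segments whose source fields arrive in different orders.
rows = transform([
    "Ada Lovelace, ada@example.com, 10001",
    "94105, Alan Turing, alan@example.org",
])
```

In this sketch both segments populate the same target fields even though their source fields are ordered differently, because the mapping depends on what each field looks like rather than on its position.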

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: at least one processor and a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by the at least one processor to cause the at least one processor to perform operations comprising: receiving input data, wherein the input data includes a plurality of segments, and wherein the segments include a plurality of source fields; parsing each of the segments into tokens, wherein each token is data that is contained in a particular source field of the plurality of source fields, and wherein each token includes at least one alphanumeric or numeric value; determining contextual information associated with each token, wherein the contextual information comprises one or more features associated with each token, and wherein determining the contextual information comprises determining one or more features of each token based on metrics and determining whether the numeric value of a given token conforms to an expected range of a first numeric target field versus a second numeric target field based on a type of the target field; mapping the tokens in the source fields of the segments to a plurality of target fields of a target schema based at least in part on the contextual information and confidence values, and wherein the confidence values that meet one or more confidence value thresholds indicate degrees of matching between each token and one or more target fields; and populating the target fields of the target schema with the tokens from the source fields based at least in part on the mapping, wherein the parsing of the segments into tokens, the determining of the contextual information associated the tokens, and the mapping of the tokens in the source fields to the target fields is performed substantially during the populating the target fields of the target schema with the tokens from the source fields.

2. The system of claim 1, wherein the input data is semi-structured data.

3. The system of claim 1, wherein the target schema is a structured schema.

4. The system of claim 1, wherein the structure of each token comprises a relationship between one or more structural features of each token and at least one other token in a same segment.

5. The system of claim 1, wherein, to map the tokens in the source fields of the segments to the target fields of a target schema, the at least one processor further performs operations comprising: comparing each target field to the contextual information associated with each token; and matching the token in each source field to one of the target fields of the target schema based at least in part on the comparing of each target field to the contextual information.

6. A computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by at least one processor to cause the at least one processor to perform operations comprising: receiving input data, wherein the input data includes a plurality of segments, and wherein the segments include a plurality of source fields; parsing each of the segments into tokens, wherein each token is data that is contained in a particular source field of the plurality of source fields, and wherein each token includes at least one alphanumeric or numeric value; determining contextual information associated with each token, wherein the contextual information comprises one or more features associated with each token, and wherein determining the contextual information comprises determining one or more features of each token based on metrics and determining whether the numeric value of a given token conforms to an expected range of a first numeric target field versus a second numeric target field based on a type of the target field; mapping the tokens in the source fields of the segments to a plurality of target fields of a target schema based at least in part on the contextual information and confidence values, and wherein the confidence values that meet one or more confidence value thresholds indicate degrees of matching between each token and one or more target fields; and populating the target fields of the target schema with the tokens from the source fields based at least in part on the mapping, wherein the parsing of the segments into tokens, the determining of the contextual information associated the tokens, and the mapping of the tokens in the source fields to the target fields is performed substantially during the populating the target fields of the target schema with the tokens from the source fields.

7. The computer program product of claim 6, wherein the input data is semi-structured data.

8. The computer program product of claim 6, wherein the target schema is a structured schema.

9. The computer program product of claim 6, wherein the structure of each token comprises a relationship between one or more structural features of each token and at least one other token in a same segment.

10. The computer program product of claim 6, wherein, to map the tokens in the source fields of the segments to the target fields of a target schema, the at least one processor further performs operations comprising: comparing each target field to the contextual information associated with each token; and matching the token in each source field to one of the target fields of the target schema based at least in part on the comparing of each target field to the contextual information.

11. The computer program product of claim 6, wherein, to map the tokens in the source fields of the segments to the target fields of a target schema, the at least one processor further performs operations comprising: determining confidence values, wherein the confidence values indicate degrees of matching between each token and one or more target fields; and matching tokens in the source fields to the target fields of the target schema.

12. A computer-implemented method for transforming data for a target schema, the method comprising: receiving input data, wherein the input data includes a plurality of segments, and wherein the segments include a plurality of source fields; parsing each of the segments into tokens, wherein each token is data that is contained in a particular source field of the plurality of source fields, and wherein each token includes at least one alphanumeric or numeric value; determining contextual information associated with each token, wherein the contextual information comprises one or more features associated with each token, and wherein determining the contextual information comprises determining one or more features of each token based on metrics and determining whether the numeric value of a given token conforms to an expected range of a first numeric target field versus a second numeric target field based on a type of the target field; mapping the tokens in the source fields of the segments to a plurality of target fields of a target schema based at least in part on the contextual information and confidence values, and wherein the confidence values that meet one or more confidence value thresholds indicate degrees of matching between each token and one or more target fields; and populating the target fields of the target schema with the tokens from the source fields based at least in part on the mapping, wherein the parsing of the segments into tokens, the determining of the contextual information associated the tokens, and the mapping of the tokens in the source fields to the target fields is performed substantially during the populating the target fields of the target schema with the
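The independent claims recite checking whether a token's numeric value fits the expected range of one numeric target field versus another, then mapping only where a confidence value meets a threshold. A minimal Python sketch of that idea follows. The field names, the ranges in `EXPECTED_RANGES`, the 0.5 cutoff, and the all-or-nothing scoring are assumptions chosen for illustration; the claim text specifies none of them.

```python
# Hypothetical numeric target fields with expected ranges (assumed values).
EXPECTED_RANGES = {
    "age": (0, 120),
    "year": (1900, 2100),
}
CONFIDENCE_THRESHOLD = 0.5  # assumed cutoff; the claims do not give a value

def confidence(token, target):
    """Score a token against a target field: 1.0 inside the range, else 0.0."""
    try:
        value = float(token)
    except ValueError:
        return 0.0  # non-numeric tokens cannot match a numeric target field
    low, high = EXPECTED_RANGES[target]
    return 1.0 if low <= value <= high else 0.0

def map_token(token):
    """Return the best-scoring target field that meets the threshold, or None."""
    scores = {target: confidence(token, target) for target in EXPECTED_RANGES}
    best = max(scores, key=scores.get)
    return best if scores[best] >= CONFIDENCE_THRESHOLD else None
```

With these assumed ranges, `map_token("42")` selects `"age"`, `map_token("1987")` selects `"year"`, and a token such as `"abc"` or `"5000"` maps to nothing because no score reaches the threshold. A graded score (for example, distance from the range midpoint) could replace the binary one without changing the structure.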

Assignees

Inventors

Classifications

G06F16/84
Mapping; Conversion · CPC title
G06F16/258 · Primary
Data format conversion from or to a database · CPC title
G06F16/211 · Primary
Schema design and management · CPC title

Patent family

Related publications grouped by family.

View patent family 68764956

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11249960B2 cover?: Embodiments generally relate transforming data for a target schema. In some embodiments, a method includes receiving input data, where the input data includes a plurality of segments, and where the segments include a plurality of source fields containing target data. The method further includes characterizing the input data based at least in part on a plurality of predetermined metrics, where t…
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G06F16/258. Mapped technology areas include Physics.
When was this patent published?: Publication date Feb 15, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

System and method for ontology induction through statistical profiling and reference schema matching

Leveraging corporal data for data parsing and predicting

Processing a data set that is not organized according to a schema being used for organizing data

Automatically Discovering Topology Of An Information Technology (IT) Infrastructure

Data model change management

Rule-based low-latency delivery of healthcare data
