Data curation with synthetic data generation

US12073246B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12073246-B2
Application number	US-202117358979-A
Country	US
Kind code	B2
Filing date	Jun 25, 2021
Priority date	Jun 25, 2021
Publication date	Aug 27, 2024
Grant date	Aug 27, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method may include identifying an identifier field included in a first datatype of a seed data sample associated with a source system. The identifier field may store a first value that enables a differentiation between different instances of the first datatype. A relationship field, which stores a second value that defines a relationship between the first datatype and a second datatype, may be identified. A synthetic data sample may be generated by populating the identifier field of the synthetic data sample with a synthetically generated value and the relationship field of the synthetic data sample with the second value. The synthetic data sample may be sent to a target system to enable a performance of a task at the target system. The synthetic data sample may supplement a volume and/or a diversity of the data that occurs organically at the source system.
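The generation step in the abstract can be illustrated with a minimal Python sketch. It is only one possible reading, not the implementation disclosed in the patent. The record layout, the field names (`order_id`, `customer_id`), and the helper `make_synthetic_sample` are hypothetical, and a random UUID stands in for the "synthetically generated value".

```python
import copy
import uuid


def make_synthetic_sample(seed, identifier_field, relationship_field):
    """Return a variation of `seed` with a new identifier and the same relationship value.

    Hypothetical helper for illustration; the patent does not prescribe this API.
    """
    synthetic = copy.deepcopy(seed)
    # Identifier field: populated with a synthetically generated value
    # (here a random UUID, which is an assumption of this sketch).
    synthetic[identifier_field] = uuid.uuid4().hex
    # Relationship field: populated with the seed's value, so the link
    # to the second datatype is retained.
    synthetic[relationship_field] = seed[relationship_field]
    return synthetic


# Hypothetical seed record of an "order" datatype that refers to a "customer" datatype.
seed_order = {"order_id": "ORD-001", "customer_id": "CUST-42", "amount": 99.5}
synthetic_order = make_synthetic_sample(seed_order, "order_id", "customer_id")
```

Calling the helper repeatedly on the same seed yields additional records that differ in their identifier but point at the same related record, which is how the sketch adds volume without breaking dependencies.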

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: at least one processor; and at least one memory including program code which when executed by the at least one processor provides operations comprising: identifying an identifier field included in a first datatype of a seed data sample associated with a source system, the identifier field storing a first value that enables a differentiation between different instances of the first datatype, wherein the seed data sample comprises data that occurs at the source system; identifying a relationship field included in the first datatype of the seed data sample, the relationship field storing a second value that defines a relationship between the first datatype of the seed data sample and a second datatype; generating, based at least on the seed data sample, a first synthetic data sample, the generating includes populating the identifier field of the first synthetic data sample with a first synthetically generated value and the relationship field of the first synthetic data sample with the second value, wherein the first synthetic data sample is generated to supplement a volume and a diversity of the data at the source system; and sending, to a target system, the first synthetic data sample to enable a performance of a task at the target system; in response to determining that the first datatype is a parent datatype to a second datatype, propagating, to a second synthetic data sample of the second datatype, a change corresponding to the first synthetically generated value populating the identifier field of the first synthetic data sample.

2. The system of claim 1, further comprising: in response to determining that the first datatype is a child datatype of a second datatype, propagating, to the identifier field of the first synthetic data sample, a change corresponding to a second synthetically generated value populating the identifier field of a second synthetic data sample of the second datatype.

3. The system of claim 1, wherein the first synthetic data sample is sent to the target system by at least pushing the first synthetic data sample to an event stream providing a constant flow of data to the target system.

4. The system of claim 1, wherein the first synthetic data sample is a variation of the seed data sample that is different from the seed data sample but retains a same dependency to other datatypes.

5. The system of claim 1, wherein the first synthetic data sample is sent to the target system by at least pushing the first synthetic data sample to a raw data store at a data lake platform where the first synthetic data sample undergoes an extract, transform, and load (ETL) process before being ingested by the target system.

6. The system of claim 1, further comprising: converting the first synthetic data sample from a first format to a second format; and sending, to the target system, the first synthetic data sample in the second format.

7. The system of claim 6, wherein the first format comprises a data-interchange format, and wherein the second format comprises a column-oriented data storage format.

8. The system of claim 6, wherein the first format comprises a JavaScript Object Notation (JSON) and/or an Extensible Markup Language (XML), and wherein the second format comprises Parquet.

9. The system of claim 1, wherein the task includes reporting, visualization, analytics, and/or machine learning.

10. The system of claim 1, wherein the first synthetic data sample is used to train a machine learning model to perform the task at the target system.

11. The system of claim 1, wherein the first synthetically generated value is a randomly generated value.

12. A computer-implemented method, comprising: identifying an identifier field included in a first datatype of a seed data sample associated with a source system, the identifier field storing a first value that enables a differentiation between different instances of the first datatype, wherein the seed data sample comprises data that occurs at the source system; identifying a relationship field included in the first datatype of the seed data sample, the relationship field storing a second value that defines a relationship between the first datatype of the seed data sample and a second datatype; generating, based at least on the seed data sample, a first synthetic data sample, the generating includes populating the identifier field of the first synthetic data sample with a first synthetically generated value and the relationship field of the first synthetic data sample with the second value, wherein the first synthetic data sample is generated to supplement a volume and a diversity of the data at the source system; and sending, to a target system, the first synthetic data sample to enable a performance of a task at the target system; in response to determining that the first datatype is a parent datatype to a second datatype, propagating, to a second synthetic data sample of the second datatype, a change corresponding to the first synthetically generated value populating the identifier field of the first synthetic data sample.

13. The method of claim 12, further comprising: in response to determining that the first datatype is a child datatype of a second datatype, propagating, to the identifier field of the first synthetic data sample, a change corresponding to a second synthetically generated value populating the identifier field of a second synthetic data sample of the second datatype.

14. The method of claim 12, wherein the first synthetic data sample is sent to the target system by at least pushing the first synthetic data sample to an event stream providing a constant flow of data to the target system.

15. The method of claim 12, wherein the first synthetic data sample is a variation of the seed data sample that is different from the seed data sample but retains a same dependency to other datatypes.

16. The method of claim 12, wherein the first synthetic data sample is sent to the target system by at least pushing the first synthetic data sample to a raw data store at a data lake platform where the first synthetic data sample undergoes an extract, transform, and load (ETL) process before being ingested by the target system.

17. The method of claim 12, further comprising: converting the first synthetic data sample from a first format to a second format, the first format comprising a data-interchange format and the second format comprising a column-oriented data storage format; and sending, to the target system, the first synthetic data sample in the second format.

18. A non-transitory computer readable medium storing instructions, which when executed by at least one data processor, result in operations comprising: identifying an identifier field included in a first datatype of a seed data sample associated with a source system, the identifier field storing a first value that enables a differentiation between different instances of the first datatype, wherein the seed data sample comprises data that occurs at the source system; identifying a relationship field included in the first datatype of the seed data sample, the relationship field storing a second value that defines a relationship between the first datatype of the seed data sample and a second datatype; generating, based at least on the seed data sample, a first synthetic data sample, the generating includes populating the identifier field of the first synthetic data sample with a first synthetically generated value and the relationship field of the first synthetic data sample with the
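The parent-to-child propagation recited at the end of claims 1 and 12 can be sketched in Python as follows. This is a hedged illustration under assumptions, not the claimed implementation: it reads "propagating a change" as updating each child record's reference to the parent's new identifier. The function `synthesize_with_children`, the customer/order datatypes, and all field names are hypothetical.

```python
import uuid


def synthesize_with_children(parent_seed, child_seeds, parent_id_field,
                             child_id_field, child_ref_field):
    """Create a synthetic parent and children whose references follow the new parent id.

    Hypothetical helper; the parent/child layout is an assumption of this sketch.
    """
    # The parent gets a synthetically generated identifier (random, as in claim 11).
    new_parent_id = uuid.uuid4().hex
    parent = {**parent_seed, parent_id_field: new_parent_id}

    children = []
    for child_seed in child_seeds:
        child = dict(child_seed)
        # Each synthetic child also receives its own new identifier.
        child[child_id_field] = uuid.uuid4().hex
        # Propagate the parent's identifier change to children that referenced
        # the seed parent, so the dependency between datatypes is retained.
        if child_seed[child_ref_field] == parent_seed[parent_id_field]:
            child[child_ref_field] = new_parent_id
        children.append(child)
    return parent, children


# Hypothetical seed data: one customer (parent) with two orders (children).
customer = {"customer_id": "CUST-42", "name": "Acme"}
orders = [
    {"order_id": "ORD-001", "customer_id": "CUST-42"},
    {"order_id": "ORD-002", "customer_id": "CUST-42"},
]
new_customer, new_orders = synthesize_with_children(
    customer, orders, "customer_id", "order_id", "customer_id"
)
```

Without the propagation step the synthetic orders would still point at `CUST-42`, a customer that does not exist among the synthetic records, so the referential link the claims aim to preserve would be lost.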

Assignees

SAP SE

Inventors

Classifications

G06F16/2365
Ensuring data consistency and integrity · CPC title
G06F9/4881 · Primary
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
G06F16/88 · Primary
Mark-up to mark-up conversion (conversion for visualization in web browsing G06F16/9577) · CPC title

Patent family

Related publications grouped by family.

View patent family 84541059

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12073246B2 cover? A method may include identifying an identifier field included in a first datatype of a seed data sample associated with a source system. The identifier field may store a first value that enables a differentiation between different instances of the first datatype. A relationship field, which stores a second value that defines a relationship between the first datatype and a second datatype, may b…
Who is the assignee on this patent? SAP SE
What technology area does this patent fall under? Primary CPC classification: G06F9/4881. Mapped technology areas include Physics.
When was this patent published? Publication date: Aug 27, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Data augmentation for threat investigation in an enterprise network

Columnar storage and processing of unstructured data

System and method for importation of configuration item (CI) data into a configuration management database (CMDB)

System and method for compact tree representation for machine learning

Distributed file system with tenant file system entity

Methods for improved data store migrations and devices thereof

System and method for identifying relationships in a data graph
