Systems and methods for synthetic data generation for time-series data using data segments

US12379977B2 · US · B2

Patent metadata
Publication number: US-12379977-B2
Application number: US-202318360482-A
Country: US
Kind code: B2
Filing date: Jul 27, 2023
Priority date: Jul 6, 2018
Publication date: Aug 5, 2025
Grant date: Aug 5, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for generating synthetic data are disclosed. For example, a system may include one or more memory units storing instructions and one or more processors configured to execute the instructions to perform operations. The operations may include receiving a dataset including time-series data. The operations may include generating a plurality of data segments based on the dataset, determining respective segment parameters of the data segments, and determining respective distribution measures of the data segments. The operations may include training a parameter model to generate synthetic segment parameters. Training the parameter model may be based on the segment parameters. The operations may include training a distribution model to generate synthetic data segments. Training the distribution model may be based on the distribution measures and the segment parameters. The operations may include generating a synthetic dataset using the parameter model and the distribution model and storing the synthetic dataset.
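
The flow in the abstract (segment the series, derive per-segment parameters and distribution measures, then generate new segments from them) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the fixed-length segmentation, the choice of "start value and length" as segment parameters, the choice of "mean and standard deviation of step changes" as the distribution measure, and every function name are hypothetical. The two trained models are replaced by trivial stand-ins (random resampling and Gaussian steps).

```python
import random
import statistics


def split_segments(series, length):
    """Cut a series into consecutive, non-overlapping segments of `length` points."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]


def segment_parameters(segment):
    """Hypothetical segment parameters: starting value and number of points."""
    return {"start": segment[0], "length": len(segment)}


def distribution_measure(segment):
    """Hypothetical distribution measure: mean and population std. dev. of the
    step-to-step changes inside the segment (requires at least 2 points)."""
    steps = [b - a for a, b in zip(segment, segment[1:])]
    return {"mean": statistics.fmean(steps), "stdev": statistics.pstdev(steps)}


def generate_synthetic(series, length, n_segments, seed=0):
    """Generate a synthetic series made of `n_segments` segments."""
    rng = random.Random(seed)
    segments = split_segments(series, length)
    params = [segment_parameters(s) for s in segments]
    measures = [distribution_measure(s) for s in segments]

    synthetic = []
    value = params[0]["start"]  # start where the reference series starts
    for _ in range(n_segments):
        # Stand-in for the "parameter model": reuse the parameters of a
        # randomly chosen reference segment instead of a trained model.
        idx = rng.randrange(len(segments))
        p, m = params[idx], measures[idx]
        # Stand-in for the "distribution model": draw Gaussian steps that
        # match the chosen segment's distribution measure.
        for _ in range(p["length"]):
            synthetic.append(value)
            value += rng.gauss(m["mean"], m["stdev"])
    return synthetic
```

Calling `generate_synthetic(series, 10, 5)` on a series of at least ten points returns 50 values. A real system would replace both stand-ins with trained models, as the abstract describes.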

First claim

Opening claim text (preview).

What is claimed is:

1. A system for facilitating realistic synthetic time-series data generation via time-scale-based distribution measures, comprising: one or more processors and one or more memory units storing instructions that, when executed by the one or more processors, perform operations comprising: storing, in a network database, reference time-series data segments comprising reference subsets of reference data segments that respectively correspond to different time scales; during training of a machine learning model, executing, via a network, the machine learning model to generate synthetic time-series data segments comprising synthetic subsets of synthetic data segments that respectively correspond to the different time scales; with respect to a first time scale of the different time scales, loading, from a network database, a first reference subset of the reference time-series data segments that corresponds to the first time scale in connection with autocorrelation of a first synthetic subset of the synthetic time-series data segments that corresponds to the first time scale; and based on a comparison of an autocorrelation of (i) a reference distribution measure associated with the first reference subset of the reference time-series data segments that corresponds to the first time scale and (ii) a synthetic distribution measure associated with the first synthetic subset of the synthetic time-series data segments that corresponds the first time scale, performing (i) updating of the machine learning model in connection with the training of the machine learning model or (ii) termination of the training of the machine learning model.

2. A method for generating synthetic data, the method comprising: storing, in one or more databases, reference time-series data segments; during training of a machine learning model, executing the machine learning model to generate synthetic time-series data segments; with respect to a first time scale, obtaining a first reference subset of the reference time-series data segments that corresponds to the first time scale in connection with autocorrelation of a first synthetic subset of the synthetic time-series data segments that corresponds to the first time scale; and based on a comparison of an autocorrelation of (i) a reference distribution measure associated with the first reference subset of the reference time-series data segments and (ii) a synthetic distribution measure associated with the first synthetic subset of the synthetic time-series data segments, performing (i) updating of the machine learning model in connection with the training of the machine learning model or (ii) termination of the training of the machine learning model.

3. The method of claim 2, wherein performing the updating of the machine learning model or the termination of the training of the machine learning model comprises performing the updating of the machine learning model or the termination of the training of the machine learning model based on a given comparison with respect to a regression of a time-based function applied to the first reference subset of the reference time-series data segments and a regression of a time-based function applied to the first synthetic subset of the synthetic time-series data segments.

4. The method of claim 2, wherein performing the updating of the machine learning model or the termination of the training of the machine learning model comprises performing the updating of the machine learning model or the termination of the training of the machine learning model based on a performance metric derived from the comparison.

5. The method of claim 2, wherein performing the updating of the machine learning model or the termination of the training of the machine learning model comprises performing the updating of the machine learning model or the termination of the training of the machine learning model based on a similarity metric derived from the comparison.

6. The method of claim 2, further comprising generating a synthetic dataset by combining sequences of the synthetic time-series data segments.

7. The method of claim 2, wherein the reference distribution measure comprises at least one of a normalized distribution, a gaussian distribution, a Bernoulli distribution, a binomial distribution, a normal distribution, a Poisson distribution, or an exponential distribution.

8. The method of claim 2, wherein generating the synthetic time-series data segments comprises, during the training of the machine learning model, executing the machine learning model to generate (i) first synthetic time-series data segments that corresponds to the first time scale and (ii) second synthetic time-series data segments that correspond to a second time scale different from the first time scale.

9. The method of claim 2, wherein generating the synthetic time-series data segments comprises, during the training of the machine learning model, executing the machine learning model to generate synthetic three-dimensional time-series spatial data that corresponds to the first time scale, and wherein performing the updating of the machine learning model or the termination of the training of the machine learning model comprises performing the updating of the machine learning model or the termination of the training of the machine learning model based on a given comparison of the reference distribution measure and a given synthetic distribution measure associated with the synthetic three-dimensional time-series spatial data that corresponds to the first time scale.

10. One or more non-transitory computer-readable media comprising instructions that, when executed by one or more processors, causes operations comprising: storing, in one or more databases, reference time-series data segments; during training of a machine learning model, executing the machine learning model to generate synthetic time-series data segments; with respect to a first time scale, obtaining a first reference subset of the reference time-series data segments that corresponds to the first time scale in connection with autocorrelation of a first synthetic subset of the synthetic time-series data segments that corresponds to the first time scale; and based on a comparison of an autocorrelation of (i) a reference distribution measure associated with the first reference subset of the reference time-series data segments and (ii) a synthetic distribution measure associated with the first synthetic subset of the synthetic time-series data segments, performing (i) updating of the machine learning model in connection with the training of the machine learning model or (ii) termination of the training of the machine learning model.

11. The one or more non-transitory computer-readable media of claim 10, wherein performing the updating of the machine learning model or the termination of the training of the machine learning model comprises performing the updating of the machine learning model or the termination of the training of the machine learning model based on a given comparison with respect to a regression of a time-based function applied to the first reference subset of the reference time-series data segments and a regression of a time-based function applied to the first synthetic subset of the synthetic time-series data segments.

12. The one or more non-transitory computer-readable media of claim 10, wherein performing the updating of the machine learning model or the termination of the training of the machine learning model comprises performing the updating of the machine learning model or the termination of the training of the machine learning model based on a performance metric derived f
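
The independent claims turn on one decision: compare the autocorrelation of a reference distribution measure with that of a synthetic distribution measure at a given time scale, then either keep updating the model or stop training. A minimal sketch of that comparison follows. The claims do not specify the estimator, the lag, or the stopping rule, so the sample-autocorrelation formula, the `lag` and `tolerance` parameters, and the function names here are all assumptions made for illustration.

```python
import statistics


def autocorrelation(values, lag=1):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = statistics.fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        # Constant sequence: report 0.0 (a convention chosen for this sketch).
        return 0.0
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom


def training_decision(reference_measures, synthetic_measures, lag=1, tolerance=0.05):
    """Return ("terminate" | "update", gap) for one time scale.

    Each argument is a sequence of per-segment distribution measures
    (for example, per-segment means) taken at the same time scale.
    The threshold rule is hypothetical, not taken from the claims.
    """
    gap = abs(
        autocorrelation(reference_measures, lag)
        - autocorrelation(synthetic_measures, lag)
    )
    return ("terminate" if gap <= tolerance else "update"), gap
```

In a training loop, the caller would run this check once per time scale after each batch of synthetic segments, and continue with a model update whenever the result is `"update"`.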

Assignees

Capital One Services, LLC

Inventors

Classifications

  • Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77)

  • Auto-encoder networks; Encoder-decoder networks

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn

  • Supervised learning

  • Adversarial learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12379977B2 cover?
Systems and methods for generating synthetic data are disclosed. For example, a system may include one or more memory units storing instructions and one or more processors configured to execute the instructions to perform operations. The operations may include receiving a dataset including time-series data. The operations may include generating a plurality of data segments based on the dataset,…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/541. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 5, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).