Systems and methods for synthetic data generation for time-series data using data segments

US11822975B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11822975-B2
Application number  US-202017102526-A
Country             US
Kind code           B2
Filing date         Nov 24, 2020
Priority date       Jul 6, 2018
Publication date    Nov 21, 2023
Grant date          Nov 21, 2023

Abstract

Systems and methods for generating synthetic data are disclosed. For example, a system may include one or more memory units storing instructions and one or more processors configured to execute the instructions to perform operations. The operations may include receiving a dataset including time-series data. The operations may include generating a plurality of data segments based on the dataset, determining respective segment parameters of the data segments, and determining respective distribution measures of the data segments. The operations may include training a parameter model to generate synthetic segment parameters. Training the parameter model may be based on the segment parameters. The operations may include training a distribution model to generate synthetic data segments. Training the distribution model may be based on the distribution measures and the segment parameters. The operations may include generating a synthetic dataset using the parameter model and the distribution model and storing the synthetic dataset.
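The training and generation steps in the abstract can be sketched as follows. This is a minimal, stdlib-only sketch under assumptions the patent does not state: segments are fixed-length windows, the segment parameters are each segment's mean and standard deviation, the distribution measure is a histogram of the segment's z-scored values, a first-order autoregression (AR(1)) stands in for the trained parameter model, and a pooled histogram stands in for the trained distribution model. The classifications listed for this patent include auto-encoder and adversarial learning, so treat these simple models as illustrative substitutes, not the claimed ones. All function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Tuple

Segment = List[float]
Params = Tuple[float, float]      # (mean, standard deviation) of one segment
AR1 = Tuple[float, float, float]  # (intercept a, slope b, noise std)


def make_segments(series: List[float], size: int) -> List[Segment]:
    """Split a 1-D series into non-overlapping segments; a short tail is dropped."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]


def segment_params(seg: Segment) -> Params:
    """Hypothetical segment parameters: mean and population standard deviation."""
    return statistics.fmean(seg), statistics.pstdev(seg)


def distribution_measure(seg: Segment, params: Params, edges: List[float]) -> List[float]:
    """Hypothetical distribution measure: normalized histogram of z-scored values.

    `edges` must be evenly spaced; values outside them are clipped to the end bins.
    """
    mean, std = params
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for x in seg:
        z = (x - mean) / std if std > 0 else 0.0
        z = min(max(z, edges[0]), edges[-1])
        counts[min(int((z - edges[0]) / width), len(counts) - 1)] += 1
    return [c / len(seg) for c in counts]


def fit_ar1(values: List[float]) -> AR1:
    """Least-squares fit of v[t+1] = a + b * v[t] + noise (needs >= 2 values)."""
    prev, nxt = values[:-1], values[1:]
    mp, mn = statistics.fmean(prev), statistics.fmean(nxt)
    var = sum((p - mp) ** 2 for p in prev)
    b = sum((p - mp) * (n - mn) for p, n in zip(prev, nxt)) / var if var > 0 else 0.0
    a = mn - b * mp
    return a, b, statistics.pstdev([n - (a + b * p) for p, n in zip(prev, nxt)])


def train(series: List[float], size: int, edges: List[float]):
    """Fit the stand-in parameter model and distribution model."""
    segs = make_segments(series, size)
    params = [segment_params(s) for s in segs]
    measures = [distribution_measure(s, p, edges) for s, p in zip(segs, params)]
    # Parameter model: AR(1) on segment means and on log-std (keeps std positive).
    mean_ar = fit_ar1([m for m, _ in params])
    logstd_ar = fit_ar1([math.log(max(s, 1e-9)) for _, s in params])
    # Distribution model: average of the per-segment histograms.
    pooled = [statistics.fmean(col) for col in zip(*measures)]
    return mean_ar, logstd_ar, pooled, params[-1]


def generate(model, n_segments: int, size: int, edges: List[float],
             rng: random.Random) -> List[float]:
    """Roll the parameter model forward, sample one segment per step, concatenate."""
    (ma, mb, ms), (la, lb, ls), pooled, (mean, std) = model
    log_std = math.log(max(std, 1e-9))
    width = edges[1] - edges[0]
    bins = range(len(pooled))
    out: List[float] = []
    for _ in range(n_segments):
        mean = ma + mb * mean + rng.gauss(0.0, ms)
        log_std = la + lb * log_std + rng.gauss(0.0, ls)
        for k in rng.choices(bins, weights=pooled, k=size):
            z = edges[k] + rng.random() * width  # uniform within the chosen bin
            out.append(mean + math.exp(log_std) * z)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy input: a slow sine wave plus Gaussian noise.
    series = [math.sin(t / 20) + rng.gauss(0.0, 0.1) for t in range(400)]
    edges = [-3 + 0.5 * i for i in range(13)]  # 12 equal bins spanning [-3, 3]
    model = train(series, size=20, edges=edges)
    synthetic = generate(model, n_segments=5, size=20, edges=edges, rng=rng)
    print(len(synthetic))  # 100
```

Modeling the log of the standard deviation keeps synthetic spreads positive. A trained distribution model of the kind the abstract describes would also condition each segment's shape on its parameters, rather than sharing one pooled histogram across all segments.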

Claims

What is claimed is:

1. A system for generating synthetic data, comprising: one or more memory units storing instructions; and one or more processors that execute the instructions to perform operations comprising: receiving a request to generate a synthetic time-series dataset, the request including a request dataset; determining a profile of the request dataset; accessing a distribution model based on the determined profile of the request dataset, the distribution model having been trained to generate synthetic data segments based on distribution measures and segment parameters of actual time-series data, wherein the generated synthetic data segments satisfy a similarity metric representing a measure of similarity between the synthetic data segments and the actual time-series data; and generating, according to the distribution model, a synthetic time-series dataset.

2. The system of claim 1, wherein: the operations further comprise generating synthetic segment parameters using a parameter model; and generating the synthetic time-series dataset comprises: generating synthetic data segments according to the distribution model; and combining the synthetic data segments to generate the synthetic time-series dataset.

3. The system of claim 2, wherein combining the synthetic data segments comprises combining the synthetic data segments in two or more dimensions.

4. The system of claim 2, the parameter model having been trained to generate synthetic segment parameters and segment sizes.

5. The system of claim 2, wherein generating synthetic segment parameters using a parameter model comprises generating a sequence of synthetic segment parameters based on at least one of a segment parameter seed or an instruction to generate a random parameter seed.

6. The system of claim 5, wherein the sequence of synthetic segment parameters extends forward or backward in time from the segment parameter seed or the random parameter seed.

7. The system of claim 1, the operations further comprising searching a model index based on the profile of the request dataset to determine the distribution model.

8. The system of claim 7, wherein searching the model index comprises searching the model index based on at least one of a model parameter, a model hyperparameter, or a model type.

9. The system of claim 7, wherein: the request includes at least one of a data schema or a statistical metric; and the distribution model is determined based on the distribution model having at least one of a model data schema overlapping with the data schema or a model statistical metric within a tolerance of the statistical metric.

10. The system of claim 1, the operations further comprising providing the synthetic time-series dataset to at least one of a component within the system or a component outside the system.

11. The system of claim 1, wherein determining the profile of the request dataset comprises retrieving the profile from the request.

12. The system of claim 1, wherein determining the profile of the request dataset comprises accessing a storage location identified in the request.

13. The system of claim 1, wherein the profile of the request dataset includes at least one of a number of dataset dimensions or a dataset format.

14. A method for generating synthetic data, the method comprising: receiving a request to generate a synthetic time-series dataset, the request including a request dataset; determining a profile of the request dataset; searching a model index based on the profile of the request dataset to determine a model; accessing a distribution model, the distribution model having been trained to generate synthetic data segments based on distribution measures and segment parameters of actual time-series data, wherein the generated synthetic data segments satisfy a similarity metric representing a measure of similarity between the synthetic data segments and the actual time-series data; generating, using the distribution model, synthetic data segments based on synthetic segment parameters of the request dataset; and generating, using the synthetic data segments, a synthetic time-series dataset.

15. The method of claim 14, further comprising generating synthetic segment parameters using a parameter model, wherein generating the synthetic time-series dataset comprises: generating synthetic data segments according to the distribution model; and combining the synthetic data segments to generate the synthetic time-series dataset.

16. The method of claim 15, wherein combining the synthetic data segments comprises combining the synthetic data segments in two or more dimensions.

17. The method of claim 15, the parameter model having been trained to generate synthetic segment parameters and segment sizes.

18. The method of claim 15, wherein generating synthetic segment parameters using a parameter model comprises generating a sequence of synthetic segment parameters based on at least one of a segment parameter seed or an instruction to generate a random parameter seed.

19. The method of claim 18, wherein the sequence of synthetic segment parameters extends forward or backward in time from the segment parameter seed or the random parameter seed.

20. The method of claim 14, wherein the profile of the request dataset includes at least one of a number of dataset dimensions or a dataset format.
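Claims 7, 9, and 13 describe selecting a distribution model by searching a model index with the request dataset's profile. The following is a minimal sketch assuming a hypothetical in-memory index in which each entry records a schema, a dimension count, a format, and summary statistics. Requiring candidates to match the profile's dimension count and format is an assumption (claim 13 only says the profile includes them). Following claim 9, a candidate qualifies if its schema overlaps the request schema or its statistics fall within a tolerance. `ModelEntry`, `Profile`, `profile_dataset`, and `search_index` are illustrative names, not an API from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class ModelEntry:
    """One record in a hypothetical model index."""
    model_id: str
    schema: FrozenSet[str]   # column names the model was trained on
    dims: int                # number of dataset dimensions
    fmt: str                 # dataset format, e.g. "csv"
    stats: Dict[str, float]  # summary statistics of the training data


@dataclass
class Profile:
    """Hypothetical profile of a request dataset."""
    dims: int
    fmt: str
    schema: FrozenSet[str]
    stats: Dict[str, float] = field(default_factory=dict)


def profile_dataset(rows: List[Dict[str, float]], fmt: str) -> Profile:
    """Derive schema, dimension count, and the mean of all values from the rows."""
    schema = frozenset(rows[0]) if rows else frozenset()
    values = [v for row in rows for v in row.values()]
    stats = {"mean": sum(values) / len(values)} if values else {}
    return Profile(dims=len(schema), fmt=fmt, schema=schema, stats=stats)


def search_index(index: List[ModelEntry], profile: Profile,
                 tol: float = 0.1) -> Optional[ModelEntry]:
    """Return the qualifying entry with the largest schema overlap, or None."""
    candidates = []
    for entry in index:
        # Assumption: dimension count and format must match exactly.
        if entry.dims != profile.dims or entry.fmt != profile.fmt:
            continue
        overlaps = bool(entry.schema & profile.schema)
        stats_close = bool(profile.stats) and all(
            name in entry.stats and abs(entry.stats[name] - value) <= tol
            for name, value in profile.stats.items()
        )
        if overlaps or stats_close:  # claim 9: "at least one of"
            candidates.append(entry)
    return max(candidates, key=lambda e: len(e.schema & profile.schema), default=None)


index = [
    ModelEntry("temps-v1", frozenset({"temp", "humidity"}), 2, "csv", {"mean": 20.0}),
    ModelEntry("prices-v3", frozenset({"price", "volume"}), 2, "csv", {"mean": 100.0}),
]
rows = [{"price": 101.0, "volume": 99.0}, {"price": 99.0, "volume": 101.0}]
profile = profile_dataset(rows, fmt="csv")
print(search_index(index, profile).model_id)  # prices-v3
```

Ranking by schema-overlap size is just one possible tie-break. Claim 8 also allows searching by a model parameter, hyperparameter, or model type, which this sketch does not cover.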

Assignees

Capital One Services, LLC

Classifications (CPC titles)

  • Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77)

  • Auto-encoder networks; Encoder-decoder networks

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn

  • Supervised learning

  • Adversarial learning

Frequently asked questions

What does patent US11822975B2 cover?
It covers generating synthetic time-series data: a dataset is split into data segments, a parameter model is trained on the segment parameters, a distribution model is trained on the segments' distribution measures and segment parameters, and the two models together generate a synthetic dataset. The claims focus on serving requests: profiling a request dataset, selecting a trained distribution model (for example, by searching a model index), and generating a synthetic time-series dataset.
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/541. Mapped technology areas include Physics.
When was this patent published?
Published on Nov 21, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.