Generating synthetic data

US9785719B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9785719-B2
Application number  US-201414332147-A
Country             US
Kind code           B2
Filing date         Jul 15, 2014
Priority date       Jul 15, 2014
Publication date    Oct 10, 2017
Grant date          Oct 10, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods for generating synthetic data based on time dependent data with increased accuracy include decomposing a base dataset into a base dynamic component and at least one static component. Decomposing the base dataset includes applying a decomposition model to the base dataset. One or more embodiments generate a synthetic dynamic component based on the base dynamic component. One or more embodiments merge the synthetic dynamic component with the at least one static component to generate a synthetic dataset having at least some of the time dependent characteristics of the base dataset.
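
The flow described in the abstract (decompose, regenerate the dynamic part, merge) can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the trailing moving average stands in for the unspecified "decomposition model", resampling with replacement is one way to build the synthetic dynamic component (the claims name random resampling), and the function names, the `window` parameter, and the sample values are all hypothetical.

```python
import random


def moving_average(values, window=3):
    """Trailing moving average, used here as a stand-in static (trend) component."""
    trend = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        trend.append(sum(chunk) / len(chunk))
    return trend


def decompose(base, window=3):
    """Additive split, so that base[i] is approximately static[i] + dynamic[i]."""
    static = moving_average(base, window)
    dynamic = [b - s for b, s in zip(base, static)]
    return static, dynamic


def synthesize(base, window=3, seed=None):
    """Resample the dynamic component and merge it back onto the static one."""
    rng = random.Random(seed)
    static, dynamic = decompose(base, window)
    # Draw with replacement so the synthetic residuals carry no time ordering.
    synthetic_dynamic = [rng.choice(dynamic) for _ in dynamic]
    return [s + d for s, d in zip(static, synthetic_dynamic)]


# Hypothetical base dataset with an upward trend.
base = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 21.0]
synthetic = synthesize(base, seed=0)
```

Because the static component is reused unchanged, the synthetic series keeps the base trend while the point-to-point variation differs from run to run (or from seed to seed).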

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating synthetic data, comprising: identifying, by at least one processor, a base dataset describing a set of events corresponding to a time period; decomposing, by the at least one processor, the base dataset into a base dynamic component and a trend component by applying a decomposition model to the base dataset, the base dynamic component comprising a plurality of data points that do not correspond to any time-dependent characteristics of the base dataset and the trend component comprising data points with a time-dependent characteristic that indicates a trend associated with the base dataset; generating, by the at least one processor, a synthetic dynamic component by randomly resampling the data points from the base dynamic component to create a plurality of synthetic data points that do not correspond to the time-dependent characteristics of the base dataset; generating, by the at least one processor, a synthetic dataset comprising new data points with the time-dependent characteristic that indicates the trend associated with the base dataset by combining the synthetic data points in the synthetic dynamic component with the data points in the trend component; and predicting, by the at least one processor using the generated synthetic dataset, a future dataset for the base dataset, the predicted future dataset comprising a plurality of predicted data points according to the time-dependent characteristic.

2. The method as recited in claim 1, wherein generating the synthetic dynamic component comprises randomly resampling data points from the base dynamic component to create the plurality of synthetic data points according to a distribution constraint on the base dynamic component.

3. The method as recited in claim 2, wherein generating the synthetic dynamic component further comprises: determining a normal distribution for the plurality of data points in the base dynamic component; and randomly resampling data points from the base dynamic component according to the normal distribution of the plurality of data points in the base dynamic component to create the synthetic data points.

4. The method as recited in claim 3, wherein generating the synthetic dynamic component further comprises randomly resampling the data points of the base dynamic component according to a three-sigma rule for the normal distribution of the plurality of data points in the base dynamic component.

5. The method as recited in claim 1, wherein decomposing the base dataset further comprises decomposing the base dataset into a seasonal component comprising data points with a seasonal characteristic indicating a seasonal effect associated with the set of events.

6. The method as recited in claim 5, wherein generating the synthetic dataset further comprises combining the data points in the synthetic dynamic component with the data points in the trend component and the data points in the seasonal component to create the new data points that include the trend from the trend component and seasonal information from the seasonal component.

7. The method as recited in claim 1, wherein decomposing the base dataset into a plurality of components comprises decomposing the base dataset according to an additive decomposition model.

8. The method as recited in claim 1, further comprising: providing, in a graphical user interface, a graph comprising a plurality of data points in the synthetic dynamic component; receiving, in the graphical user interface, a selection of a particular data point from the plurality of data points; moving the particular data point from a first location in the graph to a second location in the graph to introduce an anomaly into the synthetic dynamic component; and merging the synthetic dynamic component comprising the anomaly with the trend component to introduce the anomaly into the generated synthetic dataset.

9. The method as recited in claim 1, wherein decomposing the base dataset comprises smoothing the base dataset using an exponential moving average algorithm to isolate the trend component from the base dataset.

10. The method as recited in claim 9, wherein decomposing the base dataset further comprises removing the trend component from the base dataset to obtain the base dynamic component.

11. A method of generating synthetic data, comprising: identifying, by at least one processor, a base dataset comprising a plurality of data points with at least one time-dependent characteristic; determining, by the at least one processor, a first dynamic component and at least one static component from the base dataset by decomposing the base dataset, the first dynamic component comprising a plurality of data points that do not correspond to any time-dependent characteristics of the base dataset, and the at least one static component comprising a time-dependent characteristic; generating, by the at least one processor, a second dynamic component by randomly resampling the data points from the first dynamic component to create a plurality of synthetic data points that do not correspond to the time-dependent characteristics of the base dataset; generating, by the at least one processor, a synthetic dataset comprising new data points with the time-dependent characteristic by combining the synthetic data points in the second dynamic component with the data points in the at least one static component; and predicting, by the at least one processor using the generated synthetic dataset, a future dataset for the base dataset, the predicted future dataset comprising a plurality of predicted data points according to the time-dependent characteristic.

12. The method as recited in claim 11, wherein the time-dependent characteristic of the at least one static component comprises a trend component describing a trend associated with the base dataset.

13. The method as recited in claim 12, wherein the at least one static component further comprises a seasonal component comprising data points with a time-dependent characteristic that describes a seasonal effect associated with the base dataset.

14. The method as recited in claim 13, wherein generating the synthetic dataset comprises merging the synthetic data points in the second dynamic component with the data points in the trend component and the data points in the seasonal component of the base dataset to create new data points that include the trend from the trend component and the seasonal effect from the seasonal component.

15. The method as recited in claim 13, wherein determining, by the at least one processor, the first dynamic component comprises subtracting the trend component and the seasonal component from the base dataset.

16. The method as recited in claim 11, wherein generating the second dynamic component comprises randomly resampling data points of the first dynamic component to create the plurality of synthetic data points according to a distribution constraint on the first dynamic component.

17. The method as recited in claim 15, wherein generating the second dynamic component further comprises randomly resampling the data points of the first dynamic component according to a three-sigma rule.

18. A system for generating synthetic data, comprising: at least one processor: at least one non-transitory computer readable storage medium storing instructions thereon, that, when executed by the at least one processor, cause the system to: identify a base dataset describing a set of events corresponding to a time period; decompose the base dataset into a base dynamic
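
Several dependent claims name concrete techniques: exponential moving average smoothing to isolate the trend (claim 9), removing the trend to obtain the dynamic component (claim 10), resampling under a three-sigma rule (claims 4 and 17), and moving a data point to introduce an anomaly (claim 8). The sketch below shows one plausible reading of those steps. It is an illustration under stated assumptions, not the patented method: the smoothing factor `alpha`, the interpretation of the three-sigma rule as "draw only from points within three standard deviations of the mean", the function names, and the sample data are all hypothetical. The graphical user interface of claim 8 is reduced to a plain function call, and the prediction step is omitted because the preview text does not say how it is performed.

```python
import random
import statistics


def ema_trend(values, alpha=0.3):
    """Exponential moving average; alpha is an assumed smoothing factor."""
    trend = [values[0]]
    for v in values[1:]:
        trend.append(alpha * v + (1 - alpha) * trend[-1])
    return trend


def remove_trend(values, trend):
    """Subtract the trend to leave the dynamic (residual) component."""
    return [v - t for v, t in zip(values, trend)]


def three_sigma_bounds(points):
    """Mean plus or minus three population standard deviations."""
    mu = statistics.fmean(points)
    sigma = statistics.pstdev(points)
    return mu - 3 * sigma, mu + 3 * sigma


def resample_three_sigma(points, count, rng):
    """Draw with replacement, using only points inside the three-sigma band."""
    lo, hi = three_sigma_bounds(points)
    # Fall back to all points if rounding leaves the band empty (sigma == 0).
    pool = [p for p in points if lo <= p <= hi] or list(points)
    return [rng.choice(pool) for _ in range(count)]


def inject_anomaly(dynamic, index, new_value):
    """Return a copy with one point moved, mimicking a manual drag in a graph."""
    moved = list(dynamic)
    moved[index] = new_value
    return moved


def merge(trend, dynamic):
    """Additive recombination of the trend and dynamic components."""
    return [t + d for t, d in zip(trend, dynamic)]


rng = random.Random(42)
base = [100.0, 104.0, 103.0, 109.0, 112.0, 111.0, 118.0, 121.0, 119.0, 126.0]
trend = ema_trend(base)
base_dynamic = remove_trend(base, trend)
synthetic_dynamic = resample_three_sigma(base_dynamic, len(base_dynamic), rng)
synthetic = merge(trend, synthetic_dynamic)
with_anomaly = merge(trend, inject_anomaly(synthetic_dynamic, 4, 40.0))
```

A seasonal component (claims 5, 6, and 13 to 15) would be handled the same way under an additive model: estimate it, subtract it along with the trend before resampling, and add it back when merging.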

Assignees

  • Adobe Systems Inc

Inventors

Classifications

  • G06F16/955 · Primary

    using information identifiers, e.g. uniform resource locators [URL] · CPC title

  • Commerce · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9785719B2 cover?
Methods for generating synthetic data based on time dependent data with increased accuracy include decomposing a base dataset into a base dynamic component and at least one static component. Decomposing the base dataset includes applying a decomposition model to the base dataset. One or more embodiments generate a synthetic dynamic component based on the base dynamic component. One or more embo…
Who is the assignee on this patent?
Adobe Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/955. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 10, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).