Synthetic data generation of time series data

US10133949B2 · US · B2

Patent metadata
Publication number: US-10133949-B2
Application number: US-201715651219-A
Country: US
Kind code: B2
Filing date: Jul 17, 2017
Priority date: Jul 15, 2016
Publication date: Nov 20, 2018
Grant date: Nov 20, 2018

Abstract

Official abstract text for this publication.

A method of generating synthetic data from time series data, such as from handwritten characters, words, sentences, mathematics, and sketches that are drawn with a stylus on an interactive display or with a finger on a touch device. This computationally efficient method is able to generate realistic variations of a given sample. In a handwriting or sketch recognition context, synthetic data is generated from real data in order to train recognizers and thus improve recognition accuracy when only a limited number of samples are available. Similarly, synthetic data can also be used to test and validate such recognizers. Also discussed is a dynamic time warping based approach for both segmented and continuous data that is designed to be a robust, go-to method for gesture recognition across a variety of modalities using only limited training samples.
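The abstract mentions a dynamic time warping (DTW) based approach to gesture recognition. As a rough illustration only, the following is a minimal sketch of textbook DTW distance between two point sequences, assuming 2D points and Euclidean local cost. The function name `dtw_distance` is hypothetical, and this excerpt does not describe the patent's own recognizer (its features, windowing, or handling of continuous data), so none of that is reproduced here.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of 2D points.

    Assumption: Euclidean distance as the local cost and no warping
    window; the patent's actual recognizer may differ.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A 1-nearest-neighbor recognizer of the kind named in claim 13 could then label a query with the class of the template that has the smallest `dtw_distance` to it.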

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating a synthetic variant of a given input for an application, the method comprising the steps of: receiving to and recording on a computer-readable storage device an input time series of K points; stochastically resampling the time series of K points into a first set of n points along the series' path, whereby a path distance between the n points is non-uniform; extracting and normalizing a direction vector between each consecutive pair of points to a unit length, wherein the direction vector between the each consecutive pair of points is lengthened or shortened as a result of the resampling and normalization; concatenating a resulting plurality of normalized direction vectors to create a second set of n points, wherein the origin of the first vector in the series is at the center of a coordinate system, wherein the resulting series forms a shape that can be translated, scaled, skewed and rotated as necessary; and outputting the synthetic variant based on the formed shape, whereby perturbations are simulated directly on the given input itself, wherein the method is repeated to generate a plurality of synthetic variants and wherein the application is a gesture recognizer, the method further comprising: generating a synthetic samples distribution based on the plurality of synthetic variants; and training the gesture recognizer with the synthetic samples distribution.

2. The method of claim 1, further comprising the step of extracting a predetermined set of statistical features from the input time series of K points prior to resampling, wherein a cardinality of the first set of n points is established based on statistical features of the input time series of K points.

3. The method of claim 2, wherein the predetermined set of statistical features of the input time series relate to density and closedness of the K points within the input time series.

4. The method of claim 3, wherein a cardinality of the first set of n points is an optimal n value based on the following equation: n = exp{1.67 + 0.29 density + 1.42 closedness}.

5. The method of claim 4, wherein the optimal n value is from about 16 sampling points to about 69 sampling points.

6. The method of claim 1, further comprising randomly selecting a subset of n points from the first set of n points along the path defined by the time series, wherein the step of random selection occurs prior to extracting and normalizing the direction vector.

7. The method of claim 6, further comprising removing the randomly selected subset of n points from the first set of n points, wherein the remaining unremoved n points from the first set of n points undergo the step of extraction and normalization.

8. The method of claim 1, wherein the step of extracting and normalizing the direction vector between the each consecutive pair of points and concatenating the plurality of normalized direction vectors is performed by generating a synthetic stroke that can be scaled, translated, rotated, and smoothed as necessary.

9. The method of claim 1, further comprising smoothing the synthetic variant.

10. The method of claim 1, wherein the given input is a multistroke gesture including a plurality of strokes, the method further comprising: randomly permuting the plurality of strokes and reversing a random subset of the plurality of strokes; combining the plurality of strokes together into the time series of K points prior to resampling; and discarding over-the-air points, resulting in the synthetic variant being a synthetic multistroke gesture that is generated, wherein the step of discarding the over-the-air points occurs after the normalization and concatenation steps.

11. The method of claim 1, wherein the method is repeated to generate a plurality of synthetic variants, the method further comprising: measuring the plurality of synthetic variants against the given input; generating a synthetic in-class measurements probability distribution from the measurements, wherein the synthetic in-class measurements probability distribution is based on input samples; generating an out-of-class measurements probability distribution from out-of-class measurements, wherein the out-of-class measurements probability distribution is based on synthetic non-input samples; establishing a rejection threshold based on the synthetic in-class measurements probability distribution and the out-of-class measurements probability distribution, wherein the rejection threshold minimizes the probability of false negative errors and false positive errors.

12. The method of claim 11, wherein the step of establishing the rejection threshold is performed by: selecting an objective function to be maximized; estimating a value of the objective function using the synthetic in-class measurements probability distribution and the out-of-class measurements probability distribution, wherein the estimate is made at each point along a range of measurement values of the combined synthetic in-class and out-of-class measurements probability distributions; determining a measurement value that maximizes the objective function based on the estimated value of the objective function, wherein the determined measurement value is the rejection threshold.

13. The method of claim 11, wherein the step of measuring the plurality of synthetic variants against the given input is performed via 1-nearest neighbor classification.

14. The method of claim 11, wherein the synthetic in-class input samples are generated by stochastically resampling the time series of K points and normalizing the direction vector between each consecutive pair of points.

15. The method of claim 1, wherein the method is repeated to generate a plurality of synthetic variants and wherein the application is image generation, the method further comprising stochastically resampling each stroke to generate a sketched image.

16. The method of claim 1, wherein: the given input is an input image, edges are extracted from the input image, the application is image variant generation, wherein the edges within the input image are stochastically resampled to generate a synthetic, non-photorealistic variant of the input image.

17. One or more tangible non-transitory computer-readable media having computer-executable instructions for performing a method of running a software program on a computing device, the computing device operating under an operating system, the method including issuing instructions from the software program to generate a synthetic variant of a given input for an application, the instructions comprising: receiving to and recording on a computer-readable storage device an input time series of K points; stochastically resampling the time series of K points into a first set of n points along the series' path, whereby a path distance between the n points is non-uniform; extracting and normalizing a direction vector between each consecutive pair of points to a unit length, wherein the direction vector between the each consecutive pair of points is lengthened or shortened as a result of the resampling and normalization; concatenating a plurality of normalized direction vectors to create a second set of n points, wherein the origin of the first vector in the series is at the center of a coordinate system, wherein the resulting series forms a shape that can be translated, scaled, skewed and rotated as necessary; and outputting the synthetic variant based on the formed shape, whereby perturbations are simulated directly on the given input itself, wherein the m
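The resampling steps recited in claim 1, together with the point-count formula of claim 4, can be sketched roughly as follows. This is an illustrative reading, not the patented implementation. The function names (`optimal_n`, `stochastic_resample`, `synthesize_variant`) are hypothetical, the points are assumed to be 2D, and the random interval lengths are drawn uniformly and then normalized, which is an assumption (the claim only requires non-uniform spacing). The `density` and `closedness` features are not defined in this excerpt, so they are taken as precomputed inputs.

```python
import math
import random


def optimal_n(density, closedness):
    """Claim 4 formula: n = exp{1.67 + 0.29 density + 1.42 closedness}."""
    return round(math.exp(1.67 + 0.29 * density + 1.42 * closedness))


def stochastic_resample(points, n, rng):
    """Resample a polyline into n points with non-uniform path spacing."""
    # Assumption: uniform random gap lengths, normalized to the path length.
    gaps = [rng.random() + 1e-9 for _ in range(n - 1)]
    total_gap = sum(gaps)
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    targets, acc = [0.0], 0.0
    for g in gaps:
        acc += g / total_gap * length
        targets.append(acc)

    out, seg, walked = [], 0, 0.0  # walked = path distance at start of seg
    for t in targets:
        # Advance to the segment that contains path distance t.
        while seg < len(points) - 2:
            d = math.dist(points[seg], points[seg + 1])
            if walked + d >= t:
                break
            walked += d
            seg += 1
        a, b = points[seg], points[seg + 1]
        d = math.dist(a, b)
        f = 0.0 if d == 0 else min(1.0, (t - walked) / d)
        out.append((a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1])))
    return out


def synthesize_variant(points, n, rng=None):
    """Build one synthetic variant: resample, normalize, concatenate."""
    rng = rng or random.Random()
    resampled = stochastic_resample(points, n, rng)
    x, y = 0.0, 0.0  # first vector starts at the coordinate origin
    variant = [(x, y)]
    for a, b in zip(resampled, resampled[1:]):
        dx, dy = b[0] - a[0], b[1] - a[1]
        norm = math.hypot(dx, dy)
        if norm > 0:  # unit-length step; degenerate pairs add a zero step
            x += dx / norm
            y += dy / norm
        variant.append((x, y))
    return variant
```

Calling `synthesize_variant(stroke, n)` repeatedly on the same stroke yields different shapes because each call draws new gap lengths. Every step in the output has unit length, so segments that were sampled coarsely shrink and segments that were sampled finely stretch. The result would still need to be scaled and translated back into the recognizer's coordinate frame.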

Assignees

Univ Central Florida Res Found Inc

Classifications

  • for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

  • G06K9/222 · Primary

    Physics · mapped topic

  • for inputting data by handwriting, e.g. gesture or text · CPC title

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • the instrument generating sequences of position coordinates corresponding to handwriting (preprocessing or recognising digital ink G06V30/32) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10133949B2 cover?
A method of generating synthetic data from time series data, such as from handwritten characters, words, sentences, mathematics, and sketches that are drawn with a stylus on an interactive display or with a finger on a touch device. This computationally efficient method is able to generate realistic variations of a given sample. In a handwriting or sketch recognition context, synthetic data is …
Who is the assignee on this patent?
Univ Central Florida Res Found Inc
What technology area does this patent fall under?
Primary CPC classification G06K9/222. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 20, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).