Conditioned synthetic data generation in computer-based reasoning systems

US11669769B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11669769-B2
Application number  US-202017006144-A
Country             US
Kind code           B2
Filing date         Aug 28, 2020
Priority date       Dec 13, 2018
Publication date    Jun 6, 2023
Grant date          Jun 6, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for synthetic data generation in computer-based reasoning systems are discussed and include receiving a request for generation of synthetic data based on a set of training data cases. One or more focal training data cases are determined. For undetermined features (either all of them or those that are not subject to conditions), a value for the feature is determined based on the focal cases. In some embodiments, the generation of synthetic data may be conditioned on values of features, preserved features, such as unique identifiers, previous-in-time features, and using the other techniques discussed herein.
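
The loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. The names `focal_cases` and `synthesize_case`, the Euclidean distance, the use of the k nearest training cases as the focal set, and the uniform draw between the focal minimum and maximum are all hypothetical choices made for this example; the source text says only that each value is determined based on the focal cases.

```python
import math
import random


def distance(case, partial, features):
    """Euclidean distance between two cases over the given features only."""
    return math.sqrt(sum((case[f] - partial[f]) ** 2 for f in features))


def focal_cases(training, partial, k):
    """Return the k training cases nearest to the values fixed so far.

    Assumption: "focal" is modeled here as k-nearest-neighbor selection.
    """
    known = list(partial)
    if not known:
        return list(training)  # nothing fixed yet, so every case is focal
    return sorted(training, key=lambda c: distance(c, partial, known))[:k]


def synthesize_case(training, features, conditions=None, k=3, rng=None):
    """Build one synthetic case, filling undetermined features one at a time."""
    rng = rng or random.Random()
    case = dict(conditions or {})  # conditioned features start out determined
    for feature in features:
        if feature in case:
            continue  # already determined by a condition
        focal = focal_cases(training, case, k)
        values = [c[feature] for c in focal]
        # Assumption: draw uniformly between the focal minimum and maximum.
        case[feature] = rng.uniform(min(values), max(values))
    return case
```

A call such as `synthesize_case(training, ["age", "income"], conditions={"age": 30.0})` would keep `age` fixed at the conditioned value and fill `income` from the training cases closest to that age. Each newly determined value narrows the focal set used for the next feature, which is the sense in which the generation is conditioned.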

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a request for generation of synthetic data based on a set of training data cases; determining one or more conditions for the synthetic data; for each synthetic data case in the synthetic data, for each undetermined feature in the synthetic data case, determining one or more focal training data cases from among the set of training data cases based at least in part on the one or more conditions and any already-determined value for features in the synthetic data case; determining a value for the undetermined feature in the synthetic data case based at least in part on the focal training data cases; using the value for the undetermined feature in the synthetic data case; continuing to determine undetermined features until there are no more undetermined features; causing control of a controllable system using a computer-based reasoning model that was determined at least in part based on the synthetic data cases in the synthetic data; wherein the method is performed by one or more computing devices.

2. The method of claim 1, wherein determining the one or more focal training data cases comprises determining the one or more focal training data cases from among the set of training data cases conditioned at least in part on any already-determined value for features in the synthetic data case and on a previous-in-time value for the undetermined feature.

3. The method of claim 1, wherein determining the one or more focal training data cases comprises determining the one or more focal training data cases from among the set of training data cases conditioned at least in part on any already-determined value for features in the synthetic data case, and on a value for a feature in the synthetic data case other than the undetermined feature.

4. The method of claim 1, wherein determining the one or more focal training data cases comprises determining the one or more focal training data cases from among the set of training data cases conditioned at least in part on any already-determined value for features in the synthetic data case, and on one or more previous-in-time values for the undetermined feature and a value for a feature in the synthetic data case.

5. The method of claim 1, wherein determining the one or more focal training data cases comprises determining the one or more focal training data cases from among the set of training data cases conditioned at least in part on any already-determined value for features in the synthetic data case, and on two or more previous-in-time values for the undetermined feature.

6. The method of claim 1, further comprising determining a unique identifier from a table the set of training data cases and wherein determining the one or more focal training data cases comprises determining the one or more focal training data cases from among the set of training data cases conditioned at least in part on any already-determined value for features in the synthetic data case, and on the unique identifier.

7. The method of claim 1, further comprising determining one or more preserved feature values from the set of training data cases and wherein determining the one or more focal training data cases comprises determining the one or more focal training data cases from among the set of training data cases conditioned at least in part on any already-determined value for features in the synthetic data case, and on the one or more preserved feature values.

8. The method of claim 7, further comprising: determining a set of links among database tables from the set of training cases, wherein the links represent overlap of corresponding values from database table to database table; and determining the one or more conditions at least in part based on the set of links among the database tables.

9. The method of claim 7, further comprising: for each undetermined feature in a second synthetic data case, determining a second set of one or more focal training data cases from among the set of training data cases conditioned at least in part on any already-determined value for features in the second synthetic data case, and on the one or more preserved feature values; determining a value for the undetermined feature in the second synthetic data case based at least in part on the second set of one or more focal training data cases; using the value for the undetermined feature in the second synthetic data case; continue to determine undetermined features for the second synthetic data case until there are no more undetermined features.

10. The method of claim 1, further comprising determining a unique identifier from a table in the set of training data cases and wherein determining the one or more focal training data cases comprises determining the one or more focal training data cases from among the set of training data cases conditioned at least in part on any already-determined value for features in the synthetic data case, on the unique identifier, and a value for a feature in the synthetic data case.

11. The method of claim 1, wherein determining the value for the undetermined feature in the synthetic data case comprises: determining the value for the undetermined feature in the synthetic data case based at least in part on a distribution associated with the undetermined feature in the focal training data cases.

12. The method of claim 1, further comprising: determining a fitness score for the synthetic data case; when the fitness score for the synthetic data case is beyond a particular threshold, using the synthetic data case as synthetic data.

13. The method of claim 1, further comprising: determining a shortest distance between the synthetic data case and cases in the set of training data cases; when the shortest distance between the synthetic data case and the cases in the set of training data cases is beyond a particular threshold, using the synthetic data case as synthetic data.

14. The method of claim 1, further comprising: determining distances between the synthetic data case and at least two cases in the set of training data cases; determining whether there are at least a certain number (k) of training data cases that have a distance to the synthetic data case that is below a threshold; when there are at least k training data cases have distances to the synthetic data case that are below the threshold, using the synthetic data case as synthetic data.

15. The method of claim 1, further comprising: determining distances between the synthetic data case and at least two cases in the set of training data cases; determining whether there are at least a certain number (k) of training data cases that have a distance to the synthetic data case that is below a first threshold; determining whether any of the set of training data cases have a distance below a second threshold; when there are at least k training data cases have distances to the synthetic data case that are below the first threshold and no training data case has a distance to the synthetic data case that is below the second threshold, using the synthetic data case as synthetic data.

16. A non-transitory computer readable medium storing instructions which, when executed by one or more computing devices, cause the one or more computing devices to perform a process of: receiving a request for generation of synthetic data based on a set of training data cases; determining one or more conditions for the synthetic data; for each synthetic data case in the synthetic data, for each undetermined feature in the synthetic data case, determ
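
The distance-based acceptance test in claims 14 and 15 can be illustrated with a short Python sketch. It is a hedged example rather than the patent's implementation: the function names `euclidean` and `accept_case`, the parameter names `near` and `too_close`, and the choice of Euclidean distance are assumptions introduced here. The claims specify only that a case is used when at least k training cases lie below a first distance threshold and no training case lies below a second.

```python
import math


def euclidean(a, b):
    """Euclidean distance over the features of case `a` (assumed metric)."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in a))


def accept_case(synthetic, training, k, near, too_close):
    """Return True when the synthetic case passes a claim-15-style check.

    At least k training cases must be closer than `near` (first threshold),
    and no training case may be closer than `too_close` (second threshold).
    """
    dists = [euclidean(synthetic, t) for t in training]
    enough_neighbors = sum(d < near for d in dists) >= k
    none_too_close = all(d >= too_close for d in dists)
    return enough_neighbors and none_too_close
```

Read this way, the first threshold keeps a synthetic case plausible, because it sits among several real cases, while the second keeps it from being a near copy of any single training case. Dropping the `too_close` check gives the simpler test of claim 14.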

Assignees

Diveplane Corp

Inventors

Classifications

  • Learning methods · CPC title

  • G06N5/04 · Primary

    Inference or reasoning models · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Reinforcement learning · CPC title

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11669769B2 cover?
Techniques for synthetic data generation in computer-based reasoning systems are discussed and include receiving a request for generation of synthetic data based on a set of training data cases. One or more focal training data cases are determined. For undetermined features (either all of them or those that are not subject to conditions), a value for the feature is determined based on the focal…
Who is the assignee on this patent?
Diveplane Corp
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 6, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).