Regional Model Residuals in Synthetic Data Generation in Computer-Based Reasoning Systems

US2023024796A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2023024796-A1
Application number	US-202217859719-A
Country	US
Kind code	A1
Filing date	Jul 7, 2022
Priority date	Jul 9, 2021
Publication date	Jan 26, 2023
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for synthetic data generation in computer-based reasoning systems are discussed and include receiving a request for generation of synthetic data based on a set of training data cases. One or more focal training data cases are determined. For undetermined features (either all of them or those that are not subject to conditions), a value for the feature is determined based on the focal cases. In some embodiments, the generated synthetic data may be checked for similarity against the training data, and if similarity conditions are met, it may be modified (e.g., resampled), removed, and/or replaced.
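
The flow in the abstract (determine focal cases, fill each undetermined feature from them, then check the result for similarity to the training data) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the Euclidean distance, the k-nearest choice of focal cases, the uniform noise, and the `min_distance` threshold are all hypothetical and do not come from the publication.

```python
import math
import random

def distance(a, b, features):
    """Euclidean distance over the given feature names (assumed metric)."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

def focal_cases(training, partial, k):
    """Return the k training cases closest to the partially built case."""
    known = list(partial)
    if not known:
        return list(training)  # nothing determined yet: every case is focal
    return sorted(training, key=lambda c: distance(c, partial, known))[:k]

def generate_case(training, features, conditions, k, noise, rng):
    """Build one synthetic case, one undetermined feature at a time."""
    case = dict(conditions)  # conditioned features count as already determined
    for f in features:
        if f in case:
            continue
        focal = focal_cases(training, case, k)
        base = rng.choice(focal)[f]  # borrow a value from a focal case
        case[f] = base + rng.uniform(-noise, noise)  # hypothetical perturbation
    return case

def generate(training, features, n, conditions=None, k=3, noise=0.5,
             min_distance=0.05, max_tries=20, seed=0):
    """Generate up to n cases; resample any case too close to training data."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        for _ in range(max_tries):
            case = generate_case(training, features, conditions or {},
                                 k, noise, rng)
            far_enough = all(distance(case, t, features) >= min_distance
                             for t in training)
            if far_enough:
                out.append(case)
                break
        # If every try was too similar, the case is dropped (removed).
    return out
```

Calling `generate(training, ["x", "y"], n=5, conditions={"x": 1.0})` would fix `x` at 1.0 in every synthetic case and fill only `y` from the focal cases, which mirrors the abstract's distinction between conditioned and undetermined features.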

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: receiving a request for generation of synthetic data based on a set of training data cases; for each synthetic data case in the synthetic data, determining a first undetermined feature in the synthetic data case based at least in part on an approximation of a residual, determining subsequent undetermined features in the synthetic data case based at least in part on an approximation of a residual, causing control of a controllable system using a computer-based reasoning model that was determined at least in part based on the synthetic data cases in the synthetic data; wherein the method is performed by one or more computing devices.
2. The method of claim 1, wherein determining one or more focal training data cases from among the set of training data cases based at least in part on the one or more conditions comprises: determining one or more focal training data cases from among the set of training data cases based at least in part on identifier contribution allocation.
3. The method of claim 2, further comprising determining the identifier contribution allocation comprises based at least in part on a function of an aggregate identifier contribution allocation for each value of an associated identifier and a number of occurrences of each value of the identifier.
4. The method of claim 3, further comprising determining the aggregate identifier contribution allocation for each value of the identifier based at least in part on setting an identical aggregate identifier contribution allocation for each value of the identifier.
5. The method of claim 3, further comprising determining the aggregate identifier contribution allocation for each value of the identifier based at least in part on setting a random aggregate identifier contribution allocation for each value of the identifier.
6. The method of claim 3, further comprising determining the aggregate identifier contribution allocation for each value of the identifier based at least in part on a function of a total number of cases for each value of the identifier and a total number of cases for the identifier.
7. The method of claim 3, further comprising determining the aggregate identifier contribution allocation for each value of the identifier based at least in part on setting a received aggregate identifier contribution allocation for each value of the identifier.
8. The method of claim 1, wherein determining one or more focal training data cases from among the set of training data cases based at least in part on the one or more conditions comprises: determining one or more focal training data cases from among the set of training data cases based at least in part on two or more identifier contribution allocations.
9. The method of claim 1, wherein determining one or more focal training data cases from among the set of training data cases based at least in part on the value for the first undetermined feature and any previously-determined values for subsequent undetermined features comprises: determining the one or more focal training data cases from among the set of training data cases based at least in part on the value for the first undetermined feature and any previously-determined values for subsequent undetermined features and the one or more conditions.
10. A system for performing a machine-executed operation involving instructions, wherein said instructions are instructions which, when executed by one or more computing devices, cause performance of a method comprising: receiving a request for generation of synthetic data based on a set of training data cases; for each synthetic data case in the synthetic data, determining a first undetermined feature in the synthetic data case based at least in part on an approximation of a residual, determining subsequent undetermined features in the synthetic data case based at least in part on an approximation of a residual, causing control of a controllable system using a computer-based reasoning model that was determined at least in part based on the synthetic data cases in the synthetic data; wherein the method is performed by one or more computing devices.
11. The system of claim 10, wherein determining one or more focal training data cases from among the set of training data cases based at least in part on the one or more conditions comprises: determining one or more focal training data cases from among the set of training data cases based at least in part on identifier contribution allocation.
12. The system of claim 11, wherein the method further comprises determining the identifier contribution allocation comprises based at least in part on a function of an aggregate identifier contribution allocation for each value of an associated identifier and a number of occurrences of each value of the identifier.
13. The system of claim 12, wherein the method further comprises determining the aggregate identifier contribution allocation for each value of the identifier based at least in part on setting an identical aggregate identifier contribution allocation for each value of the identifier.
14. The system of claim 12, wherein the method further comprises determining the aggregate identifier contribution allocation for each value of the identifier based at least in part on setting a random aggregate identifier contribution allocation for each value of the identifier.
15. The system of claim 12, wherein the method further comprises determining the aggregate identifier contribution allocation for each value of the identifier based at least in part on a function of a total number of cases for each value of the identifier and a total number of cases for the identifier.
16. A non-transitory computer readable medium storing instructions which, when executed by one or more computing devices, cause the one or more computing devices to perform a method of: receiving a request for generation of synthetic data based on a set of training data cases; for each synthetic data case in the synthetic data, determining a first undetermined feature in the synthetic data case based at least in part on an approximation of a residual, determining subsequent undetermined features in the synthetic data case based at least in part on an approximation of a residual, causing control of a controllable system using a computer-based reasoning model that was determined at least in part based on the synthetic data cases in the synthetic data; wherein the method is performed by one or more computing devices.
17. The non-transitory computer readable medium of claim 16, wherein determining one or more focal training data cases from among the set of training data cases based at least in part on the one or more conditions comprises: determining one or more focal training data cases from among the set of training data cases based at least in part on identifier contribution allocation.
18. The non-transitory computer readable medium of claim 17, wherein the method further comprises determining the identifier contribution allocation comprises based at least in part on a function of an aggregate identifier contribution allocation for each value of an associated identifier and a number of occurrences of each value of the identifier.
19. The non-transitory computer readable medium of claim 18, wherein the method further comprises determining the aggregate identifier contribution allocation for each value of the identifier based at least in part on
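
Claims 3 through 7 describe an identifier contribution allocation derived from an aggregate allocation per identifier value and the number of occurrences of that value. The claims do not state the function itself, so the Python sketch below assumes the simplest reading: each case's weight is its identifier value's aggregate share divided by how many cases carry that value. The function names, the `mode` labels, and the normalization to a total of 1 are hypothetical.

```python
from collections import Counter
import random

def aggregate_allocation(ids, mode="identical", received=None, rng=None):
    """Aggregate share per identifier value, normalized to sum to 1.

    mode="identical"    : same share for every value (cf. claim 4)
    mode="random"       : random share per value (cf. claim 5)
    mode="proportional" : cases for the value / total cases (one reading of claim 6)
    mode="received"     : caller-supplied shares (cf. claim 7)
    """
    counts = Counter(ids)
    if mode == "identical":
        raw = {v: 1.0 for v in counts}
    elif mode == "random":
        rng = rng or random.Random(0)
        raw = {v: rng.random() + 1e-9 for v in counts}  # keep shares positive
    elif mode == "proportional":
        raw = {v: float(n) for v, n in counts.items()}
    elif mode == "received":
        raw = {v: float(received[v]) for v in counts}
    else:
        raise ValueError(f"unknown mode: {mode}")
    total = sum(raw.values())
    return {v: w / total for v, w in raw.items()}

def per_case_allocation(ids, aggregate):
    """Assumed function: split each value's aggregate share across its cases."""
    counts = Counter(ids)
    return [aggregate[v] / counts[v] for v in ids]
```

With identifiers `["a", "a", "a", "b"]`, the identical mode gives each of the three `a` cases one sixth and the single `b` case one half, so an over-represented identifier does not dominate focal-case selection. The proportional mode instead gives every case the same weight of one quarter.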

Assignees

Diveplane Corp

Classifications

G06V10/774 · Primary
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06N5/04 · Primary
Inference or reasoning models · CPC title
G06V10/776
Validation; Performance evaluation · CPC title
G06N20/00
Machine learning · CPC title
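
Each CPC symbol above encodes a hierarchy: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The first letter is what maps a symbol to a broad technology area (section G is Physics). The Python sketch below splits a symbol into those parts; the function name `parse_cpc` and the returned dictionary keys are hypothetical, and the pattern assumes a symbol written without spaces or trailing labels.

```python
import re

# CPC section letters and their titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# section, class, subclass, main group, "/", subgroup
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")

def parse_cpc(symbol):
    """Split a CPC symbol such as 'G06V10/774' into its hierarchy levels."""
    match = CPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }
```

For this publication, `parse_cpc("G06V10/774")` would place the primary classification in subclass G06V, main group 10, subgroup 774.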

Patent family

Related publications grouped by family.

View patent family 84976652

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023024796A1 cover?: Techniques for synthetic data generation in computer-based reasoning systems are discussed and include receiving a request for generation of synthetic data based on a set of training data cases. One or more focal training data cases are determined. For undetermined features (either all of them or those that are not subject to conditions), a value for the feature is determined based on the focal…
Who is the assignee on this patent?: Diveplane Corp
What technology area does this patent fall under?: Primary CPC classification G06V10/774. The mapped technology area is Physics (CPC section G).
When was this patent published?: Publication date Jan 26, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).