Tabular data generation

US2025124220A1 · US · A1

Patent metadata
Publication number: US-2025124220-A1
Application number: US-202418911044-A
Country: US
Kind code: A1
Filing date: Oct 9, 2024
Priority date: Oct 11, 2023
Publication date: Apr 17, 2025
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A tabular data model, which may be pre-trained on a different data set, is used to generate data samples for a target class with a given set of context data points. The tabular data model is trained to predict class membership of a given data point with a set of context data points. Rather than use the predicted class directly, the class predictions are used to determine a class-conditional energy for a synthetic data point with respect to the target class. The synthetic data point may then be updated based on the class-conditional energy with a stochastic update algorithm, such as stochastic gradient Langevin dynamics or Adaptive Moment Estimation with noise. The value of the synthetic data point is sampled as a data point for the target class. This permits effective data augmentation for tabular data for downstream models.
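
The loop in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the filing's implementation. The pre-trained tabular model is replaced by a hypothetical in-context stand-in (`class_logits_and_grads`), whose logit for class c is the log of a Gaussian-kernel sum over the context points labelled c. The class-conditional energy is assumed to follow the JEM convention and equal the negative target-class logit, E(x | y) = -logit_y(x). Initialising from the target-class mean plus unit noise is also an assumption. Each step is a standard stochastic gradient Langevin dynamics (SGLD) update, x <- x - (eta/2) * grad E(x | y) + sqrt(eta) * xi with xi ~ N(0, I).

```python
import math
import random


def class_logits_and_grads(x, context_x, context_y, num_classes, bandwidth=1.0):
    """Hypothetical stand-in for the pre-trained tabular model (an assumption).

    logit_c(x) = log sum_{j: y_j = c} exp(-||x - x_j||^2 / (2 * bandwidth^2)).
    Returns one (logit, gradient of the logit w.r.t. x) pair per class.
    Assumes every class has at least one context point.
    """
    results = []
    for c in range(num_classes):
        pts = [p for p, lab in zip(context_x, context_y) if lab == c]
        scores = [-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * bandwidth ** 2)
                  for p in pts]
        top = max(scores)  # log-sum-exp shift for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        logit = top + math.log(total)
        # d logit / dx = sum_j softmax_j * (x_j - x) / bandwidth^2
        grad = [sum(w * (p[d] - x[d]) for w, p in zip(weights, pts)) / (total * bandwidth ** 2)
                for d in range(len(x))]
        results.append((logit, grad))
    return results


def sgld_sample(target, context_x, context_y, num_classes,
                steps=200, step_size=0.01, seed=0):
    """Draw one synthetic point for `target` by SGLD on E(x | target) = -logit_target(x)."""
    rng = random.Random(seed)
    dim = len(context_x[0])
    # Assumed initialisation: target-class mean plus unit Gaussian noise.
    tgt = [p for p, lab in zip(context_x, context_y) if lab == target]
    mean = [sum(p[d] for p in tgt) / len(tgt) for d in range(dim)]
    x = [mu + rng.gauss(0.0, 1.0) for mu in mean]
    for _ in range(steps):
        _, grad_logit = class_logits_and_grads(x, context_x, context_y, num_classes)[target]
        # grad E = -grad_logit, so x - (eta/2) * grad E = x + (eta/2) * grad_logit.
        x = [xi + 0.5 * step_size * g + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
             for xi, g in zip(x, grad_logit)]
    return x  # the final value is kept as a generated data point for `target`


context_x = [[0.0, 0.0], [0.1, -0.2], [-0.1, 0.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
context_y = [0, 0, 0, 1, 1, 1]
synthetic = [sgld_sample(1, context_x, context_y, num_classes=2, seed=s) for s in range(3)]
print(synthetic)
```

Because the stand-in logits are unnormalised log-densities, SGLD with a small step size approximately samples from a kernel density estimate of the target class. With a real pre-trained model, the gradient of the energy would come from automatic differentiation rather than a closed form.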

First claim

Full claim text as published (claims 1 to 20; claims 1, 9, and 17 are independent).

What is claimed is:

1. A system for generating synthetic data points, comprising: a processor configured to execute instructions; a computer-readable medium having instructions executable by the processor for: identifying a synthetic data point of tabular data; updating the synthetic data point with respect to a set of context data points by: determining a class-conditional energy of a target class for the synthetic data point applied to a pre-trained tabular classification model with respect to the set of context data points; stochastically updating the synthetic data point based on the class-conditional energy of the target class; and sampling the synthetic data point as a generated data point for the target class.

2. The system of claim 1, wherein the pre-trained tabular classification model is not trained on the set of context data points.

3. The system of claim 1, wherein identifying the synthetic data point comprises sampling from a distribution based [on] a subset of the context data points having the target class.

4. The system of claim 1, wherein the set of context data points include a first subset of context data points associated with the target class and a second subset of context data points associated with at least one other class differing from the target class.

5. The system of claim 1, wherein the instructions are further executable for: training an application computer model with training data that includes the generated data point and one or more data points from the set of context data points.

6. The system of claim 1, wherein the class-conditional energy includes a term based on the energy of the set of context data points given the respective class of the context data points.

7. The system of claim 1, wherein stochastically updating the synthetic data point based on the class-conditional energy of the target class comprises applying stochastic gradient Langevin dynamics.

8. The system of claim 1, wherein stochastically updating the synthetic data point based on the class-conditional energy of the target class comprises applying Adaptive Moment Estimation (Adam) with noise.

9. A method for generating synthetic [data points], the method comprising: identifying a synthetic data point of tabular data; updating the synthetic data point with respect to a set of context data points by: determining a class-conditional energy of a target class for the synthetic data point applied to a pre-trained tabular classification model with respect to the set of context data points; stochastically updating the synthetic data point based on the class-conditional energy of the target class; and sampling the synthetic data point as a generated data point for the target class.

10. The method of claim 9, wherein the pre-trained tabular classification model is not trained on the set of context data points.

11. The method of claim 9, wherein identifying the synthetic data point comprises sampling from a distribution based [on] a subset of the context data points having the target class.

12. The method of claim 9, wherein the set of context data points include a first subset of context data points associated with the target class and a second subset of context data points associated with at least one other class differing from the target class.

13. The method of claim 9, wherein the method further comprises: training an application computer model with training data that includes the generated data point and one or more data points from the set of context data points.

14. The method of claim 9, wherein the class-conditional energy includes a term based on the energy of the set of context data points given the respective class of the context data points.

15. The method of claim 9, wherein stochastically updating the synthetic data point based on the class-conditional energy of the target class comprises applying stochastic gradient Langevin dynamics.

16. The method of claim 9, wherein stochastically updating the synthetic data point based on the class-conditional energy of the target class comprises applying Adaptive Moment Estimation (Adam) with noise.

17. A non-transitory computer-readable medium, the non-transitory computer-readable medium comprising instructions executable by a processor for: identifying a synthetic data point of tabular data; updating the synthetic data point with respect to a set of context data points by: determining a class-conditional energy of a target class for the synthetic data point applied to a pre-trained tabular classification model with respect to the set of context data points; stochastically updating the synthetic data point based on the class-conditional energy of the target class; and sampling the synthetic data point as a generated data point for the target class.

18. The computer-readable medium of claim 17, wherein the instructions are further executable for: training an application computer model with training data that includes the generated data point and one or more data points from the set of context data points.

19. The computer-readable medium of claim 17, wherein stochastically updating the synthetic data point based on the class-conditional energy of the target class comprises applying stochastic gradient Langevin dynamics.

20. The computer-readable medium of claim 17, wherein stochastically updating the synthetic data point based on the class-conditional energy of the target class comprises applying Adaptive Moment Estimation (Adam) with noise.
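
Claims 8, 16, and 20 name Adaptive Moment Estimation (Adam) with noise as the alternative update. Claims 3 and 11 initialise the synthetic point by sampling from a distribution based on the target-class context points. The claims fix neither the noise schedule nor that distribution, so the sketch below makes two assumptions: a per-feature Gaussian fitted to the target-class points for initialisation, and isotropic Gaussian noise of fixed scale (`noise_scale`, a hypothetical parameter) added after each ordinary Adam step. `energy_grad` is a hypothetical Gaussian-kernel stand-in for the gradient of the class-conditional energy, not the pre-trained tabular classification model that the claims recite.

```python
import math
import random


def energy_grad(x, target, context_x, context_y, bandwidth=1.0):
    """Gradient of a hypothetical energy E(x | target) = -logit_target(x), where
    logit_target is the log Gaussian-kernel sum over target-class context points."""
    pts = [p for p, lab in zip(context_x, context_y) if lab == target]
    scores = [-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * bandwidth ** 2) for p in pts]
    top = max(scores)
    w = [math.exp(s - top) for s in scores]
    total = sum(w)
    # grad E = sum_j softmax_j * (x - x_j) / bandwidth^2
    return [sum(wj * (x[d] - p[d]) for wj, p in zip(w, pts)) / (total * bandwidth ** 2)
            for d in range(len(x))]


def adam_noise_sample(target, context_x, context_y, steps=300, lr=0.05,
                      noise_scale=0.05, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    rng = random.Random(seed)
    dim = len(context_x[0])
    tgt = [p for p, lab in zip(context_x, context_y) if lab == target]
    # Assumed claim-3-style initialisation: per-feature Gaussian fitted to target-class points.
    mean = [sum(p[d] for p in tgt) / len(tgt) for d in range(dim)]
    std = [math.sqrt(sum((p[d] - mean[d]) ** 2 for p in tgt) / len(tgt)) + 1e-3
           for d in range(dim)]
    x = [rng.gauss(mu, sd) for mu, sd in zip(mean, std)]
    m = [0.0] * dim  # first-moment estimate
    v = [0.0] * dim  # second-moment estimate
    for t in range(1, steps + 1):
        g = energy_grad(x, target, context_x, context_y)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        # Adam descent step on the energy, then injected Gaussian noise (assumed form).
        x = [xi - lr * mh / (math.sqrt(vh) + eps) + noise_scale * rng.gauss(0.0, 1.0)
             for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x


context_x = [[0.0, 0.0], [0.1, -0.2], [-0.1, 0.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
context_y = [0, 0, 0, 1, 1, 1]
point_for_class_1 = adam_noise_sample(1, context_x, context_y)
point_for_class_0 = adam_noise_sample(0, context_x, context_y, seed=1)
print(point_for_class_1, point_for_class_0)
```

Adam rescales each coordinate's step by a running estimate of the gradient magnitude, so the step size is roughly insensitive to the energy's scale. Unlike SGLD, however, this variant carries no general guarantee of sampling from exp(-E), so it is best read as noisy optimisation toward low-energy regions of the target class.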

Assignees

Toronto Dominion Bank

Inventors

Classifications

  • G06F40/177 (primary)

    CPC title (truncated on this page): …of tables; using ruled lines

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025124220A1 cover?
It covers generating synthetic tabular data points for a target class. A tabular classification model (possibly pre-trained on a different data set) scores a synthetic point against a set of context data points; its class predictions define a class-conditional energy; and the point is updated stochastically (for example with stochastic gradient Langevin dynamics or Adam with noise) before being sampled as a new data point for the target class.
Who is the assignee on this patent?
Toronto Dominion Bank
What technology area does this patent fall under?
The primary CPC classification is G06F40/177, which falls under CPC section G (Physics).
When was this patent published?
Published Apr 17, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).