Systems and methods for generating contextual table embeddings for tabular data
US-2024242024-A1 · Jul 18, 2024 · US
US2025363123A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025363123-A1 |
| Application number | US-202519209875-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 16, 2025 |
| Priority date | May 23, 2024 |
| Publication date | Nov 27, 2025 |
| Grant date | — |
Abstract
Context-based tabular data models use a context to evaluate a queried data point. Rather than a randomized or full context of domain data points, a local context of data points is selected that is customized for a particular data query. The system uses a pre-trained model, such as TabPFN, that is trained on classification tasks across different types of data sets, and applies the model with a “context” made up of the nearest neighbors of the queried data point. The number of neighbors may vary and may be determined based on the distances of data points to the query point. The system also makes fine-tuning of tabular data models with neighborhood data efficient: local context is used to select training batches in which multiple query points share a common context. This allows local-context fine-tuning without the excess training cost of single-item training batches.
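The local-context selection described in the abstract can be sketched as below. This is a minimal sketch, not the applicant's implementation: it assumes numeric feature rows and Euclidean distance, and the function name `select_local_context` and its `k_min`, `k_max` and `radius` parameters are hypothetical ways of letting the number of neighbors depend on distance. The selected rows would then be passed as the context to a pre-trained in-context model such as TabPFN, which is not shown.

```python
import math
from typing import List, Sequence


def select_local_context(
    train_x: List[Sequence[float]],
    query: Sequence[float],
    k_min: int = 2,
    k_max: int = 8,
    radius: float = 1.0,
) -> List[int]:
    """Return indices of training rows to use as the local context for `query`.

    Rows are ranked by Euclidean distance to the query. The nearest k_min rows
    are always kept; further rows (up to k_max in total) are kept only while
    they lie within `radius`. k_min, k_max and radius are hypothetical knobs.
    """
    dists = [math.dist(row, query) for row in train_x]
    ranked = sorted(range(len(train_x)), key=lambda i: dists[i])
    context = []
    for rank, idx in enumerate(ranked[:k_max]):
        if rank >= k_min and dists[idx] > radius:
            break  # rows are sorted by distance, so every later row is farther away
        context.append(idx)
    return context


train_x = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.9], [3.0, 3.0], [5.0, 5.0]]
print(select_local_context(train_x, [0.1, 0.0]))  # dense region -> [0, 1, 2]
print(select_local_context(train_x, [4.0, 4.0]))  # sparse region -> [3, 4]
```

A query in a densely populated region receives a larger context than one in a sparse region, which is one possible reading of "the number of neighbors may vary".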
Claims
What is claimed is:
1. A computing system for training a tabular data model with localized context, comprising: one or more processors configured to execute instructions; and a non-transitory computer-readable storage medium containing instructions executable by the one or more processors for: selecting a training data point from a set of training data points for a domain of tabular data; identifying a subset of data points in the set of training data that form a neighborhood around the training data point; selecting a context and a plurality of query points from the subset of data points that form the neighborhood around the training data point; and training parameters of a tabular data model with a training batch including the context and the plurality of query points.
2. The computing system of claim 1, wherein identifying the subset of data points comprises selecting nearest-neighbors of the identified training data point as the neighborhood.
3. The computing system of claim 1, wherein a number of the subset of data points varies based on the distance of data points to the training data point.
4. The computing system of claim 1, wherein the tabular data model is a transformer model.
5. The computing system of claim 1, wherein training parameters of the tabular data model comprises masking attention between the plurality of query points during application of the tabular data model.
6. The computing system of claim 1, wherein selecting the context and the plurality of query points comprises randomly assigning the subset of data points to the context or the plurality of query points.
7. A method for training a tabular data model with localized context, comprising: selecting a training data point from a set of training data points for a domain of tabular data; identifying a subset of data points in the set of training data that form a neighborhood around the training data point; selecting a context and a plurality of query points from the subset of data points that form the neighborhood around the training data point; and training parameters of a tabular data model with a training batch including the context and the plurality of query points.
8. The method of claim 7, wherein identifying the subset of data points comprises selecting nearest-neighbors of the identified training data point as the neighborhood.
9. The method of claim 7, wherein a number of the subset of data points varies based on the distance of data points to the training data point.
10. The method of claim 7, wherein the tabular data model is a transformer model.
11. The method of claim 7, wherein training parameters of the tabular data model comprises masking attention between the plurality of query points during application of the tabular data model.
12. The method of claim 7, wherein selecting the context and the plurality of query points comprises randomly assigning the subset of data points to the context or the plurality of query points.
13. A non-transitory computer-readable medium for training a tabular data model with localized context, the non-transitory computer-readable medium comprising instructions executable by a processor for: selecting a training data point from a set of training data points for a domain of tabular data; identifying a subset of data points in the set of training data that form a neighborhood around the training data point; selecting a context and a plurality of query points from the subset of data points that form the neighborhood around the training data point; and training parameters of a tabular data model with a training batch including the context and the plurality of query points.
14. The non-transitory computer-readable medium of claim 13, wherein identifying the subset of data points comprises selecting nearest-neighbors of the identified training data point as the neighborhood.
15. The non-transitory computer-readable medium of claim 13, wherein a number of the subset of data points varies based on the distance of data points to the training data point.
16. The non-transitory computer-readable medium of claim 13, wherein the tabular data model is a transformer model.
17. The non-transitory computer-readable medium of claim 13, wherein training parameters of the tabular data model comprises masking attention between the plurality of query points during application of the tabular data model.
18. The non-transitory computer-readable medium of claim 13, wherein selecting the context and the plurality of query points comprises randomly assigning the subset of data points to the context or the plurality of query points.
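The batch construction recited in claims 1, 2, 5 and 6 can be sketched as follows. This is a minimal, framework-free sketch under stated assumptions, not the applicant's code: rows are numeric feature vectors, Euclidean distance defines the neighborhood, the neighborhood size is fixed rather than distance-dependent, and the helper names (`nearest_neighbors`, `build_local_batch`, `query_mask`) are hypothetical. The mask follows a TabPFN-style layout, assumed here, in which every row attends only to context rows. The transformer and the parameter update are omitted; the sketch only builds one training batch and its attention mask.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Sequence[float]


def nearest_neighbors(points: List[Row], anchor_idx: int, k: int) -> List[int]:
    """Indices of the anchor plus its k - 1 nearest other points (Euclidean)."""
    anchor = points[anchor_idx]
    others = [i for i in range(len(points)) if i != anchor_idx]
    others.sort(key=lambda i: math.dist(points[i], anchor))
    return [anchor_idx] + others[: k - 1]


def build_local_batch(
    points: List[Row], n_context: int, n_query: int, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Pick a training point, take its neighborhood, split it into context/query rows."""
    size = n_context + n_query
    if size > len(points):
        raise ValueError("neighborhood is larger than the training set")
    anchor = rng.randrange(len(points))             # the selected training data point
    hood = nearest_neighbors(points, anchor, size)  # its neighborhood (cf. claim 2)
    rng.shuffle(hood)                               # random context/query split (cf. claim 6)
    return hood[:n_context], hood[n_context:]


def query_mask(n_context: int, n_query: int) -> List[List[bool]]:
    """allowed[i][j] is True when row i may attend to row j.

    Rows 0..n_context-1 are context rows; the remaining rows are query rows.
    Every row attends only to context rows, so attention between query
    points (including a query to itself) is masked (cf. claim 5).
    """
    if n_context < 1:
        raise ValueError("at least one context row is required")
    n = n_context + n_query
    return [[j < n_context for j in range(n)] for _ in range(n)]


# One batch: a shared 6-row context serves 4 query rows in a single forward pass.
rng = random.Random(0)
points = [[float(i), float(i % 3)] for i in range(20)]
ctx, qry = build_local_batch(points, n_context=6, n_query=4, rng=rng)
mask = query_mask(len(ctx), len(qry))
```

Because query rows cannot see one another, each behaves as if it were scored alone against the shared local context, yet all of them are processed in one batch; this is the saving over single-item training batches that the abstract describes. Frameworks differ on whether a boolean `True` means "allowed" or "blocked", so the mask may need to be inverted before use.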
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
using context · CPC title
Clustering or classification · CPC title
Approximate or statistical queries · CPC title