Resolving dataset corruption of transferred datasets using programming language-agnostic data modeling platforms

US12210501B1 · US · B1

Patent metadata
Publication number: US-12210501-B1
Application number: US-202418892312-A
Country: US
Kind code: B1
Filing date: Sep 20, 2024
Priority date: Jun 22, 2023
Publication date: Jan 28, 2025
Grant date: Jan 28, 2025

Abstract

Systems and methods for a programming language-agnostic data modeling platform that is both less resource intensive and scalable. Additionally, the programming language-agnostic data modeling platform allows for advanced analytics to be run on descriptions of the known logical data models, to generate data offerings describing underlying data, and to easily format data for compatibility with artificial intelligence systems. The systems and methods use a supplemental data structure that comprises logical data modeling metadata, in which the logical data modeling metadata describes the logical data model in a common, standardized language. For example, the logical data modeling metadata may comprise a transformer lineage of the logical data model.
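
The abstract's central idea is a supplemental data structure that describes a logical data model in a common, standardized language and may carry a transformer lineage. A minimal Python sketch of such a structure follows. It assumes JSON as the "standardized language". The class and field names (SupplementalDataStructure, LineageStep, attributes, lineage) are hypothetical: the patent text on this page does not specify a format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LineageStep:
    """One hypothetical step in an attribute's transformer lineage."""
    source: str     # upstream field the value was derived from
    transform: str  # description of the transformation that was applied


@dataclass
class SupplementalDataStructure:
    """Hypothetical language-neutral description of one logical data model."""
    logical_model: str
    attributes: dict[str, str]  # attribute name -> logical type
    lineage: dict[str, list[LineageStep]] = field(default_factory=dict)

    def to_json(self) -> str:
        # JSON stands in for the "common, standardized language" here:
        # any consumer, in any programming language, can parse it.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SupplementalDataStructure":
        raw = json.loads(text)
        # Rebuild the nested LineageStep objects that asdict() flattened to dicts.
        lineage = {
            attr: [LineageStep(**step) for step in steps]
            for attr, steps in raw.get("lineage", {}).items()
        }
        return cls(raw["logical_model"], raw["attributes"], lineage)


if __name__ == "__main__":
    sds = SupplementalDataStructure(
        logical_model="customer_account",
        attributes={"account_id": "string", "balance": "decimal"},
        lineage={"balance": [LineageStep("ledger.amount", "sum grouped by account_id")]},
    )
    text = sds.to_json()
    print(text)
    assert SupplementalDataStructure.from_json(text) == sds  # lossless round trip
```

The description round-trips through plain JSON, so a consumer written in any language can read it. In this sketch, that is what makes the metadata programming language-agnostic.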

Claims

What is claimed is:

1. A system for resolving corrupted datasets transferred to data repositories having differing physical data models using programming language-agnostic data modeling platforms, the system comprising: one or more processors; and a non-transitory, computer-readable medium comprising instructions recorded thereon that, when executed by the one or more processors, cause operations comprising: receiving a data transfer request to perform a transfer of a dataset from a first local data repository to a second local data repository, wherein the first local data repository is associated with a first physical data model of a first entity, and wherein the second local data repository is associated with a second physical data model of a second entity that is different from the first physical data model; in response to receiving the data transfer request, identifying, based on a dataset description of the dataset, a first logical data model to be used in connection with performing the transfer of the dataset from the first local data repository to the second local data repository; determining a first supplemental data structure for the identified first logical data model, wherein the first supplemental data structure is expressed in a standardized language and comprises a first attribute; generating a first mapping, based on the identified first logical data model and the second physical data model of the second local data repository, for performing the transfer of the dataset from the first local data repository to the second local data repository, wherein the first mapping maps the first attribute of the first supplemental data structure to the second physical data model of the second local data repository; performing the transfer of the dataset from the first local repository to the second local repository based on the first mapping; in connection with performing the transfer of the dataset, receiving a data transfer error message from the second entity that is associated with the second local data repository indicating (i) an identified transferred dataset error that occurred during the transfer of the dataset and (ii) the data transfer request; and in response to receiving the data transfer error message, transmitting executable code to the second entity that is associated with the second local data repository, wherein the executable code corresponds to a data analytic operation to be performed on the transferred dataset to resolve the identified transferred dataset error.

2. A method for resolving corrupted datasets using programming language-agnostic data modeling platforms, the method comprising: receiving a request to perform a first data operation on a first dataset from a first data source of a first entity, wherein the first data operation (i) uses a logical data model to perform the first data operation on the first dataset and (ii) involves a physical data model of a second entity; in response to receiving the request, identifying, based on a first dataset description of the first dataset, a first logical data model to be used in connection with performing the first data operation on the first dataset; determining a first supplemental data structure for the identified first logical data model, wherein the first supplemental data structure is expressed in a standardized language and comprises a first attribute; generating a first mapping, based on the identified first logical data model, for performing the first data operation on the first dataset, wherein the first mapping maps the first attribute of the first supplemental data structure to the physical data model of the second entity; in response to performing the first data operation on the first dataset that is based on the first mapping, receiving a first data operation error message associated with the second entity that indicates an identified error that occurred during a performance of the first data operation on the first dataset, wherein the first data operation is performed; and transmitting, to the second entity, executable code corresponding to a second data operation to be performed on the first dataset to resolve the identified error.

3. The method of claim 2, wherein identifying the first logical data model further comprises: providing the first dataset description of the first dataset as input to a first artificial intelligence model trained to identify logical data models to perform data operations on datasets; receiving, from the first artificial intelligence model, a ranked set of logical data models, wherein each ranked logical data model of the ranked set of logical data models are ranked based on a confidence value indicating a likelihood that the first dataset uses a respective ranked logical data model of the ranked set of logical data models; and identifying the first logical data model based on a selection of a respective logical data model that satisfies a threshold confidence value from the set of ranked logical data models.

4. The method of claim 3, wherein the first artificial intelligence model comprises an Large Language Model (LLM), and wherein the LLM is trained, the training of the LLM comprising: obtaining a set of training datasets and a set of training logical data model descriptions, wherein each training dataset of the set of training datasets corresponds to a training logical data model description of the set of training logical data model descriptions, and wherein each training logical data model description of the set of training logical data model descriptions is associated with a metadata schema; providing the set of training datasets and the set of training logical data model descriptions to the LLM during a training routine, the LLM being communicatively coupled to a retrieval component configured to retrieve (i) similar logical data models historically used in connection with a dataset and (ii) metadata schemas associated with the respective similar logical data models to be provided to the LLM; receiving, from the LLM during the training routine, a set of candidate logical data models and corresponding metadata schemas based on (i) the similar logical data models historically used in connection with a dataset and (ii) the metadata schemas associated with the respective similar logical data models; and in response to receiving the set of candidate logical data models and the corresponding metadata schemas, providing a message, during the training routine, to the LLM comprising an accuracy value corresponding to each candidate logical data model and corresponding metadata schemas.

5. The method of claim 2, further comprising: extracting, from the request to perform the first data operation on the first dataset from the first data source of the first entity, an identifier associated with the first dataset; obtaining, based on the extracted identifier, the first dataset from a data repository storing datasets; and determining the first dataset description of the first dataset based on metadata associated with the first dataset.

6. The method of claim 2, wherein determining the first supplemental data structure for the identified first logical data model further comprises: providing an identifier associated with the first logical data model as input to a second artificial intelligence model configured to determine supplemental data structures for logical data models; and receiving, from the second artificial intelligence model, the first supplemental data structure for the identified first logical data model, wherein the first supplemental data structure is expressed in the standardized language and comprises the first attribute, and wherein the first attribute comprises a first transformer lineage of the first logical data model.

7. T
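
Claim 1 describes a sequence: map the logical model's attribute onto the receiving side's physical data model, perform the transfer, receive an error message, and send back executable code that fixes the data. A Python sketch of that sequence follows. Every detail is an illustrative assumption rather than the claimed implementation. The name-normalizing match in generate_mapping, the NOT NULL check standing in for the receiving entity's error detection, and the "drop offending rows" fix are all invented for this example. The claim's "executable code" is modeled as a Python callable, not as code shipped to another system.

```python
from dataclasses import dataclass
from typing import Callable

# A row is a column-name -> value dictionary (illustrative representation).
Row = dict[str, object]


@dataclass
class TransferError:
    """Hypothetical shape of the receiving entity's data transfer error message."""
    request_id: str  # ties the error back to the data transfer request
    column: str      # physical column where the problem was found
    kind: str        # error category, e.g. "null_value"


def _normalize(name: str) -> str:
    # Assumed matching rule: ignore case and underscores.
    return name.replace("_", "").lower()


def generate_mapping(logical_attrs: dict[str, str],
                     target_model: dict[str, bool]) -> dict[str, str]:
    """Map logical attributes to target physical columns by normalized name.

    target_model maps each physical column name to whether it accepts NULL.
    """
    by_norm = {_normalize(col): col for col in target_model}
    return {attr: by_norm[_normalize(attr)]
            for attr in logical_attrs if _normalize(attr) in by_norm}


def transfer(rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    """Rename each row's keys from logical attributes to physical columns."""
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]


def validate(request_id: str, rows: list[Row],
             target_model: dict[str, bool]) -> list[TransferError]:
    """Receiving-side check: report non-nullable columns that contain NULLs."""
    return [TransferError(request_id, col, "null_value")
            for col, nullable in target_model.items()
            if not nullable and any(row.get(col) is None for row in rows)]


def remediation_for(error: TransferError) -> Callable[[list[Row]], list[Row]]:
    """Return corrective code for a reported error (assumed policy: drop bad rows)."""
    if error.kind == "null_value":
        return lambda rows: [row for row in rows if row.get(error.column) is not None]
    raise ValueError(f"no remediation registered for {error.kind!r}")


if __name__ == "__main__":
    logical = {"account_id": "string", "balance": "decimal"}
    target = {"AccountId": False, "Balance": True}  # column -> nullable?
    mapping = generate_mapping(logical, target)
    loaded = transfer([{"account_id": "A1", "balance": 10},
                       {"account_id": None, "balance": 5}], mapping)
    for err in validate("req-1", loaded, target):
        loaded = remediation_for(err)(loaded)
    print(loaded)  # [{'AccountId': 'A1', 'Balance': 10}]
```

Running the module prints the single surviving row. Here, a lookup from error kind to corrective function decides which operation to send back. The claim itself does not say how that choice is made.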
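
Claim 3 identifies the logical data model in two steps. A trained AI model first returns a confidence-ranked list of candidates, and the method then selects a candidate that satisfies a threshold confidence value. The Python sketch below shows that rank-then-threshold selection. A plain Jaccard overlap between the description's words and hypothetical per-model keyword sets stands in for the trained model. Only the selection logic mirrors the claim; the scoring function and the catalog are assumptions.

```python
from typing import Optional


def rank_logical_models(description: str,
                        catalog: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Score every catalogued logical model against a dataset description.

    Stand-in for claim 3's trained AI model: confidence is the Jaccard overlap
    between the description's words and each model's (hypothetical) keywords.
    """
    words = set(description.lower().split())
    ranked = []
    for name, keywords in catalog.items():
        union = words | keywords
        confidence = len(words & keywords) / len(union) if union else 0.0
        ranked.append((name, confidence))
    # Highest confidence first; ties broken by name so the order is deterministic.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


def select_logical_model(ranked: list[tuple[str, float]],
                         threshold: float) -> Optional[str]:
    """Return the highest-ranked model whose confidence meets the threshold."""
    for name, confidence in ranked:
        if confidence >= threshold:
            return name
    return None  # no candidate is confident enough; caller must fall back


if __name__ == "__main__":
    catalog = {
        "customer_account": {"customer", "account", "balance"},
        "trade": {"trade", "price", "quantity"},
    }
    ranked = rank_logical_models("Customer account balance snapshot", catalog)
    print(ranked)                             # [('customer_account', 0.75), ('trade', 0.0)]
    print(select_logical_model(ranked, 0.5))  # customer_account
    print(select_logical_model(ranked, 0.8))  # None
```

Returning None when no candidate clears the threshold makes the low-confidence case explicit, so a caller could fall back to manual review rather than map data with a poorly matched model.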

Assignees

Citibank, N.A.

Classifications

  • G06F16/22 (primary): Indexing; Data structures therefor; Storage structures

Frequently asked questions

What does patent US12210501B1 cover?
According to its first claim, it covers resolving dataset errors that arise when data is transferred between repositories whose physical data models differ. A logical data model is identified from a description of the dataset, and a supplemental data structure for that model, expressed in a standardized language, is mapped onto the receiving repository's physical data model. When the receiving entity reports a transfer error, executable code for a corrective data analytic operation is sent back to it.
Who is the assignee on this patent?
Citibank, N.A.
What technology area does this patent fall under?
Its primary CPC classification is G06F16/22 (indexing; data structures therefor; storage structures), which falls under CPC section G (Physics).
When was this patent published?
It was published on Jan 28, 2025 (kind code B1). Legal status and post-grant events are not shown on this page.