Systems and methods for a data ecosystem

US12373468B2 · US · B2

Patent metadata
Publication number: US-12373468-B2
Application number: US-202318517088-A
Country: US
Kind code: B2
Filing date: Nov 22, 2023
Priority date: Nov 22, 2023
Publication date: Jul 29, 2025
Grant date: Jul 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for creating a governance graph representing data set interconnection. The data set interconnections may be based on common fields, sources, databases, applications, or patterns of usage. For example, the interconnections may be direct connections, where one data set is directly downstream from another data set. Alternatively, the interconnections may be indirect connections based on patterns showing the data sets are commonly used together. For example, given data sets “A”, “B”, and “C”, if “B” is directly connected to “A” because it is downstream from “A”, and a particular group of users commonly use “B” and “C” together, “A” may be indirectly related to “C” based on the pattern of usage. In this example, the governance graph is configured to indicate the connection between “A” and “B” is stronger than the connection between “A” and “C”, whilst still showing said connection.
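The "A", "B", "C" example in the abstract can be sketched as a small weighted graph. This is an illustrative sketch only, not the patented implementation: the names `GovernanceGraph`, `connect`, `strength`, `neighbors`, and `build_graph` are hypothetical, and the weights 1.0 and 0.4 are assumptions chosen solely so that a direct (downstream) connection ranks above an indirect, usage-derived one.

```python
from dataclasses import dataclass, field

# Assumed weights: the abstract only states that direct is stronger than indirect.
DIRECT_WEIGHT = 1.0
INDIRECT_WEIGHT = 0.4


@dataclass
class GovernanceGraph:
    # Maps an unordered pair of data set names to (weight, kind).
    edges: dict = field(default_factory=dict)

    def connect(self, a, b, weight, kind):
        """Record an undirected interconnection, keeping the strongest one seen."""
        key = frozenset((a, b))
        current = self.edges.get(key)
        if current is None or weight > current[0]:
            self.edges[key] = (weight, kind)

    def strength(self, a, b):
        """Return the edge weight, or 0.0 when the data sets are not connected."""
        return self.edges.get(frozenset((a, b)), (0.0, "none"))[0]

    def neighbors(self, name, kind):
        """Return the data sets joined to `name` by an edge of the given kind."""
        result = set()
        for key, (_, edge_kind) in self.edges.items():
            if name in key and edge_kind == kind:
                result |= key - {name}
        return result


def build_graph(downstream_pairs, co_used_pairs):
    """Build a graph from (upstream, downstream) pairs and co-usage pairs."""
    graph = GovernanceGraph()
    for upstream, downstream in downstream_pairs:
        graph.connect(upstream, downstream, DIRECT_WEIGHT, "direct")
    for left, right in co_used_pairs:
        # Assumption: co-usage itself is also drawn as a weaker, usage-based edge.
        graph.connect(left, right, INDIRECT_WEIGHT, "usage")
        # A data set directly connected to one member of a co-used pair
        # becomes indirectly related to the other member.
        for near, far in ((left, right), (right, left)):
            for other in graph.neighbors(near, "direct") - {far}:
                graph.connect(other, far, INDIRECT_WEIGHT, "indirect")
    return graph


if __name__ == "__main__":
    graph = build_graph([("A", "B")], [("B", "C")])
    print(graph.strength("A", "B"))  # 1.0 (direct)
    print(graph.strength("A", "C"))  # 0.4 (indirect, still shown)
```

Keeping the strongest weight per pair means a later usage pattern never downgrades an existing direct connection, which is one simple way to preserve the ordering the abstract describes.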

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for creating a representation of interconnections between data sets, the method comprising the steps of:

    training, using training data, a machine learning model to classify data entries in a dataset, the training including:

        iteratively predicting data entry classification based on one or more traits of the data entry;

        testing and comparing, during each iteration, the data entry classification to a target variable;

        indicating, for each iteration and via a feedback loop, modifications to weights assigned to nodes of the machine learning model to improve predictability of the target variable and reduce error of the machine learning model;

    deploying the trained machine learning model and storing the machine learning model to one or more storage locations to be included in a plurality of trained machine learning models for data classification;

    receiving, via a network as part of a data classification process, a plurality of data sets from a plurality of sources, the data sets including a plurality of traits;

    storing the plurality of data sets into a data catalog to be classified according to the data classification process;

    performing a first instance of the data classification process on a first data set and a second data set of the plurality of data sets, the first instance including determining, using at least one semantic type selected from one or more stored semantic types, and via at least one machine learning model of the plurality of trained machine learning models, at least one common trait for the first data set and the second data set, the first instance including examining, using the at least one semantic type, each column of the first data set and the second data set to identify common data in data fields of the first data set and the second data set and assigning metadata to the first data set and the second data set, the metadata including the at least one semantic type, the at least one machine learning model including the machine learning model trained to classify the data entries in the dataset;

    generating a first visual representation of a first interconnection between the first data set and the second data set based on the at least one common trait, wherein the representation of the first interconnection comprises a first value;

    performing a second instance of the data classification process on the second data set and a third data set from the plurality of data sets, the second instance including determining, using a semantic type selected from the one or more stored semantic types, and via one or more machine learning models of the plurality of trained machine learning models, at least one common trait for the second data set and the third data set, the second instance including examining, using the semantic type, each column of the second data set and the third data set to identify common data in data fields of the second data set and the third data set;

    generating a second visual representation of a second interconnection between the second data set and the third data set based on the at least one common trait of the second data set and the third data set, wherein the representation of the second interconnection comprises a second value; and

    initiating display, via a graphical user interface, of a governance graph comprising the first interconnection and the second interconnection.

2. The computer-implemented method of claim 1, wherein at least one of the first interconnection and the second interconnection comprises at least one of an interconnection between data policies, data procedures, and data usage patterns.

3. The computer-implemented method of claim 2, wherein the first interconnection is based on data usage patterns.

4. The computer-implemented method of claim 1, wherein the method further comprises generating, and initiating display via the graphical user interface, a recommendation of one or more additional data sets having at least one common trait with at least one of the first data set, the second data set, and the third data set.

5. The computer-implemented method of claim 1, wherein the first value represents a stronger connection than the second value.

6. The computer-implemented method of claim 1, wherein the first value is indicated by a thick line on the displayed governance graph and the second value is indicated by a thin line on the displayed governance graph.

7. The computer-implemented method of claim 6, wherein the first value is indicated by a short line on the displayed governance graph and the second value is indicated by a long line on the displayed governance graph.

8. The computer-implemented method of claim 1, wherein the method further comprises:

    determining at least one common trait for the first data set and the third data set from the plurality of data sets;

    generating a representation of a third interconnection between the first data set and the third data set based on the at least one common trait, wherein the representation of the third interconnection comprises a third value; and

    initiating display, via the graphical user interface, the governance graph comprising the first interconnection, the second interconnection, and the third interconnection.

9. The computer-implemented method of claim 1, wherein the at least one common trait is at least one of a common field, a common usage, a common source, a common database, a common generating application, and a common pattern of usage.

10. A computer system for creating a representation of associations between data sets, the computer system comprising:

    at least one processor;

    a communication interface communicatively coupled to the at least one processor; and

    a memory device storing executable code that, when executed, causes the processor to:

        train, using training data, a machine learning model to classify data entries in a dataset, the training including:

            iteratively predicting data entry classification based on one or more traits of the data entry;

            testing and comparing, during each iteration, the data entry classification to a target variable;

            indicating, for each iteration and via a feedback loop, modifications to weights assigned to nodes of the machine learning model to improve predictability of the target variable and reduce error of the machine learning model;

        deploy the trained machine learning model and storing the machine learning model to one or more storage locations to be included in a plurality of trained machine learning models for data classification;

        receive, via a network as part of a data classification process, a plurality of data sets from a plurality of sources, the data sets each including one or more characteristics;

        store the plurality of data sets into a data catalog to be classified according to the data classification process;

        perform a first instance of the data classification process on a first data set and a second data set of the plurality of data sets, the first instance including determining, using at least one semantic type selected from one or more stored semantic types, and via at least one machine learning model of the plurality of trained machine learning models, at least one common characteristic for the first data set and the second data set from the plurality of data sets, the first instance including examining, using the at least one semantic type, each column of the first data set and the second data set to identify common data in data fields of the first data set and the second data set and assigning metadata to the first data set and the second data set, the metadata including the at least one semantic type, the at least one machine learning model including the machine learning model tra…
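The column-by-column examination recited in claim 1, together with the thick and thin lines of claim 6, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the claimed method: the claims rely on trained machine learning models to determine semantic types, whereas this sketch substitutes two regular expressions for those models. All names (`SEMANTIC_TYPES`, `classify_column`, `tag_data_set`, `common_traits`, `interconnection_value`, `line_width`) are hypothetical, as is the choice of Jaccard overlap for the interconnection value and the pixel widths used for drawing.

```python
import re

# Stand-in for the "stored semantic types". In the claims a trained model
# decides the type; regular expressions are used here only for illustration.
SEMANTIC_TYPES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
}


def classify_column(values):
    """Return the first semantic type matched by every non-empty value, else None."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    for name, pattern in SEMANTIC_TYPES.items():
        if all(pattern.match(v) for v in cleaned):
            return name
    return None


def tag_data_set(columns):
    """Examine each column and return metadata mapping column name to semantic type."""
    metadata = {}
    for column_name, values in columns.items():
        semantic_type = classify_column(values)
        if semantic_type is not None:
            metadata[column_name] = semantic_type
    return metadata


def common_traits(meta_a, meta_b):
    """Semantic types that appear in both data sets."""
    return set(meta_a.values()) & set(meta_b.values())


def interconnection_value(meta_a, meta_b):
    """Assumed scoring: Jaccard overlap of the two data sets' semantic types."""
    types_a, types_b = set(meta_a.values()), set(meta_b.values())
    union = types_a | types_b
    return len(types_a & types_b) / len(union) if union else 0.0


def line_width(value, thin=1.0, thick=6.0):
    """Map a value in [0, 1] to a stroke width, so stronger links draw thicker."""
    return thin + (thick - thin) * value


if __name__ == "__main__":
    customers = tag_data_set({
        "contact": ["a@example.com", "b@example.com"],
        "postcode": ["28202", "30303-1234"],
    })
    orders = tag_data_set({"buyer_email": ["a@example.com"], "zip": ["28202"]})
    tickets = tag_data_set({"reporter": ["c@example.com"], "note": ["printer jam"]})

    first = interconnection_value(customers, orders)   # 1.0
    second = interconnection_value(orders, tickets)    # 0.5
    print(line_width(first), line_width(second))       # 6.0 3.5
```

In this toy data the first pair shares both semantic types and the second pair shares only one, so the first interconnection receives the larger value and the thicker line, matching the relationship described in claims 5 and 6.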

Assignees

Truist Bank

Inventors

Classifications

  • Clustering; Classification · CPC title

  • G06F16/287 · Primary

    Visualization; Browsing · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12373468B2 cover?
Systems and methods are disclosed for creating a governance graph representing data set interconnection. The data set interconnections may be based on common fields, sources, databases, applications, or patterns of usage. For example, the interconnections may be direct connections, where one data set is directly downstream from another data set. Alternatively, the interconnections may be indire…
Who is the assignee on this patent?
Truist Bank
What technology area does this patent fall under?
Primary CPC classification G06F16/287. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 29, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).