Custom semantic search experience driven by an ontology
US-2022035866-A1 · Feb 3, 2022 · US
US12450240B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12450240-B2 |
| Application number | US-202318524597-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 30, 2023 |
| Priority date | Nov 30, 2023 |
| Publication date | Oct 21, 2025 |
| Grant date | Oct 21, 2025 |
Official abstract text for this publication.
Systems and methods perform data analysis on at least two separate datasets to identify any redundancies. The data analysis includes comparing first data values of a first dataset with second data values of a second dataset, the comparing including evaluating similarities of the first data values and second data values, identifying, from the comparing, that the first data values and the second data values include at least a portion of substantially similar data, and interpreting similarities of the portion of substantially similar data, the interpreting including determining that a dataset of the first dataset and second dataset is a subset of the other dataset. Further, control signal(s) are transmitted to a user device to initiate displaying, via a user interface of the user device, a prompt indicating that the at least one dataset is likely the subset of the other dataset.
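The comparison described in the abstract can be illustrated with a small sketch. This is not the patented method, which relies on a trained neural network; it swaps in plain set overlap to show how "one dataset is likely a subset of the other" and a similarity percentage might be derived. The function names `containment` and `likely_subset_prompt` and the 0.95 threshold are assumptions made for illustration.

```python
def containment(candidate, reference):
    """Fraction of distinct values in `candidate` that also occur in `reference`."""
    cand = set(candidate)
    ref = set(reference)
    if not cand:
        return 0.0
    return len(cand & ref) / len(cand)


def likely_subset_prompt(first, second, threshold=0.95):
    """Return a prompt string if either dataset is mostly contained in the other.

    `threshold` is a hypothetical cut-off, not a value taken from the patent.
    Returns None when neither direction reaches the threshold.
    """
    first_in_second = containment(first, second)
    second_in_first = containment(second, first)
    if max(first_in_second, second_in_first) < threshold:
        return None
    if first_in_second >= second_in_first:
        return (f"first dataset is likely a subset of second "
                f"({first_in_second:.0%} of its values overlap)")
    return (f"second dataset is likely a subset of first "
            f"({second_in_first:.0%} of its values overlap)")


if __name__ == "__main__":
    small = ["ann", "bob"]
    large = ["ann", "bob", "cy", "dee"]
    print(likely_subset_prompt(small, large))
```

Containment is measured in both directions because a subset relationship is asymmetric: every value of the smaller dataset may appear in the larger one while the reverse does not hold.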
Opening claim text (preview).
What is claimed is:

1. A computing system for database management, comprising: at least one processor; a communication interface communicatively coupled to the at least one processor; and a memory device storing executable code that, when executed, causes the at least one processor to: iteratively train, using training data, a neural network incorporating a machine learning program to detect redundancies in datasets and subsets of datasets, the training including: inserting the training data into an iterative training and testing loop to predict a target variable; repeatedly predicting the target variable during each iteration of the training and testing loop, wherein each iteration of the training and testing loop has differing weights applied to one or more nodes of the neural network, each of the differing weights being updated with each iteration of the training and testing loop to reduce error in predicting the target variable, which improves predictability of the target variable and functionality of the network; deploy the trained neural network; perform, via the trained and deployed neural network, data analysis on at least two separate datasets to predict whether any redundancies are present in the at least two separate datasets, the data analysis including: comparing first data values of a first dataset of the at least two separate datasets with second data values of a second dataset of the at least two separate datasets, the comparing including evaluating similarities of the first data values and the second data values; interpreting, using the trained and deployed neural network and from the comparing, that the first data values and the second data values comprise at least a portion of substantially similar data due to identified redundancies despite the first data values utilize names to represent users and the second data values utilize numeric identifiers to represent the users, the interpreting analyzing similarities of the portion of substantially similar data; determining that at least one dataset of the first dataset and the second dataset is a subset of an other dataset of the first dataset and the second dataset; identifying a percentage of similarity between the at least one dataset and the other dataset; calculating a checksum of the at least two separate datasets to determine how the at least two separate datasets have changed over time; and determining whether the at least one dataset comprises masked or tokenized data and determining that the at least one dataset comprises personally identifiable information that has not been tokenized or masked in the other dataset; and transmit, to a user device, one or more control signals to initiate displaying, via a user interface of the user device, one or more prompts indicating: that the at least one dataset is likely the subset of the other dataset; the percentage of similarity; one or more changes identified from determining how the at least two separate datasets have changed over time; which of the one dataset and the other dataset has been most recently modified; and a control input for tokenizing or masking the personally identifiable information of the other dataset; and receive, via the user interface of the user device, from a user, one or more input responses to the one or more prompts.

2. The computing system of claim 1, wherein the executable code, when executed, further causes the at least one processor to: receive, from the user device, one or more inputs indicating a user desires for the at least one dataset is to be consolidated with the other dataset; and consolidate, based on the one or more inputs, the at least one dataset with the other dataset thereby saving space at one or more data storage locations that store the at least two separate datasets.

3. The computing system of claim 2, wherein the consolidating comprises deleting the at least one dataset.

4. The computing system of claim 1, wherein the comparing further comprises: ascertaining a first maximum value of the first data values, a first minimum value of the first data values, a first mean value of the first data values, and a first range of values of the first data values, a first standard deviation of the first data values, a second maximum value of the second data values, a second minimum value of the second data values, a second mean value of the second data values, a second range of values of the second data values, and a second standard deviation of the second data values; and relating the first maximum value to the second maximum value, the first minimum value to the second minimum value, the first mean value to the second mean value, the first range to the second range, and the first standard deviation to the second standard deviation.

5. The computing system of claim 1, wherein the comparing further comprises incorporating natural language processing to interpret meaning of words included in the at least two separate datasets, and based thereon determine whether the meaning of the words is the same.

6. The computing system of claim 1, wherein the data analysis further comprises identifying a percentage of similarity between the at least one dataset and the other dataset and the transmitting is based on comparing the percentage of similarity to a predetermined threshold and the percentage of similarity being at least equal to the predetermined threshold.

7. A computing system facilitating data redundancy detection, the computing system comprising: at least one processor; a communication interface communicatively coupled to the at least one processor; and a memory device storing executable code that, when executed, causes the at least one processor to: iteratively train, using training data, a neural network incorporating a machine learning program to perform natural language processing to detect redundancies in datasets and subsets of datasets, the training including: inserting the training data into an iterative training and testing loop to predict a target variable; repeatedly predicting the target variable during each iteration of the training and testing loop, wherein each iteration of the training and testing loop has differing weights applied to one or more nodes of the neural network, each of the differing weights being updated with each iteration of the training and testing loop to reduce error in predicting the target variable, which improves predictability of the target variable and functionality of the network; deploy the trained neural network; perform data analysis on a first dataset and a second dataset to predict whether any redundancies are present in the at least two separate datasets, the data analysis including: performing, using the trained and deployed neural network, natural language processing on first words included in the first dataset and second words included in the second dataset to derive that names of users represented by the first words are last names that correspond to the second words that include first names of the users; determining, based on the deriving, that at least a portion of the first dataset and the second dataset are likely redundant, the determining including semantically comparing the meaning of the first words and the second words to interpret the meaning to be similar based on satisfying a similarity threshold; identifying a percentage of similarity between at least the portion of the first dataset and the second dataset; calculating a checksum of the at least two separate datasets to determine how the at least two separate datasets have changed over time; and determining whether at least the portion of the first dataset comprises masked or tokenized data and determining that at least the portion of the first dataset comprises p
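
The statistical comparison recited in claim 4 (maximum, minimum, mean, range, and standard deviation of each dataset, each related to its counterpart) can be sketched as follows. This is an illustrative reading only: the claim does not say how the values are "related", so the relative tolerance `rel_tol`, the use of population standard deviation, and the names `profile` and `relate_profiles` are all assumptions.

```python
import math
import statistics


def profile(values):
    """Summary statistics named in claim 4 for one list of numeric values."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        # Population standard deviation is an assumption; the claim does not
        # say whether population or sample deviation is intended.
        "stdev": statistics.pstdev(values),
    }


def relate_profiles(first, second, rel_tol=0.05):
    """Relate each statistic of `first` to the same statistic of `second`.

    Returns a dict mapping statistic name to True when the two values agree
    within `rel_tol` (a hypothetical tolerance, not taken from the patent).
    """
    p1, p2 = profile(first), profile(second)
    return {
        name: math.isclose(p1[name], p2[name], rel_tol=rel_tol, abs_tol=1e-9)
        for name in p1
    }


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [5, 4, 3, 2, 1]  # same values, different order
    print(relate_profiles(a, b))
```

Matching profiles are only a cheap signal that two numeric columns may hold the same data; they do not prove redundancy on their own, since different value sets can share the same summary statistics.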
Presentation of query results · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
using context · CPC title