System for automated process substitution with connection-preserving capabilities
US-2024406173-A1 · Dec 5, 2024 · US
US2024283822A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024283822-A1 |
| Application number | US-202318170492-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 16, 2023 |
| Priority date | Feb 16, 2023 |
| Publication date | Aug 22, 2024 |
| Grant date | — |
Official abstract text for this publication.
In some aspects, a computing system may iterate between adding spurious data to the dataset and training a model on the dataset. If the model's performance has not dropped by more than a threshold amount, then additional spurious data may be added to the dataset until the desired amount of performance decrease has been achieved. The computing system may determine the amount of impact each feature has on a model's output. The computing system may generate a spurious data sample by modifying values of features that are more impactful than other features. The computing system may repeatedly modify the spurious data that is stored in a dataset. If a cybersecurity incident occurs (e.g., the dataset is stolen or leaked), the system may identify when the cybersecurity incident took place based on the spurious data that is stored in the dataset.
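The iterate-until-degraded loop described in the abstract can be illustrated with a small, self-contained sketch. Everything specific here is an assumption made for illustration and is not taken from the publication: the nearest-centroid classifier, the centroid-gap proxy for feature impact, the mirroring rule used to build a spurious sample, and the names `poison_until_drop`, `min_drop`, `batch`, and `max_rounds`.

```python
import random

def train(samples):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(features, model[lab])))

def accuracy(model, samples):
    return sum(predict(model, f) == y for f, y in samples) / len(samples)

def most_impactful_feature(model):
    """Crude impact proxy (assumption): the feature where the two centroids differ most."""
    c0, c1 = model[0], model[1]
    return max(range(len(c0)), key=lambda i: abs(c0[i] - c1[i]))

def make_spurious(sample, model, feature):
    """Copy a sample, keep its label, and mirror the impactful feature
    across the midpoint between the two class centroids."""
    features, label = sample
    mirrored = list(features)
    mirrored[feature] = model[0][feature] + model[1][feature] - features[feature]
    return (mirrored, label)

def poison_until_drop(original, min_drop=0.3, batch=20, max_rounds=20):
    """Alternate between adding spurious samples and retraining until accuracy
    on the original samples has dropped by at least min_drop."""
    clean_model = train(original)
    base_acc = accuracy(clean_model, original)
    feature = most_impactful_feature(clean_model)
    dataset, added, acc = list(original), 0, base_acc
    for _ in range(max_rounds):
        if base_acc - acc >= min_drop:
            break  # desired performance decrease reached
        for _ in range(batch):
            source = original[added % len(original)]
            dataset.append(make_spurious(source, clean_model, feature))
            added += 1
        acc = accuracy(train(dataset), original)
    return dataset, added, base_acc, acc

# Toy data (assumption): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(0)
original = []
for _ in range(40):
    original.append(([rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)], 0))
    original.append(([rng.gauss(4.0, 0.5), rng.gauss(0.0, 0.5)], 1))

dataset, added, base_acc, final_acc = poison_until_drop(original)
print(f"added {added} spurious samples: accuracy {base_acc:.2f} -> {final_acc:.2f}")
```

In this sketch the spurious samples keep the original label but carry feature values typical of the other class, so a model trained on the combined data gradually misclassifies the original samples. A real system would substitute its own model, performance metric, and explanation technique.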
Opening claim text (preview).
What is claimed is:
1. A system for providing an additional layer of data security to prevent malicious actors from using data by modifying a dataset to include spurious data, the system comprising: one or more processors; and a non-transitory, computer readable medium having instructions recorded thereon that, when executed by the one or more processors, cause operations comprising: obtaining a first dataset comprising a set of original data samples, wherein each data sample comprises a label indicating a correct classification; generating a key that indicates a location within the first dataset where spurious data should be stored; generating, based on the set of original data samples, a first set of spurious data samples for the first dataset, wherein the first set of spurious data samples, when used to train a first machine learning model, cause the first machine learning model to generate incorrect output for more than a threshold number of data samples of the set of original data samples; based on the key, adding the first set of spurious data samples to the first dataset; training a machine learning model based on the first dataset; based on a performance metric of the machine learning model satisfying a threshold, generating a second set of spurious data samples; adding the second set of spurious data samples to the first dataset; and based on determining a request to use the first dataset is not associated with a malicious computing device, removing the first set of spurious data samples and the second set of spurious data from the first dataset.
2. A method comprising: obtaining a first dataset comprising a set of original data samples; generating a key that indicates a location within the first dataset where spurious data should be stored; generating, based on the set of original data samples, a first set of spurious data samples for the first dataset; based on the key, adding the first set of spurious data samples to the first dataset; based on determining that the first set of spurious data samples fails to modify performance of a machine learning model, adding a second set of spurious data samples to the first dataset; and based on determining a request to use the first dataset is not associated with a malicious computing device, removing the first set of spurious data samples and the second set of spurious data samples from the first dataset.
3. The method of claim 2, wherein adding the second set of spurious data samples to the first dataset comprises: training a machine learning model based on the first dataset; based on a performance metric of the machine learning model satisfying a threshold, generating a second set of spurious data, wherein the performance metric comprises accuracy, logarithmic loss, F1 score, precision, recall, or mean squared error; and based on the performance metric of the machine learning model satisfying the threshold, adding the second set of spurious data to the first dataset.
4. The method of claim 2, wherein generating the first set of spurious data samples comprises: generating, based on a first data sample of the set of original data samples, an explanation indicating a feature that is more influential than other features of the first data sample for output generated by the machine learning model, the output corresponding to the first data sample; and generating a spurious data sample of the first set of spurious data samples by: generating a copy of the first data sample; and modifying a value of the copy of the first data sample, the value corresponding to the feature.
5. The method of claim 2, wherein the first set of spurious data samples, when used to train the machine learning model, cause the machine learning model to generate incorrect output for more than a threshold number of data samples of the set of original data samples.
6. The method of claim 2, wherein generating the first set of spurious data samples comprises: determining a modification to a value of a first data sample of the set of original data samples, wherein the modification causes the machine learning model to output an incorrect class of the first data sample; and generating a spurious data sample comprising a label corresponding to the first data sample and a result of the modification.
7. The method of claim 2, wherein removing the first set of spurious data samples from the first dataset comprises: determining a computing device that has experienced more than a threshold amount of cyber security attacks within a time period; and based on the computing device having experienced more than the threshold amount of cyber security attacks within the time period, removing the first set of spurious data samples from the first dataset after the computing device has completed preprocessing the first dataset.
8. The method of claim 2, wherein removing the first set of spurious data samples from the first dataset comprises: determining that the first dataset is to be used to train a machine learning model; and based on determining that the first dataset is to be used to train the machine learning model, removing the first set of spurious data samples from the first dataset.
9. The method of claim 2, wherein generating the second set of spurious data samples comprises: comparing output of a first machine learning model with output of a second machine learning model; and based on the output of the first machine learning model satisfying a similarity threshold to the output of the second machine learning model, generating the second set of spurious data samples.
10. The method of claim 2, further comprising steps for generating the first set of spurious data samples.
11. The method of claim 2, wherein the key indicates a plurality of rows within the first dataset where spurious data samples should be placed.
12. A non-transitory, computer-readable medium comprising instructions that when executed by one or more processors, cause operations comprising: obtaining a first dataset comprising a set of original data samples; generating a key that indicates a location within the first dataset where spurious data should be stored; generating, based on the set of original data samples, a first set of spurious data samples for the first dataset; based on the key, adding the first set of spurious data samples to the first dataset; and based on determining a request to use the first dataset is not associated with a malicious computing device, removing the first set of spurious data samples from the first dataset.
13. The medium of claim 12, wherein adding the first set of spurious data samples to the first dataset comprises: training a machine learning model based on the first dataset; based on a performance metric of the machine learning model satisfying a threshold, generating a second set of spurious data, wherein the performance metric comprises accuracy, logarithmic loss, F1 score, precision, recall, or mean squared error; and based on the performance metric of the machine learning model satisfying the threshold, adding the first set of spurious data samples to the first dataset.
14. The medium of claim 12, wherein generating the first set of spurious data samples comprises: generating, based on a first data sample of the set of original data samples, an explanation indicating a feature that is more influential than other features of the first data sample for output generated by a machine learning model, the output corresponding to the first data sample; and generating a spurious data sample of the first set of spurious d
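The key recited in the independent claims, which per claim 11 may indicate a plurality of rows, can be pictured as a list of row positions. The following is a minimal sketch under stated assumptions: the key is modeled as sorted row indices into the combined dataset, and the helper names `generate_key`, `add_spurious`, `remove_spurious`, and `serve`, along with the `requester_is_trusted` flag, are hypothetical. The claims do not specify how the key is encoded or how a requesting device is judged malicious.

```python
import random

def generate_key(n_original, n_spurious, seed):
    """Pick the row positions in the combined dataset that will hold spurious rows.
    Assumption: the key is simply a sorted list of row indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_original + n_spurious), n_spurious))

def add_spurious(original, spurious, key):
    """Interleave spurious rows at the positions named by the key."""
    if len(key) != len(spurious):
        raise ValueError("key must name exactly one position per spurious row")
    positions = set(key)
    originals, decoys = iter(original), iter(spurious)
    total = len(original) + len(spurious)
    return [next(decoys) if i in positions else next(originals) for i in range(total)]

def remove_spurious(dataset, key):
    """Drop every row whose position appears in the key."""
    positions = set(key)
    return [row for i, row in enumerate(dataset) if i not in positions]

def serve(dataset, key, requester_is_trusted):
    """Return the cleaned rows to a trusted requester. Returning the stored
    (poisoned) rows otherwise is an assumption of this sketch."""
    return remove_spurious(dataset, key) if requester_is_trusted else list(dataset)

# Toy rows (assumption): tuples standing in for real data samples.
original = [("row", i) for i in range(6)]
spurious = [("decoy", i) for i in range(3)]
key = generate_key(len(original), len(spurious), seed=7)
stored = add_spurious(original, spurious, key)
print(key, serve(stored, key, requester_is_trusted=True) == original)
```

Because the key alone determines which rows are spurious, anyone holding the stored dataset without the key cannot trivially separate original rows from decoys, while a holder of the key can restore the original dataset exactly.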
using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment · CPC title
using machine learning or artificial intelligence · CPC title
Event detection, e.g. attack signature detection · CPC title