Layered cybersecurity using spurious data samples

US12395529B2 · US · B2

Patent metadata
Publication number: US-12395529-B2
Application number: US-202318170492-A
Country: US
Kind code: B2
Filing date: Feb 16, 2023
Priority date: Feb 16, 2023
Publication date: Aug 19, 2025
Grant date: Aug 19, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some aspects, a computing system may iterate between adding spurious data to the dataset and training a model on the dataset. If the model's performance has not dropped by more than a threshold amount, then additional spurious data may be added to the dataset until the desired amount of performance decrease has been achieved. The computing system may determine the amount of impact each feature has on a model's output. The computing system may generate a spurious data sample by modifying values of features that are more impactful than other features. The computing system may repeatedly modify the spurious data that is stored in a dataset. If a cybersecurity incident occurs (e.g., the dataset is stolen or leaked), the system may identify when the cybersecurity incident took place based on the spurious data that is stored in the dataset.
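The iteration described in the abstract can be sketched as a short loop. This is a minimal illustration, not the patented implementation: the nearest-centroid "model", the one-feature toy data, and the names `train_centroids`, `make_spurious`, `degrade_dataset`, `target_drop`, `batch`, and `shift` are all assumptions made for the example. Spurious samples are produced here by copying an original sample, keeping its label, and pushing its single feature toward the other class.

```python
from statistics import mean

# Toy one-feature dataset (assumed for illustration): (feature, label) pairs.
ORIGINALS = [(0, 0), (10, 1), (1, 0), (11, 1), (2, 0),
             (12, 1), (3, 0), (13, 1), (4, 0), (14, 1)]

def train_centroids(samples):
    """Toy 'model': the mean feature value of each class."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def accuracy(centroids, samples):
    """Fraction of samples whose nearest centroid matches their label."""
    hits = 0
    for x, label in samples:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        hits += predicted == label
    return hits / len(samples)

def make_spurious(originals, count, shift, start):
    """Copy originals (cycling from `start`), keep the label, move the feature."""
    out = []
    for i in range(count):
        x, label = originals[(start + i) % len(originals)]
        moved = x + shift if label == 0 else x - shift
        out.append((moved, label))
    return out

def degrade_dataset(originals, target_drop=0.3, batch=2, shift=20, max_rounds=50):
    """Alternate between training and adding spurious samples until the
    accuracy on the original samples has dropped by at least `target_drop`."""
    baseline = accuracy(train_centroids(originals), originals)
    spurious, rounds = [], 0
    while rounds < max_rounds:
        score = accuracy(train_centroids(originals + spurious), originals)
        if baseline - score >= target_drop:
            break  # desired performance decrease reached
        spurious += make_spurious(originals, batch, shift, start=len(spurious))
        rounds += 1
    final = accuracy(train_centroids(originals + spurious), originals)
    return spurious, baseline, final, rounds

spurious, baseline, final, rounds = degrade_dataset(ORIGINALS)
print(f"baseline={baseline:.2f} final={final:.2f} "
      f"rounds={rounds} spurious_rows={len(spurious)}")
```

A real system would presumably use a held-out evaluation set and one of the metrics listed in claim 3; the toy model above only serves to make the stopping condition concrete.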

First claim

Opening claim text (preview).

What is claimed is:

1. A system for providing an additional layer of data security to prevent malicious actors from using data by modifying a dataset to include spurious data, the system comprising: one or more processors; and a non-transitory, computer readable medium having instructions recorded thereon that, when executed by the one or more processors, cause operations comprising: identifying a dataset, comprising original data samples, to which spurious data samples are to be added to decrease effectiveness of the dataset; generating, based on the original data samples, first spurious data samples for the dataset; based on a key indicating locations within the dataset where spurious data is to be stored, storing a modified dataset in place of the dataset by adding the first spurious data samples to the dataset at the locations within the dataset; based on a performance metric of a machine learning model trained on the modified dataset satisfying a performance threshold, generating second spurious data samples, and adding the second spurious data samples to the modified dataset to supplement or replace the first spurious data samples in the modified dataset to decrease the performance metric of the machine learning model; and in response to receiving a request to use the dataset and determining that the request is an authorized request, transmitting, as a response to the request in lieu of transmitting the modified dataset, the dataset without the first spurious data samples and without the second spurious data samples by performing removal of spurious data samples from the modified dataset in connection with the request to use the dataset.

2. A method comprising: identifying a first dataset comprising a set of original data samples; generating, based on the set of original data samples, a first set of spurious data samples for the first dataset; based on a key indicating one or more locations within the first dataset where spurious data is to be stored, storing a modified dataset by adding the first set of spurious data samples to the first dataset at the one or more locations within the first dataset; determining that a performance metric of a machine learning model trained on the modified dataset would satisfy a performance threshold; based on determining that the performance metric of the machine learning model would satisfy the performance threshold, adding a second set of spurious data samples to the modified dataset to supplement or replace one or more samples of the first set of spurious data samples in the modified dataset; and in connection with receiving a request to use the first dataset, determining that the request is not associated with a malicious computing device, and transmitting, as a response to the request in lieu of transmitting the modified dataset, the first dataset without the first set of spurious data samples and without the second set of spurious data samples.

3. The method of claim 2, wherein the performance metric comprises accuracy, logarithmic loss, F1 score, precision, recall, or mean squared error.

4. The method of claim 2, wherein generating the first set of spurious data samples comprises: generating, based on a first data sample of the set of original data samples, an explanation indicating a feature that is more influential than other features of the first data sample for output generated by the machine learning model, the output corresponding to the first data sample; and generating a spurious data sample of the first set of spurious data samples by: generating a copy of the first data sample; and modifying a value of the copy of the first data sample, the value corresponding to the feature.

5. The method of claim 2, wherein the first set of spurious data samples, when used to train the machine learning model, cause the machine learning model to generate incorrect output for more than a threshold number of data samples of the set of original data samples.

6. The method of claim 2, wherein generating the first set of spurious data samples comprises: determining a modification to a value of a first data sample of the set of original data samples, wherein the modification causes the machine learning model to output an incorrect class of the first data sample; and generating a spurious data sample comprising a label corresponding to the first data sample and a result of the modification.

7. The method of claim 2, wherein transmitting the first dataset comprises: determining a computing device that has experienced more than a threshold amount of cyber security attacks within a time period; and based on the computing device having experienced more than the threshold amount of cyber security attacks within the time period, removing spurious data samples from the modified dataset to reproduce the first dataset, without the first set of spurious data samples and without the second set of spurious data samples, after the computing device has completed preprocessing the modified dataset.

8. The method of claim 2, wherein transmitting the first dataset comprises: determining that the first dataset is to be used for machine learning model training; and based on determining that the first dataset is to be used for machine learning model training, removing spurious data samples from the modified dataset to reproduce the first dataset without the first set of spurious data samples and without the second set of spurious data samples.

9. The method of claim 2, wherein adding the second set of spurious data samples comprises: comparing output of a first machine learning model with output of a second machine learning model; and based on the output of the first machine learning model satisfying a similarity threshold to the output of the second machine learning model, adding the second set of spurious data samples to the modified dataset to supplement or replace one or more samples of the first set of spurious data samples in the modified dataset.

10. The method of claim 2, further comprising: determining that the performance metric of the machine learning model trained on the modified dataset with the second set of spurious data samples would fail to satisfy the performance threshold, wherein a third set of spurious data samples is not added to the modified dataset based on determining that the performance metric of the machine learning model trained on the modified dataset with the second set of spurious data samples would fail to satisfy the performance threshold.

11. The method of claim 2, wherein the key indicates a plurality of rows within the first dataset where spurious data samples is to be placed.

12. One or more non-transitory computer-readable media comprising instructions that, when executed by one or more processors, cause operations comprising: identifying a first dataset comprising a set of original data samples; generating, based on the set of original data samples, a first set of spurious data samples for the first dataset; based on a key indicating one or more locations within the first dataset where spurious data is to be stored, storing a modified dataset by adding the first set of spurious data samples to the first dataset; based on determining that a performance metric of a machine learning model trained on the modified dataset would satisfy a performance threshold, adding a second set of spurious data samples to the modified dataset to supplement or replace one or more samples of the first set of spurious data samples in the modified dataset; and based on determining that a request to use the first dataset is not associated with a malicious computing device, transmitting, as a response
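The claims rely on a key that records where spurious rows sit in the modified dataset, so that the original dataset can be reproduced for an authorized request. The sketch below shows one possible way to realize that idea. It assumes the key is a sorted list of row indices drawn from a seeded random generator; the function names `add_spurious` and `remove_spurious` and the sample rows are hypothetical, and the authorization check itself is out of scope here.

```python
import random

def add_spurious(originals, spurious, seed):
    """Interleave spurious rows with the originals.

    Returns (modified, key), where `key` lists the row indices of the
    modified dataset that hold spurious rows (an assumed key format).
    """
    total = len(originals) + len(spurious)
    key = sorted(random.Random(seed).sample(range(total), len(spurious)))
    key_set = set(key)
    orig_iter, spur_iter = iter(originals), iter(spurious)
    modified = [next(spur_iter) if i in key_set else next(orig_iter)
                for i in range(total)]
    return modified, key

def remove_spurious(modified, key):
    """Reproduce the original dataset by dropping the rows named in the key."""
    key_set = set(key)
    return [row for i, row in enumerate(modified) if i not in key_set]

# Hypothetical rows: (feature, label) pairs.
originals = [(0.2, "benign"), (0.9, "fraud"), (0.1, "benign"), (0.8, "fraud")]
spurious = [(0.85, "benign"), (0.15, "fraud")]

modified, key = add_spurious(originals, spurious, seed=7)
restored = remove_spurious(modified, key)
print("key:", key)
print("restored equals originals:", restored == originals)
```

Because only the holder of the key can tell the two kinds of rows apart, a copy of the modified dataset taken without the key still contains the spurious rows. How the key is generated, stored, and protected is not specified in the claim preview, so the seeded generator above is purely illustrative.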

Assignees

Inventors

Classifications

  • Event detection, e.g. attack signature detection · CPC title

  • using machine learning or artificial intelligence · CPC title

  • using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12395529B2 cover?
In some aspects, a computing system may iterate between adding spurious data to the dataset and training a model on the dataset. If the model's performance has not dropped by more than a threshold amount, then additional spurious data may be added to the dataset until the desired amount of performance decrease has been achieved. The computing system may determine the amount of impact each featu…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.
When was this patent published?
Publication date Aug 19, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).