Systems and methods for intelligent phishing threat detection and phishing threat remediation in a cyber security threat detection and mitigation platform
US-2024414198-A1 · Dec 12, 2024 · US
US2024281523A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024281523-A1 |
| Application number | US-202318170495-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 16, 2023 |
| Priority date | Feb 16, 2023 |
| Publication date | Aug 22, 2024 |
| Grant date | — |
Official abstract text for this publication.
In some aspects, a computing system may obtain a first dataset including a set of original data samples. The computing system may generate a key that indicates a location within the first dataset where spurious data should be stored. The computing system may determine a modified value associated with a first data sample of the set of original data samples, where the modified value causes a machine learning model to generate output that does not match a label associated with the first data sample. Based on the first data sample, the computing system may generate a spurious data sample comprising the modified value. Based on the key, the computing system may add the spurious data sample to the first dataset. In some aspects, based on a request for the first dataset, the computing system may remove the spurious data sample from the first dataset.
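The key-based insertion and removal described in the abstract can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the key is a sorted list of row positions in the combined dataset, and the function names `generate_key`, `add_spurious`, and `remove_spurious` are hypothetical.

```python
import random


def generate_key(n_original, n_spurious, seed=None):
    """Choose which row positions of the combined dataset will hold spurious rows.

    Assumption: the key is simply a sorted list of row indices.
    """
    rng = random.Random(seed)
    total = n_original + n_spurious
    return sorted(rng.sample(range(total), n_spurious))


def add_spurious(original, spurious, key):
    """Interleave spurious rows so they land exactly at the positions in the key."""
    if len(spurious) != len(key):
        raise ValueError("key must list one position per spurious row")
    positions = set(key)
    original_rows = iter(original)
    spurious_rows = iter(spurious)
    combined = []
    for index in range(len(original) + len(spurious)):
        source = spurious_rows if index in positions else original_rows
        combined.append(next(source))
    return combined


def remove_spurious(dataset, key):
    """Recover the original rows by dropping every position listed in the key."""
    positions = set(key)
    return [row for index, row in enumerate(dataset) if index not in positions]


# Hypothetical toy data: (feature value, label) pairs.
original = [(0.2, 0), (0.9, 1), (0.4, 0)]
spurious = [(0.95, 0)]  # a modified value paired with a label it should not match
key = generate_key(len(original), len(spurious), seed=7)
protected = add_spurious(original, spurious, key)
restored = remove_spurious(protected, key)
```

Only a holder of the key can tell which rows to drop, so an authorized request can be served the restored dataset while a copied dataset still contains the spurious rows.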
Opening claim text (preview).
What is claimed is:
1. A system for generating more effective spurious data for degrading a machine learning model's performance by modifying values for impactful features within a data sample to prevent malicious actors from using a dataset that includes the spurious data, the system comprising: one or more processors; and a non-transitory, computer readable medium having instructions recorded thereon that, when executed by the one or more processors, cause operations comprising: obtaining a first dataset comprising a set of original data samples, wherein each data sample comprises a label indicating a correct classification for a corresponding data sample; generating a key that indicates a location within the first dataset where spurious data should be stored; generating, based on a first data sample of the set of original data samples, a set of scores, wherein the set of scores comprises a score for each feature in the first data sample, wherein each score in the set of scores indicates an amount of influence a corresponding feature had on output generated by a machine learning model; generating a spurious data sample by modifying a value of the first data sample, wherein the value corresponds to a first feature having more influence than any other feature associated with the first data sample; based on the key, adding the spurious data sample to the first dataset; and based on a request for the first dataset, removing the spurious data sample from the first dataset.
2. A method for increasing efficacy of spurious data for model degradation, the method comprising: obtaining a first dataset comprising a set of original data samples; generating a key that indicates a location within the first dataset where spurious data should be stored; determining a modified value associated with a first data sample of the set of original data samples, wherein the modified value causes a machine learning model to generate output that does not match a label associated with the first data sample; generating, based on the first data sample, a spurious data sample comprising the modified value; based on the key, adding the spurious data sample to the first dataset; and based on a request for the first dataset, removing the spurious data sample from the first dataset.
3. The method of claim 2, wherein determining the modified value comprises: generating, based on a first data sample of the set of original data samples, an explanation indicating a feature that is more influential than other features of the first data sample for output generated by the machine learning model, the output corresponding to the first data sample; and generating the modified value by modifying a value of the first data sample, the value corresponding to the feature.
4. The method of claim 2, wherein determining a modified value comprises: determining a modification to a value of a first data sample of the set of original data samples, wherein the modification causes the machine learning model to output an incorrect class of the first data sample; and generating a spurious data sample comprising a label corresponding to the first data sample and a result of the modification.
5. The method of claim 2, further comprising: determining a distribution of values for a feature in the first dataset; and generating a first set of spurious data samples, wherein each data sample in the first set of spurious data samples comprises a value for the feature that is more than a threshold number of standard deviations from a mean of the distribution; and adding the first set of spurious data samples to the first dataset.
6. The method of claim 2, wherein determining a modified value comprises: generating a set of impact scores, wherein the set of impact scores comprises a plurality of impact scores for each data sample in the set of original data samples, wherein the set of impact scores indicate an amount of influence features had on a machine learning model's output; determining, based on the set of impact scores, a first cluster of impact scores; and determining, based on the first cluster of impact scores, the modified value, wherein the first data sample corresponds to a set of impact scores in the first cluster of impact scores.
7. The method of claim 2, wherein removing the spurious data sample from the first dataset comprises: determining a computing device that has experienced more than a threshold amount of cybersecurity attacks within a time period; and based on the computing device having experienced more than the threshold amount of cybersecurity attacks within the time period, removing the spurious data sample from the first dataset after the computing device has completed preprocessing the first dataset.
8. The method of claim 2, wherein removing the spurious data sample from the first dataset comprises: determining that the first dataset is to be used to train a machine learning model; and based on determining that the first dataset is to be used to train the machine learning model, removing the spurious data sample from the first dataset.
9. The method of claim 2, wherein the key indicates a plurality of rows within the first dataset where spurious data samples should be placed.
10. The method of claim 2, further comprising: generating a user interface comprising a first element representative of the first data sample and a second element representative of the spurious data sample; and outputting the user interface.
11. The method of claim 2, wherein determining a modified value comprises: determining a second data sample of the set of original data samples; and selecting a value corresponding to a feature of the second data sample.
12. A non-transitory, computer-readable medium comprising instructions that when executed by one or more processors, cause operations comprising: obtaining a first dataset comprising a set of original data samples; generating a key that indicates a location within the first dataset where spurious data should be stored; determining a modified value associated with a first data sample of the set of original data samples, wherein the modified value causes a machine learning model to generate output that does not match a label associated with the first data sample; generating, based on the first data sample, a spurious data sample comprising the modified value; and based on the key, adding the spurious data sample to the first dataset.
13. The medium of claim 12, wherein determining the modified value comprises: generating, based on a first data sample of the set of original data samples, an explanation indicating a feature that is more influential than other features of the first data sample for output generated by the machine learning model, the output corresponding to the first data sample; and generating the modified value by modifying a value of the first data sample, the value corresponding to the feature.
14. The medium of claim 12, wherein determining a modified value comprises: determining a modification to a value of a first data sample of the set of original data samples, wherein the modification causes the machine learning model to output an incorrect class of the first data sample; and generating a spurious data sample comprising a label corresponding to the first data sample and a result of the modification.
15. The medium of claim 12, further comprising: determining a distribution of values for a feature in the first dataset; generating a first set of spurious data samples, wherein each data sample
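The distribution-based step recited in claim 5 can be illustrated with a short sketch. It is one possible reading, not the claimed implementation: rows are assumed to be dictionaries, the population standard deviation is used, and the function name `outlier_spurious_samples` and its parameters (`threshold`, `count`, `seed`) are hypothetical.

```python
import random
import statistics


def outlier_spurious_samples(dataset, feature, threshold=3.0, count=2, seed=None):
    """Copy existing rows and push one feature beyond `threshold` standard deviations.

    Assumptions: each row is a dict, and the feature holds numeric values
    with non-zero variance.
    """
    rng = random.Random(seed)
    values = [row[feature] for row in dataset]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("feature has zero variance; no outlier scale is defined")

    samples = []
    for _ in range(count):
        sample = dict(rng.choice(dataset))  # copy so the original row is untouched
        direction = rng.choice((-1, 1))
        # Add a margin so the value lies strictly beyond the threshold.
        distance = threshold + rng.uniform(0.5, 1.5)
        sample[feature] = mean + direction * distance * std
        samples.append(sample)
    return samples


# Hypothetical toy dataset with a single numeric feature and a label.
rows = [
    {"amount": 10.0, "label": 0},
    {"amount": 12.0, "label": 0},
    {"amount": 11.0, "label": 1},
    {"amount": 13.0, "label": 1},
]
spurious_rows = outlier_spurious_samples(rows, "amount", threshold=3.0, count=3, seed=1)
```

Keeping the other fields of a copied row unchanged makes the spurious rows look like ordinary records apart from the extreme feature value; whether that matches the applicant's approach is not stated in the text shown here.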
Clustering or classification · CPC title
involving long-term monitoring or reporting · CPC title
for performance assessment · CPC title