Managing data drift in machine learning models using incremental learning and explainability
US-2024119290-A1 · Apr 11, 2024 · US
US12306938B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12306938-B2 |
| Application number | US-202318170502-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 16, 2023 |
| Priority date | Feb 16, 2023 |
| Publication date | May 20, 2025 |
| Grant date | May 20, 2025 |
Official abstract text for this publication.
In some aspects, a computing system may obtain a first dataset including a set of original data samples and a first set of spurious data samples. Based on a time period expiring, the computing system may replace the first set of spurious data samples in the first dataset with a second set of spurious data samples. The computing system may obtain an indication that a second dataset is available via a third-party computing device. Based on a determination that a subset of samples of the second dataset corresponds to the first set of spurious data samples, the computing system may determine a time window in which an incident occurred. As an example, the time window may be determined to correspond to a time before the first set of spurious data samples were replaced with the second set of spurious data samples.
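The inference described in the abstract can be illustrated with a minimal sketch. It assumes (this is not stated in the source) that the system keeps a log of each generation of spurious samples together with the times it was inserted and replaced, and that samples can be compared as hashable values. The names `SpuriousBatch` and `find_incident_window` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SpuriousBatch:
    """One generation of spurious samples and the period it was in the dataset."""
    samples: frozenset               # hashable stand-ins for the spurious samples (assumption)
    inserted_at: datetime            # when the batch was placed in the first dataset
    replaced_at: Optional[datetime]  # None while the batch is still active


def find_incident_window(batches, second_dataset):
    """Return (start, end) for the first batch whose samples appear in the second dataset.

    `end` is None if the matching batch has not been replaced yet.
    Returns None if no spurious batch is found in the second dataset.
    """
    found = set(second_dataset)
    for batch in batches:
        if batch.samples & found:
            return batch.inserted_at, batch.replaced_at
    return None


# Hypothetical rotation log: batch A was replaced by batch B on Feb 1, 2023.
batches = [
    SpuriousBatch(frozenset({"decoy-a1", "decoy-a2"}), datetime(2023, 1, 1), datetime(2023, 2, 1)),
    SpuriousBatch(frozenset({"decoy-b1", "decoy-b2"}), datetime(2023, 2, 1), None),
]
second_dataset = ["real-17", "decoy-a2", "real-42"]
window = find_incident_window(batches, second_dataset)
```

Because the second dataset contains a sample from batch A only, the sketch places the incident between the insertion of batch A and its replacement by batch B.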
Opening claim text (preview).
What is claimed is:
1. A system for using spurious data samples in a dataset to determine a time window during which a malicious device caused a cybersecurity incident, the system comprising: one or more processors; and a non-transitory, computer readable medium having instructions recorded thereon that, when executed by the one or more processors, cause operations comprising: obtaining a first dataset comprising a set of original data samples and a first set of spurious data samples, wherein spurious data samples of the first set of spurious data samples are stored at locations, identifiable by a key, within the first dataset, wherein the first set of spurious data samples are configured to decrease accuracy of a machine learning model by more than a threshold percentage amount; based on a time period expiring, replacing the first set of spurious data samples in the first dataset with a second set of spurious data samples; obtaining an indication that a second dataset is available via a third-party computing device; determining that a subset of samples of the second dataset match the first set of spurious data samples; based on the subset of samples of the second dataset matching the first set of spurious data samples determining a time window in which a cybersecurity incident occurred, wherein the time window corresponds to a time before the first set of spurious data samples were replaced with the second set of spurious data samples; and outputting an indication of the time window.
2. The system of claim 1, wherein replacing the first set of spurious data samples in the first dataset with the second set of spurious data samples comprises: determining a first percentage corresponding to a number of data samples in the first dataset that belong to the first set of spurious data samples; determining a second percentage of the first dataset; generating the second set of spurious data samples, wherein the number of data samples in the second set of spurious data samples corresponds to the second percentage; and based on generating the second set of spurious data samples, replacing the first set of spurious data samples with the second set of spurious data samples.
3. The system of claim 1, wherein the instructions, when executed, cause operations further comprising: determining a setting of a computing device associated with the cybersecurity incident, wherein the setting was active during the time window; and based on the setting, generating a recommendation for a modified setting, wherein the modified setting is predicted to prevent the cybersecurity incident from repeating.
4. The system of claim 1, wherein the instructions, when executed, cause operations further comprising: determining a software version of software associated with the cybersecurity incident, wherein the software version of the software was installed during the time window; and based on the software version, generating a recommendation.
5. A method for using spurious data samples in a dataset to determine a time window during which a malicious actor caused a cybersecurity incident, the method comprising: obtaining a first dataset comprising a set of original data samples and a first set of spurious data samples; based on a time period expiring, replacing the first set of spurious data samples in the first dataset with a second set of spurious data samples; obtaining an indication that a second dataset is available via a third-party computing device; based on determining that a subset of samples of the second dataset correspond to the first set of spurious data samples, determining a time window in which a cybersecurity incident occurred, wherein the time window corresponds to a time before the first set of spurious data samples were replaced with the second set of spurious data samples; and outputting an indication of the time window.
6. The method of claim 5, wherein replacing the first set of spurious data samples in the first dataset with the second set of spurious data samples comprises: determining a first percentage corresponding to a number of data samples in the first dataset that belong to the first set of spurious data samples; determining a second percentage of the first dataset; generating the second set of spurious data samples, wherein the number of data samples in the second set of spurious data samples corresponds to the second percentage; and based on generating the second set of spurious data samples, replacing the first set of spurious data samples with the second set of spurious data samples.
7. The method of claim 5, further comprising: determining a setting of a computing device associated with the cybersecurity incident, wherein the setting was active during the time window; and based on the setting, generating a recommendation for a modified setting, wherein the modified setting is predicted to prevent the cybersecurity incident from repeating.
8. The method of claim 5, further comprising: determining a software version of software associated with the cybersecurity incident, wherein the software version of the software was installed during the time window; and based on the software version, generating a recommendation.
9. The method of claim 5, wherein determining that the subset of samples of the second dataset corresponds to the first set of spurious data samples comprises: generating a first hash of the subset of samples; and based on the first hash matching a second hash associated with the first set of spurious data samples, determining that the subset of samples corresponds to the first set of spurious data samples.
10. The method of claim 5, further comprising steps for generating a spurious data sample.
11. The method of claim 5, further comprising: based on replacing the first set of spurious data samples, storing an identifier associated with the second set of spurious data samples, wherein the identifier comprises an embedding of the second set of spurious data samples.
12. The method of claim 5, wherein replacing the first set of spurious data samples in the first dataset with the second set of spurious data samples comprises: determining a second key indicative of locations within the first dataset that are different from the locations of samples of the first set of spurious data samples within the first dataset; and based on the locations indicated by the second key, adding the second set of spurious data samples to the first dataset.
13. A non-transitory, computer-readable medium comprising instructions that when executed by one or more processors, cause operations comprising: obtaining a first dataset comprising a set of original data samples and a first set of spurious data samples; based on a time period expiring, replacing the first set of spurious data samples in the first dataset with a second set of spurious data samples; obtaining an indication that a second dataset is available via a third-party computing device; based on determining that a subset of samples of the second dataset correspond to the first set of spurious data samples determining a time window in which a cybersecurity incident occurred; and outputting an indication of the time window.
14. The medium of claim 13, wherein replacing the first set of spurious data samples in the first dataset with the second set of spurious data samples comprises: determining a first percentage corresponding to a number of data samples in the first dataset that belong to the first set of spurious data samples; determining a second percentage of the first dataset; generating the second set of spurious data
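The hash comparison recited in claim 9 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes samples are JSON-serializable records, that the candidate subset is read from the second dataset at the index locations given by a key (the claims mention key-identified locations only for the first dataset), and that SHA-256 is an acceptable hash. The function names `digest_samples` and `subset_matches_spurious` are hypothetical.

```python
import hashlib
import json


def digest_samples(samples):
    """Order-independent SHA-256 digest of a collection of JSON-serializable samples."""
    canonical = sorted(json.dumps(s, sort_keys=True) for s in samples)
    h = hashlib.sha256()
    for item in canonical:
        h.update(item.encode("utf-8"))
        h.update(b"\x00")  # separator so sample boundaries stay unambiguous
    return h.hexdigest()


def subset_matches_spurious(second_dataset, key, stored_digest):
    """Hash the samples at the key's locations and compare with the stored digest.

    `key` is assumed to be a list of non-negative indices into `second_dataset`.
    """
    if any(i >= len(second_dataset) for i in key):
        return False
    candidate = [second_dataset[i] for i in key]
    return digest_samples(candidate) == stored_digest


# Hypothetical spurious records and the digest stored when they were inserted.
spurious = [{"user": "zz-001", "amount": 13.37}, {"user": "zz-002", "amount": 0.01}]
stored = digest_samples(spurious)

# Hypothetical second dataset containing the spurious records at positions 1 and 3.
second_dataset = [
    {"user": "u1", "amount": 5.0},
    {"user": "zz-002", "amount": 0.01},
    {"user": "u2", "amount": 7.5},
    {"user": "zz-001", "amount": 13.37},
]
matched = subset_matches_spurious(second_dataset, key=[1, 3], stored_digest=stored)
```

Sorting the serialized samples before hashing makes the digest independent of the order in which the spurious records appear, which is a design choice of this sketch rather than something the claims require.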
Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title
Test or assess a computer or a system · CPC title
involving event detection and direct action · CPC title