US12572650B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12572650-B2 |
| Application number | US-202218147763-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2022 |
| Priority date | Dec 29, 2022 |
| Publication date | Mar 10, 2026 |
| Grant date | Mar 10, 2026 |
Official abstract text for this publication.
Methods and systems for managing an attack on an artificial intelligence (AI) model using view level analysis are disclosed. As AI models are updated over time using new training data, snapshots of the AI models may be obtained. The snapshots may include information regarding the training data used to train the AI model, the parameters of the AI model, and/or the inferences obtained from the AI model. A malicious party may perform an attack on the AI model by introducing poisoned training data through a data source. The content of supplied poisoned training data may be determined based on a view level into the AI model. The view level of the malicious party may be used to design countermeasures to mitigate and/or prevent future attacks to the AI model.
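The abstract's main ideas (a snapshot that records training data, parameters, and inferences, plus a view level that drives countermeasures) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Snapshot` fields, the four `ViewLevel` values, the `classify_view_level` rule, and the entries in `REMEDIATIONS` are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from enum import Enum


class ViewLevel(Enum):
    """Hypothetical view levels; the patent does not enumerate these names."""
    BLACK_BOX = "black_box"          # attacker saw neither data nor weights
    TRAINING_DATA = "training_data"  # attacker saw other sources' ingest data
    MODEL_WEIGHTS = "model_weights"  # attacker saw model parameters
    FULL = "full"                    # attacker saw both


@dataclass(frozen=True)
class Snapshot:
    """Assumed shape of a snapshot: the three kinds of information the abstract lists."""
    training_data: tuple
    parameters: tuple
    inferences: tuple


def classify_view_level(saw_training_data: bool, saw_model_weights: bool) -> ViewLevel:
    """Combine two view analysis results into one classification (assumed rule)."""
    if saw_training_data and saw_model_weights:
        return ViewLevel.FULL
    if saw_model_weights:
        return ViewLevel.MODEL_WEIGHTS
    if saw_training_data:
        return ViewLevel.TRAINING_DATA
    return ViewLevel.BLACK_BOX


# Illustrative remediation actions per view level (assumptions, not from the patent).
REMEDIATIONS = {
    ViewLevel.BLACK_BOX: ["quarantine the offending data source"],
    ViewLevel.TRAINING_DATA: ["quarantine the offending data source",
                              "audit access to the ingest data store"],
    ViewLevel.MODEL_WEIGHTS: ["quarantine the offending data source",
                              "audit access to the model repository"],
    ViewLevel.FULL: ["quarantine the offending data source",
                     "audit access to the ingest data store",
                     "audit access to the model repository"],
}


def plan_remediation(level: ViewLevel) -> list:
    """Look up the action set for a view level."""
    return list(REMEDIATIONS[level])
```

In this sketch a higher view level implies more compromised systems, so the action set grows with the level. A real deployment would define its own levels and actions.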
Opening claim text (preview).
What is claimed is:

1. A method for managing an artificial intelligence (AI) model, comprising: identifying a poisoned training dataset that was used to train an AI model instance; and after identifying the poisoned training dataset: identifying, based at least in part on the poisoned training dataset, a portion of a snapshot of the AI model instance that a malicious party had visibility into when selecting content of the poisoned training dataset by performing at least a training data view analysis to obtain a training data view analysis result, the training data view analysis comprising: attempting to match a portion of the poisoned training dataset to a portion of a first ingest dataset obtained from a data source that did not provide the poisoned training dataset in order to identify a quantity of the poisoned training dataset that matches the first ingest dataset; and making a first determination, using the quantity and a criterion associated with the quantity, regarding whether the malicious party had visibility into the first ingest dataset from the portion of the snapshot; classifying an amount of information the malicious party has regarding the AI model instance based on the portion of the snapshot to obtain a view level classification, the view level classification being used to identify one or more systems that are compromised; and performing a remediation based on the view level classification.

2. The method of claim 1, wherein identifying the portion of the snapshot further comprises: performing a model weight view analysis to obtain a model weight view analysis result, the model weight view analysis being based on: the poisoned training dataset, a second ingest dataset from a data source that provided the poisoned training dataset, and inferences generated using the second ingest dataset.

3. The method of claim 2, wherein performing the model weight view analysis comprises: enumerating the poisoned training dataset to identify a first cardinality of a portion of the second ingest dataset; enumerating the second ingest dataset to identify a second cardinality of the second ingest dataset; obtaining, using the first cardinality and the second cardinality, a quantification statistic; and based on the quantification statistic and a criterion for the quantification statistic, making a second determination regarding whether the malicious party had visibility into a model weight from the portion of the snapshot.

4. The method of claim 3, wherein the model weight view analysis result indicates: in a first instance of the second determination where the quantification statistic satisfies the criterion: that the malicious party had visibility of the model weight while selecting content of the poisoned training dataset; and in a second instance of the second determination where the quantification statistic fails to satisfy the criterion: that the malicious party did not have visibility of the model weight while selecting the content of the poisoned training dataset.

5. The method of claim 4, wherein the training data view analysis is based on: the poisoned training dataset, the first ingest dataset, and inferences generated using the first ingest dataset.

6. The method of claim 5, wherein performing the training data view analysis further comprises: obtaining, using the quantity of the poisoned training dataset and before making the first determination, a second quantification statistic, wherein the first determination is made based on the second quantification statistic and the criterion associated with the quantity is a second criterion for the second quantification statistic.

7. The method of claim 6, wherein the training data view analysis result indicates: in a first instance of the first determination where the second quantification statistic satisfies the second criterion: that the malicious party had visibility of the first ingest dataset and the inferences generated using the first ingest dataset while selecting the content of the poisoned training dataset; and in a second instance of the first determination where the second quantification statistic fails to satisfy the second criterion: that the malicious party did not have visibility of the first ingest dataset and the inferences generated using the first ingest dataset while selecting the content of the poisoned training dataset.

8. The method of claim 1, wherein performing the remediation based on the view level classification comprises: performing an action set to secure the one or more systems that are compromised.

9. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing an artificial intelligence (AI) model, the operations comprising: identifying a poisoned training dataset that was used to train an AI model instance; and after identifying the poisoned training dataset: identifying, based at least in part on the poisoned training dataset, a portion of a snapshot of the AI model instance that a malicious party had visibility into when selecting content of the poisoned training dataset by performing at least a training data view analysis to obtain a training data view analysis result, the training data view analysis comprising: attempting to match a portion of the poisoned training dataset to a portion of a first ingest dataset obtained from a data source that did not provide the poisoned training dataset in order to identify a quantity of the poisoned training dataset that matches the first ingest dataset; and making a first determination, using the quantity and a criterion associated with the quantity, regarding whether the malicious party had visibility into the first ingest dataset from the portion of the snapshot; classifying an amount of information the malicious party has regarding the AI model instance based on the portion of the snapshot to obtain a view level classification, the view level classification being used to identify one or more systems that are compromised; and performing a remediation based on the view level classification.

10. The non-transitory machine-readable medium of claim 9, wherein identifying the portion of the snapshot further comprises: performing a model weight view analysis to obtain a model weight view analysis result, the model weight view analysis being based on: the poisoned training dataset, a second ingest dataset from a data source that provided the poisoned training dataset, and inferences generated using the second ingest dataset.

11. The non-transitory machine-readable medium of claim 10, wherein performing the model weight view analysis comprises: enumerating the poisoned training dataset to identify a first cardinality of a portion of the second ingest dataset; enumerating the second ingest dataset to identify a second cardinality of the second ingest dataset; obtaining, using the first cardinality and the second cardinality, a quantification statistic; and based on the quantification statistic and a criterion for the quantification statistic, making a second determination regarding whether the malicious party had visibility into a model weight from the portion of the snapshot.

12. The non-transitory machine-readable medium of claim 11, wherein the model weight view analysis result indicates: in a first instance of the second determination where the quantification statistic satisfies the criterion: that the malicious party had visibility of the model weight while selecting content of the poisoned training dataset; and in a se
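The two view analyses recited in claims 1, 3, and 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not say how records are matched, what the quantification statistics are, or what the criteria are. Here matching is exact equality of hashable records, each statistic is a simple ratio, each criterion is a threshold (`threshold` is a hypothetical parameter), and all function and class names are invented for the example.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable


@dataclass(frozen=True)
class ViewAnalysisResult:
    statistic: float       # quantification statistic (assumed to be a ratio)
    had_visibility: bool   # True when the statistic satisfies the criterion


def count_matches(records: Iterable[Hashable], reference: Iterable[Hashable]) -> int:
    """Count records that also appear in reference (assumed exact-match rule)."""
    reference_set = set(reference)
    return sum(1 for record in records if record in reference_set)


def training_data_view_analysis(poisoned, first_ingest, threshold=0.5):
    """Claims 1 and 6: match poisoned records against ingest data from a source
    that did NOT supply the poison, then test the match rate against a criterion."""
    poisoned = list(poisoned)
    if not poisoned:
        return ViewAnalysisResult(0.0, False)
    quantity = count_matches(poisoned, first_ingest)
    statistic = quantity / len(poisoned)  # assumed "second quantification statistic"
    return ViewAnalysisResult(statistic, statistic >= threshold)


def model_weight_view_analysis(poisoned, second_ingest, threshold=0.5):
    """Claim 3: compare two cardinalities drawn from the ingest data of the source
    that DID supply the poison, then test their ratio against a criterion."""
    second_ingest = list(second_ingest)
    if not second_ingest:
        return ViewAnalysisResult(0.0, False)
    # Assumed reading: first cardinality = second-ingest records found in the poison set.
    first_cardinality = count_matches(second_ingest, poisoned)
    second_cardinality = len(second_ingest)
    statistic = first_cardinality / second_cardinality
    return ViewAnalysisResult(statistic, statistic >= threshold)
```

For example, if three of four poisoned records also appear in another source's ingest data, `training_data_view_analysis` yields a statistic of 0.75, which satisfies the default threshold of 0.5. Under this sketch's assumptions, that suggests the malicious party could see that data when crafting the poison. Whether a high or a low ratio should count as satisfying the criterion is itself a design choice the claims leave open.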
Inference or reasoning models · CPC title
involving event detection and direct action · CPC title
Test or assess software · CPC title
Machine learning · CPC title
Computer malware detection or handling, e.g. anti-virus arrangements · CPC title