Automatically determining whether an activation cluster contains poisonous data
US-11487963-B2 · Nov 1, 2022 · US
US12462018B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12462018-B2 |
| Application number | US-202218147773-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2022 |
| Priority date | Dec 29, 2022 |
| Publication date | Nov 4, 2025 |
| Grant date | Nov 4, 2025 |
Official abstract text for this publication.
Methods and systems for managing artificial intelligence (AI) models are disclosed. To manage AI models, AI models may be updated over time to obtain updated AI model instances. Following each update process, the updated instance of the AI model may be analyzed to determine whether poisoned training data was used to update the AI model. To perform the analysis, characteristics associated with the updated instance of the AI model may be compared to characteristics of the previous instance of the AI model. If the characteristics of the updated instance of the AI model differ from the characteristics of the previous instance of the AI model by an amount dictated by a threshold, the training data used to obtain the updated instance of the AI model may be treated as including poisoned training data.
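As a rough illustration of the comparison the abstract describes, the sketch below measures how far an updated model instance's outputs drift from the previous instance's outputs on the same known-good inputs, then applies a threshold. The function names, the mean-absolute-difference metric, the toy models, and the threshold value are all assumptions made for illustration; the abstract does not say which characteristics are compared or how the amount is computed.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]


def output_deviation(previous: Model, updated: Model, inputs: Sequence[float]) -> float:
    """Mean absolute difference between the two instances' outputs on the same inputs.

    The metric is an assumption; any measure of deviation could be substituted.
    """
    if not inputs:
        raise ValueError("inputs must not be empty")
    return sum(abs(updated(x) - previous(x)) for x in inputs) / len(inputs)


def treat_as_poisoned(quantification: float, threshold: float) -> bool:
    """Return True when the quantification exceeds the threshold."""
    return quantification > threshold


# Hypothetical stand-ins for the previous and updated model instances.
def previous_model(x: float) -> float:
    return 2.0 * x


def lightly_updated(x: float) -> float:
    return 2.0 * x + 0.01


def heavily_shifted(x: float) -> float:
    return -2.0 * x


known_good_inputs = [0.5, 1.0, 1.5, 2.0]
THRESHOLD = 0.1  # illustrative value only, not taken from the patent

for name, model in [("lightly_updated", lightly_updated),
                    ("heavily_shifted", heavily_shifted)]:
    q = output_deviation(previous_model, model, known_good_inputs)
    print(name, round(q, 3), treat_as_poisoned(q, THRESHOLD))
```

In this toy setup the lightly updated instance stays under the threshold, while the instance whose outputs flip sign is flagged, so its training data would be treated as suspect.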
Opening claim text (preview).
What is claimed is:
1. A method for managing an artificial intelligence (AI) model, the method comprising: obtaining a second instance of the AI model, the second instance of the AI model comprising a first portion being trained using a known good set of training data and a second portion being trained using a suspect set of training data, and obtaining the second instance of the AI model comprises: performing a transfer learning process using the suspect set of training data and a first instance of the AI model to obtain the second instance of the AI model, the first instance of the AI model previously being trained, at least in part, using the known good set of training data, and the transfer learning process comprises: obtaining the first instance of the AI model; freezing a first portion of the first instance of the AI model to obtain a partially frozen AI model; and training the partially frozen AI model using the suspect set of training data to obtain the second instance of the AI model; performing an analysis of the second instance of the AI model to obtain a quantification, the quantification indicating a likelihood that the suspect set of training data comprises poisoned training data; making a determination regarding whether the quantification exceeds a quantification threshold; in a first instance of the determination in which the quantification exceeds the quantification threshold: treating the suspect set of training data as comprising the poisoned training data; and in a second instance of the determination in which the quantification does not exceed the quantification threshold: treating the suspect set of training data as not comprising the poisoned training data.
2. The method of claim 1, wherein training the partially frozen AI model comprises: modifying weights associated with a second portion of the first instance of the AI model.
3. The method of claim 2, wherein the first portion of the first instance of the AI model is identical to the first portion of the second instance of the AI model.
4. The method of claim 1, wherein performing the analysis of the second instance of the AI model comprises: obtaining a snapshot of the first instance of the AI model; obtaining a historical training data set, the historical training data set comprising known good training data; obtaining a first set of outputs using the first instance of the AI model and a set of inputs, the set of inputs comprising at least a portion of the historical training data set; obtaining a second set of outputs using the second instance of the AI model and the set of inputs, the second set of outputs and the second instance of the AI model being unlikely to be tainted by the poisoned training data set when the second set of outputs matches the first set of outputs within the quantification threshold; and obtaining the quantification using the first set of outputs and the second set of outputs.
5. The method of claim 1, wherein performing the analysis of the second instance of the AI model comprises: obtaining a snapshot of the first instance of the AI model; obtaining first weights associated with the snapshot of the first instance of the AI model; obtaining second weights associated with a snapshot of the second instance of the AI model; obtaining the quantification using the first weights and second weights.
6. The method of claim 5, wherein obtaining the quantification comprises: identifying a level of decision boundary adaptation based on the first weights and the second weights; and obtaining the quantification based on the level of decision boundary adaptation.
7. The method of claim 1, wherein making the determination comprises: obtaining the quantification threshold, the quantification threshold indicating a magnitude of deviation of characteristics of the second instance of the AI model from characteristics of the first instance of the AI model in a positive or a negative direction, and any deviation exceeding the quantification threshold being considered to indicate potentially poisoned training data; and comparing the quantification to the quantification threshold.
8. The method of claim 1, wherein treating the suspect set of training data as comprising the poisoned training data comprises one selected from a list of actions consisting of: treating the suspect set of training data as being part of a malicious attack; remediating an impact of the suspect set of training data on the second instance of the AI model; discarding the suspect set of training data; identifying a data source of the suspect set of training data; and treating the data source of the suspect set of training data as a potentially malicious data source.
9. The method of claim 8, wherein remediating the impact of the suspect set of training data comprises: identifying the second portion of the second instance of the AI model as a poisoned portion of the second instance of the AI model; remediating the poisoned portion to obtain an un-poisoned portion; and obtaining an un-poisoned instance of the AI model using the un-poisoned portion.
10. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing an artificial intelligence (AI) model, the operations comprising: obtaining a second instance of the AI model, the second instance of the AI model comprising a first portion being trained using a known good set of training data and a second portion being trained using a suspect set of training data, and obtaining the second instance of the AI model comprises: performing a transfer learning process using the suspect set of training data and a first instance of the AI model to obtain the second instance of the AI model, the first instance of the AI model previously being trained, at least in part, using the known good set of training data, and the transfer learning process comprises: obtaining the first instance of the AI model; freezing a first portion of the first instance of the AI model to obtain a partially frozen AI model; and training the partially frozen AI model using the suspect set of training data to obtain the second instance of the AI model; performing an analysis of the second instance of the AI model to obtain a quantification, the quantification indicating a likelihood that the suspect set of training data comprises poisoned training data; making a determination regarding whether the quantification exceeds a quantification threshold; in a first instance of the determination in which the quantification exceeds the quantification threshold: treating the suspect set of training data as comprising the poisoned training data; and in a second instance of the determination in which the quantification does not exceed the quantification threshold: treating the suspect set of training data as not comprising the poisoned training data.
11. The non-transitory machine-readable medium of claim 10, wherein training the partially frozen AI model comprises: modifying weights associated with a second portion of the first instance of the AI model.
12. The non-transitory machine-readable medium of claim 11, wherein the first portion of the first instance of the AI model is identical to the first portion of the second instance of the AI model.
13. The non-transitory machine-readable medium of claim 10, wherein performing the analysis of the second instance of the AI model comprises: obtaining a snapshot of the first instance of the AI model; obtaining a historical training d
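The freeze-then-train update recited in claim 1 and the weight comparison recited in claim 5 can be sketched on a deliberately tiny model. Everything below is a hypothetical illustration: the `TinyModel` class, the single `base` and `head` weights standing in for the first and second portions, the gradient-descent settings, the absolute-difference quantification, and the threshold are assumptions, not details taken from the patent.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass
class TinyModel:
    """Hypothetical two-part model: prediction = head * (base * x)."""
    base: float  # first portion, kept frozen during the update
    head: float  # second portion, trained on the suspect data

    def predict(self, x: float) -> float:
        return self.head * (self.base * x)


def transfer_learn(first: TinyModel, suspect: List[Tuple[float, float]],
                   lr: float = 0.05, epochs: int = 200) -> TinyModel:
    """Update only `head` by gradient descent on mean squared error.

    `base` is read but never modified, which plays the role of freezing
    the first portion of the first instance.
    """
    head = first.head
    for _ in range(epochs):
        grad = 0.0
        for x, y in suspect:
            feature = first.base * x
            grad += 2.0 * (head * feature - y) * feature
        head -= lr * grad / len(suspect)
    return replace(first, head=head)  # second instance; first is left unchanged


def weight_quantification(first: TinyModel, second: TinyModel) -> float:
    """Assumed quantification: absolute change in the trainable weight."""
    return abs(second.head - first.head)


first_instance = TinyModel(base=1.0, head=2.0)
benign_data = [(1.0, 2.1), (2.0, 4.2)]      # close to what the model already predicts
flipped_data = [(1.0, -2.0), (2.0, -4.0)]   # labels with the opposite sign
THRESHOLD = 0.5  # illustrative value only

for name, data in [("benign", benign_data), ("flipped", flipped_data)]:
    second_instance = transfer_learn(first_instance, data)
    q = weight_quantification(first_instance, second_instance)
    print(name, round(q, 3), "poisoned" if q > THRESHOLD else "not poisoned")
```

Because `base` is identical in both instances, any deviation is confined to `head`, which is what makes the second portion the natural place to look for, and later remediate, the effect of the suspect data.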