Anomalous Data Detection in Computer Based Reasoning and Artificial Intelligence Systems
US-2020371512-A1 · Nov 26, 2020 · US
US11262742B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11262742-B2 |
| Application number | US-202016992842-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 13, 2020 |
| Priority date | Apr 9, 2018 |
| Publication date | Mar 1, 2022 |
| Grant date | Mar 1, 2022 |
Official abstract text for this publication.
Techniques are provided herein for creating well-balanced computer-based reasoning systems and using those to control systems. The techniques include receiving a request to determine whether to use one or more particular data elements, features, cases, etc. in a computer-based reasoning model (e.g., as data elements, cases or features are being added, or as part of pruning existing features or cases). Conviction measures are determined and inclusivity conditions are tested. The result of comparing the conviction measure can be used to determine whether to include or exclude the feature, case, etc. in the model and/or whether there are anomalies in the model. A controllable system may then be controlled using the computer-based reasoning model. Example controllable systems include self-driving cars, image labeling systems, manufacturing and assembly controls, federated systems, smart voice controls, automated control of experiments, energy transfer systems, health care systems, cybersecurity systems, and the like.
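The include-or-exclude decision described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the function name `partition_by_conviction`, the `conviction` callable, the `min_conviction` parameter, and the choice that a score at or above the threshold satisfies the inclusivity condition are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Case = Sequence[float]


def partition_by_conviction(
    cases: Iterable[Case],
    conviction: Callable[[Case], float],
    min_conviction: float,
) -> Tuple[List[Case], List[Case]]:
    """Split cases into (included, excluded) with a single-threshold test.

    Assumption: a case meets the inclusivity condition when its conviction
    score is at or above min_conviction. The abstract does not fix the
    direction of the comparison or the form of the conviction measure.
    """
    included: List[Case] = []
    excluded: List[Case] = []
    for case in cases:
        if conviction(case) >= min_conviction:
            included.append(case)
        else:
            excluded.append(case)
    return included, excluded


# Usage with a toy, hypothetical conviction measure.
candidates = [(0.0,), (1.0,), (5.0,)]
kept, dropped = partition_by_conviction(
    candidates, lambda c: 1.0 / (1.0 + c[0]), min_conviction=0.4
)
```

The excluded list could equally be routed to a reviewer rather than discarded, since the abstract also mentions using the comparison to detect anomalies in the model.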
Opening claim text (preview).
What is claimed is:

1. A method comprising: training a computer-based reasoning model; receiving a request to determine whether one or more particular data elements in the computer-based reasoning model are anomalous; determining for each of the one or more particular data elements, one or more conviction scores, wherein determining one or more conviction scores for the one or more particular data elements comprises determining a familiarity conviction score for the one or more particular data elements and determining a distance contribution score for the one or more particular data elements; wherein: the familiarity conviction score is a measure of how much the one or more particular data elements distort a model calculated as a function of one or more measures of distribution similarity, and the distance contribution score is a locally weighted expected value of the distance from one point to its nearest neighbors calculated based on a function of similarity between the one or more particular data elements and neighboring data elements of the one or more particular data elements; determining whether the one or more conviction scores meet one or more anomalousness conditions; in response to determining that the one or more conviction scores meet the one or more anomalousness conditions, sending an alert to a second system that the one or more particular data elements in the computer-based reasoning model are anomalous; wherein determining whether the one or more conviction scores meet the anomalousness conditions comprises determining that the one or more particular data elements meet the anomalousness condition when the familiarity conviction score is beyond a first threshold and the distance contribution score is beyond a second threshold, wherein the method is performed on one or more computing devices.

2. The method of claim 1, further comprising: including the one or more particular data elements in the computer-based reasoning model when the one or more anomalousness conditions is not met; causing, with a control system, control of a controllable system with the computer-based reasoning model.

3. The method of claim 1, wherein determining whether the one or more conviction scores meet the anomalousness conditions comprises determining that the one or more particular data elements meet the one or more anomalousness conditions when the familiarity conviction score is below a first threshold and the distance contribution score is below a second threshold.

4. The method of claim 1, further comprising, in response to determining that the one or more conviction scores meet the one or more anomalousness conditions, excluding the one or more particular data elements in the computer-based reasoning model.

5. The method of claim 1, wherein receiving a request to determine whether to include one or more particular data elements comprises receiving a request to reduce the computer-based reasoning model to a particular size; and the method further comprises: determining a number of data elements to include in the computer-based reasoning model to reduce the computer-based reasoning model to a particular size; determining a subset of data elements to include, that includes the number of data elements, to include in the computer-based reasoning model based at least in part on the one or more conviction scores for data elements in the computer-based reasoning model; and including only the subset of data elements to include in the computer-based reasoning model, and excluding data elements from the computer-based reasoning model that are not in the subset of data elements to include.

6. The method of claim 1, further comprising: initially receiving the one or more particular data elements as part of training for the computer-based reasoning model; in response to determining that the one or more conviction scores meet the one or more anomalousness conditions, sending an indication to a trainer associated with the training for the computer-based reasoning model that training related to the one or more particular data elements is anomalous.

7. The method of claim 1, further comprising: receiving a request for an action to take in a current context associated with the one or more particular data elements; when the one or more anomalousness conditions is not met by the one or more conviction scores associated with the one or more particular data elements: determining the action to take based on comparing the current context to contexts associated with cases in the computer-based reasoning model; and responding to the request for the action to take with the determined action.

8. The method of claim 1, further comprising: receiving a request for an action to take in a current context associated with the one or more particular data elements; when the one or more anomalousness conditions is met by the one or more conviction scores associated with the one or more particular data elements: removing the one or more particular data elements associated with the one or more convictions scores that met the one or more anomalousness conditions.

9. A system for executing instructions, wherein said instructions are instructions which, when executed by one or more computing devices, cause performance of a process including: training a computer-based reasoning model; receiving a request to determine whether one or more particular data elements in the computer-based reasoning model are anomalous; determining for each of the one or more particular data elements, one or more conviction scores, wherein determining one or more conviction scores for the one or more particular data elements comprises determining a familiarity conviction score for the one or more particular data elements and determining a distance contribution score for the one or more particular data elements; wherein: the familiarity conviction score is a measure of how much the one or more particular data elements distort a model calculated as a function of one or more measures of distribution similarity, and the distance contribution score is a locally weighted expected value of the distance from one point to its nearest neighbors calculated based on a function of similarity between the one or more particular data elements and neighboring data elements of the one or more particular data elements; determining whether the one or more conviction scores meet one or more anomalousness conditions; in response to determining that the one or more conviction scores meet the one or more anomalousness conditions, sending an alert to a second system that the one or more particular data elements in the computer-based reasoning model are anomalous; wherein determining whether the one or more conviction scores meet the anomalousness conditions comprises determining that the one or more particular data elements meet the anomalousness condition when the familiarity conviction score is beyond a first threshold and the distance contribution score is beyond a second threshold, wherein the process is performed on one or more computing devices.

10. The system of claim 9, the process further comprising: including the one or more particular data elements in the computer-based reasoning model when the one or more anomalousness conditions is not met; causing, with a control system, control of a controllable system with the computer-based reasoning model.

11. The system of claim 9, wherein determining whether the one or more conviction scores meet the anomalousness conditions comprises determining that the one or more particular data elements meet the one or more anomalousness conditions when the familiarity conviction score
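The two-score test recited in claims 1 and 3 can be illustrated with a short sketch. Several parts are assumptions rather than anything stated in the claim text: the distance contribution is computed here with inverse-distance weights over the k nearest neighbors (which reduces to a harmonic mean), the function names and the parameter `k` are hypothetical, and the familiarity conviction score is taken as an input computed elsewhere, because the claims do not give its formula. The default "below/below" comparison follows claim 3; claim 1 only says "beyond".

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def distance_contribution(index: int, points: List[Point], k: int = 3) -> float:
    """Locally weighted mean distance from points[index] to its k nearest neighbors.

    Assumption: weights are 1/distance, so closer neighbors count more and the
    result equals the harmonic mean of the k nearest distances.
    """
    target = points[index]
    dists = sorted(
        math.dist(target, other) for i, other in enumerate(points) if i != index
    )[:k]
    if not dists or dists[0] == 0.0:
        # No neighbors, or an exact duplicate exists: contribution is zero.
        return 0.0
    weights = [1.0 / d for d in dists]
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)


def _beyond(value: float, threshold: float, direction: str) -> bool:
    """Return True when value lies beyond threshold in the given direction."""
    if direction == "below":
        return value < threshold
    if direction == "above":
        return value > threshold
    raise ValueError("direction must be 'below' or 'above'")


def is_anomalous(
    familiarity_conviction: float,
    dist_contribution: float,
    first_threshold: float,
    second_threshold: float,
    conviction_direction: str = "below",
    distance_direction: str = "below",
) -> bool:
    """Both scores must be beyond their thresholds (defaults follow claim 3)."""
    return _beyond(familiarity_conviction, first_threshold, conviction_direction) and _beyond(
        dist_contribution, second_threshold, distance_direction
    )


# Usage: three clustered points and one far-away point.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
dc_near = distance_contribution(0, pts, k=2)
dc_far = distance_contribution(3, pts, k=2)
```

Requiring both scores to pass their thresholds means a single extreme score does not trigger the alert on its own, which matches the conjunctive wording of the claim.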
Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Matching criteria, e.g. proximity measures · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title