Systems and methods of enforcing multi-part policies on data-deficient transactions of cloud computing services
US-2017264640-A1 · Sep 14, 2017 · US
US10270788B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10270788-B2 |
| Application number | US-201615256483-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 2, 2016 |
| Priority date | Jun 6, 2016 |
| Publication date | Apr 23, 2019 |
| Grant date | Apr 23, 2019 |
Official abstract text for this publication.
The technology disclosed relates to machine learning based anomaly detection. In particular, it relates to constructing activity models on a per-tenant and per-user basis using an online streaming machine learner that transforms an unsupervised learning problem into a supervised learning problem by fixing a target label and learning a regressor without a constant or intercept. Further, it relates to detecting anomalies in near real-time streams of security-related events of one or more tenants by transforming the events into categorized features and requiring a loss function analyzer to correlate, essentially through an origin, the categorized features with a target feature artificially labeled as a constant. It further includes determining an anomaly score for a production event based on calculated likelihood coefficients of categorized feature-value pairs and a prevalencist probability value of the production event comprising the coded feature-value pairs.
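The core idea in the abstract (fix the target label to a constant and fit a regressor with no intercept over hashed, Boolean-coded feature-value pairs) can be illustrated with a minimal sketch. This is not the patent's implementation: the hash-space size, learning rate, squared-error loss, and the names `slot`, `train`, and `predict` are all assumptions chosen for illustration.

```python
import hashlib

NUM_SLOTS = 2 ** 12   # assumed size of the hash-space
LEARNING_RATE = 0.1   # assumed step size for the SGD updates


def slot(space_id, feature, value):
    """Hash a (space ID, feature, value) triple into a slot of the hash-space."""
    key = f"{space_id}|{feature}={value}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % NUM_SLOTS


def train(events):
    """One streaming pass of SGD on squared error, with no intercept term.

    Every event is treated as having occurred with certainty, so the target
    is the constant 1.0. Each present feature-value pair is coded as 1.
    """
    weights = [0.0] * NUM_SLOTS
    for space_id, features in events:
        slots = [slot(space_id, f, v) for f, v in features.items()]
        prediction = sum(weights[s] for s in slots)  # fit through the origin
        error = prediction - 1.0
        for s in slots:
            weights[s] -= LEARNING_RATE * error
    return weights


def predict(weights, space_id, features):
    """Sum the learned coefficients of the event's feature-value pairs."""
    return sum(weights[slot(space_id, f, v)] for f, v in features.items())


usual = {"app": "box", "activity": "download", "country": "US"}
unseen = {"app": "ftp", "activity": "upload", "country": "KP"}
weights = train([("alice", usual)] * 50)
usual_score = predict(weights, "alice", usual)
unseen_score = predict(weights, "alice", unseen)
```

Because there is no intercept, a prediction can only approach the constant target through coefficients accumulated by feature-value pairs that were actually observed. In this sketch the frequently seen event scores close to 1, while an event made of unseen pairs scores lower, which is the kind of gap an anomaly score could be built on.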
Opening claim text (preview).
What is claimed is:
1. A method of detecting an anomaly event that has not frequently been observed in an ongoing event stream of security-related events of one or more organizations, the method including: implementing loosely supervised machine learning of observed features in security-related events using a loss function analyzer and recording a standard candle, including: transforming training events by assigning the observed features into categorical bins and coding the assigned observed features with a Boolean value as present in their respective categorical bins; analyzing the transformed training events using the loss function analyzer, treating the security-related events as having occurred with certainty, requiring the loss function analyzer to analyze the security-related events by a space identifier (ID), and requiring the loss function analyzer to fit the observed features as transformed essentially through an origin; and producing likelihood coefficients calculated by the space ID and the standard candle; mapping the likelihood coefficients and the standard candle by the space ID into a hash-space, then evaluating a plurality of production events with production space IDs, including for a production event: transforming features of the production event into the categorical bins of the hash-space; applying a hash function to the production space ID and the transformed features of the production event to retrieve the likelihood coefficients for the transformed features of the production event and the standard candle for the production space ID, then calculating an anomaly score; when the anomaly score represents a detected anomaly event, accessing history associated with the production space ID to construct a contrast between feature-event pairs of the detected anomaly event and non-anomalous feature-value pairs of prior events for the production space ID; and invoking one or more security actions including at least one of a quarantine, and an encryption, to be performed when anomalies are detected.
2. The method of claim 1, wherein the training events are annotated with prevalencist probability values of between 0 to 1, indicative of an occurrence frequency of the events.
3. The method of claim 1, wherein a prevalencist probability value of 0 indicates previously unseen training events.
4. The method of claim 1, wherein a prevalencist probability value of 1 indicates frequently appearing training events.
5. The method of claim 2, further including: storing the likelihood coefficients and the prevalencist probability values for multiple space IDs of an organization in a hash-space of a tenant activity model, indicative of activity habits of users in the organization; and updating the tenant activity model with new events to incorporate changes to the activity habits.
6. The method of claim 2, further including: storing the likelihood coefficients and the prevalencist probability values for a particular space ID in a hash-space as a user activity model, indicative of activity habits of a user; and updating the user activity model with new events to incorporate changes to the activity habits.
7. The method of claim 2, further including: determining a relative-error ratio for a particular production event with a production space ID based on a predicted prevalencist probability value of the production event and an observed prevalencist probability value of the production event; determining the standard candle value for the production space ID based on a maximum likelihood coefficient feature-value pair in the production event; evaluating likelihood coefficients of individual feature-value pairs in the production event and determining one or more lowest likelihood coefficient feature-value pairs in the production event; calculating an overall likelihood coefficient for the production event based on the one or more lowest likelihood coefficient feature-value pairs; and determining the production event to be an anomaly event when the relative-error ratio, the standard candle value and the overall likelihood coefficient exceed a threshold.
8. The method of claim 7, further including distinguishing between a seasoned user and an unseasoned user by: requiring initialization and analysis of a production space ID for a particular user with a value for the standard candle; and maturing the value for the standard candle of the production space ID to a target value responsive to events received for the production space ID at least until a threshold number of events are received.
9. The method of claim 8, wherein seasoned space IDs have non-zero standard candle values.
10. The method of claim 7, further including: clustering a plurality of the production events with lowest likelihood coefficient feature-value pairs based on a feature-dimension type; and generating for display clustered production events for different feature-dimension types.
11. The method of claim 1, wherein the loss function analyzer is a stochastic gradient descent (SGD) analyzer.
12. The method of claim 1, further including: the likelihood coefficients annotated with corresponding coded feature-value pairs are stored in respective slots of the hash space; and the likelihood coefficients for anomaly detection are retrieved by applying the hash function to the coded feature-value pairs.
13. The method of claim 1, wherein the security-related events include connection events and application events.
14. The method of claim 1, further including: accumulating non-zero likelihood coefficients for frequently appearing feature-value pairs; updating likelihood coefficients of individual feature-value pairs during correlation; and converging over time the likelihood coefficients of the frequently appearing feature-value pairs to match likelihood coefficients of a target feature.
15. The method of claim 1, further including: user-specific activity habits are learned based on separate analysis of sub-streams by the space ID; and are persisted in a hash-space separate user-states based on the user-specific activity habits as learned, representing occurrence frequencies of all past events for individual users.
16. The method of claim 1, further including updating tenant and user activity models over time, including maturing and storing frequently occurring anomalous events as normal user activity.
17. The method of claim 1, wherein features include: one or more time dimensions; a source location dimension; a source Internet Protocol (IP) address dimension; a destination location dimension; a destination IP address dimension; and a source device identity dimension.
18. The method of claim 1, wherein features include: an application used dimension; an activity type and detail dimension; and a manipulated object dimension.
19. The method of claim 1, wherein one or more time-based features of the training events are assigned into multiple sets of periodic bins with varying granularity.
20. The method of claim 19, further including the time-based features of the training events are assigned into at least one: day-of-week periodic bin with 7 distinct values; time-of-day periodic bin with 24 distinct values; and part-of-day periodic bin with 6 distinct values.
21. The method of claim 1, further including generating for display the anomaly event in naturally processed language.
22. The method of claim 1, further including: storing a set of code
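The periodic time bins named in claims 19 and 20 can be sketched as follows. The claims give only the number of distinct values per bin (7, 24, and 6); the function name `time_bins`, the Monday-first weekday numbering, and the split of the day into six equal 4-hour parts are assumptions made for this illustration.

```python
from datetime import datetime


def time_bins(timestamp):
    """Assign one timestamp to three periodic bins of varying granularity."""
    return {
        "day_of_week": timestamp.weekday(),   # 7 values, 0 = Monday (assumed)
        "time_of_day": timestamp.hour,        # 24 values, 0..23
        "part_of_day": timestamp.hour // 4,   # 6 values, assumed 4-hour blocks
    }


bins = time_bins(datetime(2016, 9, 2, 14, 30))
```

Each bin value would then be treated as a categorical feature-value pair, so the same event contributes one coarse and one fine time feature to the model.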
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Machine learning · CPC title
to a single file or object, e.g. in a secure envelope, encrypted and accessed using a key, or with access control rules appended to the object itself · CPC title
Knowledge representation; Symbolic representation · CPC title
involving event detection and direct action · CPC title