Identity resolution in data intake stage of machine data processing platform

US9838410B2 · US · B2

Patent metadata
Publication number: US-9838410-B2
Application number: US-201514928985-A
Country: US
Kind code: B2
Filing date: Oct 30, 2015
Priority date: Aug 31, 2015
Publication date: Dec 5, 2017
Grant date: Dec 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A security platform employs a variety of techniques and mechanisms to detect security-related anomalies and threats in a computer network environment. The security platform is “big data” driven and employs machine learning to perform security analytics. The security platform performs user/entity behavioral analytics (UEBA) to detect the security-related anomalies and threats, regardless of whether such anomalies/threats were previously known. The security platform can include both real-time and batch paths/modes for detecting anomalies and threats. By visually presenting analytical results scored with risk ratings and supporting evidence, the security platform enables network security administrators to respond to a detected anomaly or threat, and to take action promptly.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving event data representing a plurality of events on a computer network; identifying a plurality of entities involved in the events, the plurality of entities including a particular user represented by a user identifier in the event data and a machine represented by a machine identifier in the event data; determining a probability of association between the machine identifier and the particular user, based on the event data; detecting that the probability of association satisfies a predetermined criterion; in response to detecting that the probability of association satisfies the predetermined criterion, creating a user association record indicative that a particular event represented in the event data is associated with the particular user; and annotating raw machine data of the particular event to include an indication of the particular user, based on the user association record.

2. The method of claim 1, wherein the predetermined criterion comprises the probability of association exceeding a confidence threshold.

3. The method of claim 1, wherein the user association record is created regardless of whether the particular event includes the user identifier.

4. The method of claim 1, wherein the user association record is created when the particular event includes the machine identifier.

5. The method of claim 1, wherein the user association record is created when the particular event includes the machine identifier but not the user identifier.

6. The method of claim 1, wherein the user association record is created when the particular event is received during a valid time period.

7. The method of claim 1, wherein said determining step comprises: creating a probabilistic graph to generate and track the probability of association between the particular user and the machine identifier, wherein a result from the probabilistic graph has a time-based dependence on current and past inputs.

8. The method of claim 1, wherein said determining step comprises: creating a probabilistic graph to record the probability of association between the particular user and the machine identifier, wherein the probabilistic graph includes a peripheral node, a center node, and an edge, the peripheral node representing the machine identifier, the center node representing the particular user, and the edge representing the probability of association between the machine identifier and the particular user.

9. The method of claim 1, wherein said determining step comprises: creating a probabilistic graph to record the probability of association between the particular user and the machine identifier, wherein the probabilistic graph is in the form of a stored data structure, and wherein the stored data structure is configured to include additional machine identifiers.

10. The method of claim 1, further comprising: updating the probability of association upon receiving event data representing a new event having at least one of: the machine identifier or the user identifier.

11. The method of claim 1, further comprising: updating the probability of association upon receiving event data representing a new event having at least one of: the machine identifier or the user identifier; wherein the new event comprises an authentication event that includes the user identifier.

12. The method of claim 1, further comprising: updating the probability of association upon receiving event data representing a new event having at least one of: the machine identifier or the user identifier; wherein the new event comprises an authentication event that includes the user identifier, and wherein said updating step assigns a different weight to the new event based on a type of authentication event.

13. The method of claim 1, further comprising: updating the probability of association upon receiving event data representing a new event having at least one of: the machine identifier or the user identifier; wherein the new event comprises an authentication event that includes the user identifier, wherein said updating step assigns more weight to a physical login type of authentication event than to any other type of authentication event.

14. The method of claim 1, further comprising: creating, by a machine learning model, a probabilistic graph to record the probability of association.

15. The method of claim 1, wherein the event data on which said determining step is performed is limited to events that have occurred during a life time of a particular version of a machine learning model that is used to generate and track the probability of association.

16. The method of claim 1, wherein the event data representing the plurality of events is received in an order different from a temporal order of the events.

17. The method of claim 1, further comprising: sending the user association record to a cache server.

18. The method of claim 1, further comprising: sending the user association record to a cache server that stores structured data, wherein the user association record is stored in the cache server using a data structure representing a probability of association between the particular user and each of a plurality of machine identifiers.

19. The method of claim 1, wherein the event data further includes a second machine identifier, the method further comprising: determining a probability of association between the machine identifier and the second machine identifier, based on the event data.

20. The method of claim 1, wherein the event data further includes a second machine identifier, the method further comprising: determining a probability of machine association between the machine identifier and the second machine identifier, based on the event data; and upon the probability of machine association satisfying a second predetermined criterion, creating a machine association record indicative that a particular event having the second machine identifier is associated with the machine identifier.

21. The method of claim 1, further comprising: resolving a user identity of the particular user by querying, using the user identifier as a key, a database having records indicating a plurality of user identifiers registered to the user identity.

22. The method of claim 1, wherein the machine identifier comprises at least one of: a media access control (MAC) address or an Internet Protocol (IP) address.

23. The method of claim 1, wherein the user identifier comprises at least one of: a user login identifier (ID), a username, or an electronic mail address.

24. The method of claim 1, wherein identifying the entities in the events comprises: parsing the event data based on a predetermined data format that specifies which data represent entities in the events.

25. The method of claim 1, wherein said identifying the entities further comprises: detecting a data format of the event data.

26. The method of claim 1, wherein said identifying the entities further comprises: detecting a data format of the event data by steps including: comparing the data format of the event data to a list of known event data formats; and determining a highest probability data format based on a result of said comparing step.

27. A computer system comprising: a communication device; and a processor configured to: rec
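As a rough illustration of claims 1 through 6, the Python sketch below applies a threshold to an association probability, creates a user association record only when the threshold is exceeded, and annotates an event's raw data even when the event carries only the machine identifier. The 0.8 threshold, the record fields, the `resolved_user=` annotation format, and all function names are hypothetical; the claims do not specify them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical value: claim 2 only says "a confidence threshold".
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Event:
    raw: str                   # raw machine data, e.g. one log line
    timestamp: float           # event time in seconds
    machine_id: Optional[str]  # e.g. an IP or MAC address
    user_id: Optional[str]     # e.g. a login ID; may be missing


@dataclass
class UserAssociationRecord:
    user_id: str
    machine_id: str
    probability: float
    valid_from: float   # start of the (hypothetical) validity window
    valid_until: float  # end of the validity window


def make_record(user_id, machine_id, probability, valid_from, valid_until,
                threshold=CONFIDENCE_THRESHOLD):
    """Create a record only if the probability exceeds the threshold (claim 2)."""
    if probability > threshold:
        return UserAssociationRecord(user_id, machine_id, probability,
                                     valid_from, valid_until)
    return None


def annotate(event, record):
    """Append the resolved user to the raw data of a matching event.

    The event only needs the machine ID; its user ID is ignored (claims 3-5).
    The event time must fall inside the record's validity window (claim 6).
    """
    if record is None or event.machine_id != record.machine_id:
        return event.raw
    if not (record.valid_from <= event.timestamp <= record.valid_until):
        return event.raw
    return f"{event.raw} resolved_user={record.user_id}"
```

For example, with a record linking `alice` to `10.0.0.5` for the first hour, an event that names only `10.0.0.5` at second 120 would come back with `resolved_user=alice` appended, while the same event two hours later would be left unchanged.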
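Claims 7 through 13 describe a probabilistic graph with the user as the center node, machine identifiers as peripheral nodes, and edges that carry association probabilities updated by weighted authentication events. The sketch below is one possible reading, not the patented method: each edge keeps exponentially decayed evidence (one way to give results a time-based dependence on current and past inputs), and the probability is that edge's share of the user's total evidence. The weights, the one-day decay constant, the authentication-type labels, and the normalization are all assumptions.

```python
import math
from dataclasses import dataclass, field

# Hypothetical weights: claim 13 only requires that a physical login
# outweigh every other type of authentication event.
AUTH_WEIGHTS = {"physical_login": 3.0, "remote_login": 1.0, "network_logon": 0.5}
UNKNOWN_AUTH_WEIGHT = 0.1     # hypothetical weight for unlisted types
DECAY_RATE = 1.0 / 86400.0    # hypothetical: evidence decays over about a day


@dataclass
class Edge:
    score: float = 0.0       # decayed, weighted evidence as of last_time
    last_time: float = 0.0   # time the score was last brought up to date


@dataclass
class UserGraph:
    """Center node = one user; peripheral nodes = machine IDs (claims 8-9)."""
    user_id: str
    edges: dict = field(default_factory=dict)  # machine_id -> Edge

    def observe(self, machine_id, auth_type, timestamp):
        """Add weighted evidence from one authentication event (claims 10-13)."""
        w = AUTH_WEIGHTS.get(auth_type, UNKNOWN_AUTH_WEIGHT)
        edge = self.edges.setdefault(machine_id, Edge(0.0, timestamp))
        if timestamp >= edge.last_time:
            # Decay old evidence forward to this event, then add its weight.
            elapsed = timestamp - edge.last_time
            edge.score = edge.score * math.exp(-DECAY_RATE * elapsed) + w
            edge.last_time = timestamp
        else:
            # Late event: add its weight, decayed to the edge's current time.
            edge.score += w * math.exp(-DECAY_RATE * (edge.last_time - timestamp))

    def probability(self, machine_id, now):
        """Share of this user's decayed evidence at `now` held by machine_id."""
        def decayed(e):
            return e.score * math.exp(-DECAY_RATE * max(0.0, now - e.last_time))

        total = sum(decayed(e) for e in self.edges.values())
        if total == 0.0 or machine_id not in self.edges:
            return 0.0
        return decayed(self.edges[machine_id]) / total
```

Because a late event is decayed to the edge's current time rather than dropped, this sketch also tolerates the out-of-order delivery mentioned in claim 16.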
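For claims 17 and 18, a nested dictionary can stand in for the cache server's structured data: each user maps to an association probability for each of several machine identifiers. The `AssociationCache` class and its methods are hypothetical; a real deployment would sit behind an actual cache server rather than in process memory.

```python
from collections import defaultdict


class AssociationCache:
    """Hypothetical in-memory stand-in for the cache server in claims 17-18."""

    def __init__(self):
        # user_id -> {machine_id: probability of association}
        self._by_user = defaultdict(dict)

    def put(self, user_id, machine_id, probability):
        """Store (or overwrite) one user-to-machine association probability."""
        self._by_user[user_id][machine_id] = probability

    def machines_for(self, user_id):
        """Return a copy of all machine associations for one user."""
        return dict(self._by_user.get(user_id, {}))

    def best_user_for(self, machine_id, threshold):
        """Return the user most strongly tied to machine_id, if above threshold."""
        best_user, best_p = None, threshold
        for user_id, machines in self._by_user.items():
            p = machines.get(machine_id, 0.0)
            if p > best_p:
                best_user, best_p = user_id, p
        return best_user
```

Keying by user matches the per-user structure that claim 18 describes; the reverse lookup in `best_user_for` is a linear scan, which a production cache would likely replace with a secondary index.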
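Claims 19 and 20 extend the same idea to pairs of machine identifiers, for instance a MAC address and the IP address it was leased. A minimal sketch, assuming the probability of machine association is estimated from plain co-occurrence counts with time ignored, might look like this; the function name, the input shape, and the 0.9 threshold are illustrative only.

```python
from collections import Counter


def machine_associations(pairs, threshold=0.9):
    """Estimate P(first ID | second ID) from co-occurrence counts.

    pairs: iterable of (first_machine_id, second_machine_id) seen in the
    same event, e.g. (MAC, IP) from DHCP lease logs (hypothetical input).
    Returns {second_id: first_id} for pairs whose estimate exceeds the
    threshold. Keep threshold >= 0.5 so at most one first ID can qualify
    for any given second ID.
    """
    pair_counts = Counter()
    second_counts = Counter()
    for first, second in pairs:
        pair_counts[(first, second)] += 1
        second_counts[second] += 1

    records = {}
    for (first, second), n in pair_counts.items():
        if n / second_counts[second] > threshold:
            records[second] = first  # a "machine association record"
    return records
```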
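Claim 21 resolves a user identity by querying a database, keyed on the user identifier, that records which identifiers are registered to each identity. Using Python's built-in `sqlite3` module, a sketch might look like the following; the table name, column names, and example identities are hypothetical.

```python
import sqlite3


def build_identity_db(registrations):
    """registrations: {identity: [user identifiers registered to it]}.

    Hypothetical schema; each identifier may belong to only one identity.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_ids ("
        "user_identifier TEXT PRIMARY KEY, identity TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO user_ids (user_identifier, identity) VALUES (?, ?)",
        [(uid, identity) for identity, uids in registrations.items() for uid in uids],
    )
    conn.commit()
    return conn


def resolve_identity(conn, user_identifier):
    """Look up the identity registered for a login ID, username, or email."""
    row = conn.execute(
        "SELECT identity FROM user_ids WHERE user_identifier = ?",
        (user_identifier,),
    ).fetchone()
    return row[0] if row else None
```

With this layout, a login ID such as `asmith` and an address such as `alice@example.com` can both resolve to the same identity, which is what lets events carrying different identifiers be attributed to one person.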
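Claims 24 through 26 describe detecting an event data format by comparing it with a list of known formats, choosing the most probable one, and then parsing entities according to that format. The sketch below uses the fraction of sample lines each regular expression matches as a crude stand-in for that probability; the two formats, their patterns, and the function names are assumptions for illustration.

```python
import re

# Hypothetical known formats; named groups mark which fields hold entities.
KNOWN_FORMATS = {
    "auth_log": re.compile(
        r"^(?P<ts>\S+) login user=(?P<user>\S+) host=(?P<machine>\S+)$"
    ),
    "web_access": re.compile(
        r'^(?P<machine>\d{1,3}(?:\.\d{1,3}){3}) - (?P<user>\S+) '
        r'\[(?P<ts>[^\]]+)\] "[^"]*" \d{3} \d+$'
    ),
}


def detect_format(lines):
    """Return (format_name, score) for the known format matching most lines."""
    best_name, best_score = None, 0.0
    for name, pattern in KNOWN_FORMATS.items():
        matched = sum(1 for line in lines if pattern.match(line))
        score = matched / max(len(lines), 1)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def extract_entities(line, format_name):
    """Parse one line with the chosen format and keep only entity fields."""
    m = KNOWN_FORMATS[format_name].match(line)
    if not m:
        return {}
    return {k: v for k, v in m.groupdict().items() if k in ("user", "machine")}
```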

Assignees

Splunk Inc

Inventors

Classifications

  • G06N20/20 (primary)

    Ensemble learning · CPC title

  • Event detection, e.g. attack signature detection · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Hyperlinking · CPC title

  • using ranking · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9838410B2 cover?
A security platform employs a variety of techniques and mechanisms to detect security-related anomalies and threats in a computer network environment. The security platform is “big data” driven and employs machine learning to perform security analytics. The security platform performs user/entity behavioral analytics (UEBA) to detect the security-related anomalies and threats, regardless of whether…
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/20. Mapped technology areas include Physics.
When was this patent published?
Published December 5, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations found in our corpus, or other publications sharing the same primary CPC classification).