Systems and user interfaces for dynamic and interactive investigation based on automatic clustering of related data in various data structures

US10721268B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10721268-B2
Application number  US-201916239081-A
Country             US
Kind code           B2
Filing date         Jan 3, 2019
Priority date       Mar 15, 2013
Publication date    Jul 21, 2020
Grant date          Jul 21, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster. Further, cluster metascores may be generated based on various cluster scores associated with a cluster. Clusters may be ranked based on cluster metascores. Various embodiments may enable an analyst to discover various insights related to data clusters, and may be applicable to various tasks including, for example, tax fraud detection, beaconing malware detection, malware user-agent detection, and/or activity trend detection, among various others.
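The pipeline the abstract describes (seed, cluster, cluster scores, metascore, ranking) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `LINKS` toy data, the "follow linked items" clustering strategy, the two example scores, and the unweighted-mean metascore.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical toy data: each data item maps to the items it is linked to.
LINKS: Dict[str, Set[str]] = {
    "acct-1": {"ip-9", "user-a"},
    "ip-9": {"acct-1", "acct-2"},
    "user-a": {"acct-1"},
    "acct-2": {"ip-9"},
    "acct-7": set(),
}


@dataclass
class Cluster:
    seed: str
    items: Set[str] = field(default_factory=set)
    scores: Dict[str, float] = field(default_factory=dict)
    metascore: float = 0.0


def linked_items_strategy(item: str) -> Set[str]:
    """Assumed clustering strategy: an item is related to whatever it links to."""
    return LINKS.get(item, set())


def build_cluster(seed: str, strategy: Callable[[str], Set[str]],
                  max_items: int = 100) -> Cluster:
    """Add the seed to a new cluster, then add related items found by the strategy."""
    cluster = Cluster(seed=seed, items={seed})
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        for related in sorted(strategy(current)):
            if len(cluster.items) >= max_items:
                return cluster
            if related not in cluster.items:
                cluster.items.add(related)
                frontier.append(related)
    return cluster


def score_cluster(cluster: Cluster) -> Cluster:
    """Compute per-cluster scores, then combine them into a metascore."""
    cluster.scores = {
        "size": float(len(cluster.items)),
        "accounts": float(sum(1 for i in cluster.items if i.startswith("acct-"))),
    }
    # Assumed metascore: the unweighted mean of the individual scores.
    cluster.metascore = sum(cluster.scores.values()) / len(cluster.scores)
    return cluster


def rank_clusters(seeds: Iterable[str],
                  strategy: Callable[[str], Set[str]]) -> List[Cluster]:
    """Build and score one cluster per seed, highest metascore first."""
    clusters = [score_cluster(build_cluster(s, strategy)) for s in seeds]
    return sorted(clusters, key=lambda c: c.metascore, reverse=True)


ranked = rank_clusters(["acct-7", "acct-1"], linked_items_strategy)
for c in ranked:
    print(c.seed, sorted(c.items), c.scores, c.metascore)
```

In this toy data the cluster seeded with `acct-1` pulls in three linked items and ranks ahead of the isolated `acct-7`. A real system would swap in domain-specific strategies and scores for the task at hand.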

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: one or more computer readable storage devices configured to store: host-based events associated with one or more computing devices; and activity trend-related data items; and one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute computer executable instructions to cause the computer system to: execute a cluster engine configured to at least: determine a first group of host-based events that indicate a same first activity type and are associated with a first host and a reference time period; determine, based at least on the first group of host-based events, a first statistical deviation in the first activity type on the first host for the reference time period; determine a second group of host-based events the indicate the same first activity type and are associated with the first host and a test time period; determine, based at least on the second group of host-based events, a second statistical deviation in the first activity type on the first host for the test time period; in response to determining that the first statistical deviation compared to the second statistical deviation satisfies a particular threshold, designate a host-based event from the second group as a seed; generate a data item cluster based on the seed, wherein generating the data item cluster comprises: adding the seed to the data item cluster; and adding to the data item cluster one or more activity trend-related data items, from the activity trend-related data items, determined to be associated with the seed; and determine scores for the data item cluster and a plurality of additional data items clusters generated based on host-based events; and execute a workflow engine configured to at least: cause presentation of the data item cluster and the plurality of additional data item clusters in a user interface of a client computing device; and order the presented data item cluster and the plurality of additional data item clusters in the user interface based at least in part on the respective determined scores for the data item cluster and the plurality of additional data item clusters.

2. The computer system of claim 1, wherein the activity trend-related data items include at least one of: data items associated with captured host-based events, Internet Protocol addresses, external domains, users, or computing devices, and wherein hosts comprise computing devices in a network.

3. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute the computer executable instructions to further cause the computer system to: execute the cluster engine further configured to at least: identify the one or more activity trend-related data items determined to be associated with the seed based at least on a clustering strategy, wherein the clustering strategy queries the host-based events and/or the activity trend-related data items to determine at least one of: the particular host associated with the seed, one or more host-based events associated with the particular host, one or more host-based events associated with the seed, users of the particular host, data items associated with the particular host, other hosts associated with the same particular activity type of host-based events, Internet Protocol addresses associated with the particular host, external domains associated with the seed, or computing devices associated with the particular host.

4. The computer system of claim 3, wherein identifying one or more activity trend-related data items determined to be associated with the seed further comprises determining a particular activity trend-related data item and the seed are both associated with a common metadata property value.

5. The computer system of claim 4, wherein the common property value includes at least one of: a username, a domain, an Internet Protocol address, a computing device identifier, or an event identifier.

6. The computer system of claim 1, wherein the first statistical deviation comprises a Z-score.

7. A computer-implemented method comprising: by one or more processors executing program instructions: executing a cluster engine configured to at least: access one or more computer readable storage devices configured to store: host-based events associated with one or more computing devices; and activity trend-related data items; determine a first group of host-based events that indicate a same first activity type and are associated with a first host and a reference time period; determine, based at least on the first group of host-based events, a first statistical deviation in the first activity type on the first host for the reference time period; determine a second group of host-based events the indicate the same first activity type and are associated with the first host and a test time period; determine, based at least on the second group of host-based events, a second statistical deviation in the first activity type on the first host for the test time period; in response to determining that the first statistical deviation compared to the second statistical deviation satisfies a particular threshold, designate a host-based event from the second group as a seed; generate a data item cluster based on the seed, wherein generating the data item cluster comprises: adding the seed to the data item cluster; and adding to the data item cluster one or more activity trend-related data items, from the activity trend-related data items, determined to be associated with the seed; and determine scores for the data item cluster and a plurality of additional data items clusters generated based on host-based events; and executing a workflow engine configured to at least: cause presentation of the data item cluster and the plurality of additional data item clusters in a user interface of a client computing device; and order the presented data item cluster and the plurality of additional data item clusters in the user interface based at least in part on the respective determined scores for the data item cluster and the plurality of additional data item clusters.

8. The computer-implemented method of claim 7, wherein the activity trend-related data items include at least one of: data items associated with captured host-based events, Internet Protocol addresses, external domains, users, or computing devices, and wherein hosts comprise computing devices in a network.

9. The computer-implemented method of claim 7 further comprising: by the one or more processors executing program instructions: executing the cluster engine further configured to at least: identify the one or more activity trend-related data items determined to be associated with the seed based at least on a clustering strategy, wherein the clustering strategy queries the host-based events and/or the activity trend-related data items to determine at least one of: the particular host associated with the seed, one or more host-based events associated with the particular host, one or more host-based events associated with the seed, users of the particular host, data items associated with the particular host, other hosts associated with the same particular activity type of host-based events, Internet Protocol addresses associated with the particular host, external domains associated with the seed, or computing devices associated with the particular host.

10. The computer-implemented method of claim 9, wherein identifying one or more activity trend-related data items determined to be associated with the seed further comprises determining a particular
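As a rough illustration of the seed-designation step in claims 1 and 6 and the common-property clustering in claims 4 and 5, the following Python sketch shows one possible reading. The claims do not say how the two statistical deviations are computed or compared, so the approach here is an assumption: daily event counts in the reference period give a mean and standard deviation, and the test-period count is turned into a Z-score against that baseline. The `HostEvent` type, the field names, the sample data, and the threshold of 3.0 are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev
from typing import List, Optional


@dataclass(frozen=True)
class HostEvent:  # hypothetical record shape, not from the patent
    host: str
    activity_type: str
    day: date
    user: str


def daily_counts(events: List[HostEvent], host: str, activity_type: str,
                 days: List[date]) -> List[int]:
    """Count events of one activity type on one host for each given day."""
    counts = Counter(e.day for e in events
                     if e.host == host and e.activity_type == activity_type)
    return [counts.get(d, 0) for d in days]


def find_seed(events: List[HostEvent], host: str, activity_type: str,
              reference_days: List[date], test_day: date,
              threshold: float = 3.0) -> Optional[HostEvent]:
    """Return a test-period event as the seed if its Z-score meets the threshold."""
    ref = daily_counts(events, host, activity_type, reference_days)
    test_count = daily_counts(events, host, activity_type, [test_day])[0]
    mu, sigma = mean(ref), pstdev(ref)
    if sigma == 0:
        return None  # Z-score undefined for a flat baseline; this sketch skips it
    z = (test_count - mu) / sigma
    if z < threshold:
        return None
    return next(e for e in events if e.host == host
                and e.activity_type == activity_type and e.day == test_day)


def cluster_by_user(seed: HostEvent, events: List[HostEvent]) -> List[HostEvent]:
    """Assumed strategy: add items sharing the seed's username (a common property value)."""
    return [seed] + [e for e in events if e is not seed and e.user == seed.user]


def make(day: int, n: int, user: str = "alice") -> List[HostEvent]:
    return [HostEvent("web-01", "failed_login", date(2019, 1, day), user)
            for _ in range(n)]


# Reference period: Jan 1-4 with 2, 3, 2, 3 events. Test day: Jan 5 with 9 events.
events = make(1, 2) + make(2, 3) + make(3, 2) + make(4, 3) + make(5, 9, user="mallory")
reference_days = [date(2019, 1, d) for d in range(1, 5)]

seed = find_seed(events, "web-01", "failed_login", reference_days, date(2019, 1, 5))
cluster = cluster_by_user(seed, events) if seed else []
print(seed, len(cluster))
```

With this sample data the reference counts have mean 2.5 and population standard deviation 0.5, so the test-day count of 9 is far above the threshold and one of that day's events becomes the seed.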

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • Credit; Loans; Processing thereof · CPC title

  • G06Q40/00 · Primary

    Finance; Insurance; Tax strategies; Processing of corporate or income taxes · CPC title

  • Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536) · CPC title

  • Visualization; Browsing · CPC title

  • Creation or modification of classes or clusters · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10721268B2 cover?
In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06Q40/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 21, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).