Phishing data item clustering and analysis

US11546364B2 · US · B2

Patent metadata
Publication number: US-11546364-B2
Application number: US-202017003398-A
Country: US
Kind code: B2
Filing date: Aug 26, 2020
Priority date: Jul 3, 2014
Publication date: Jan 3, 2023
Grant date: Jan 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure relate to a data analysis system that may automatically generate memory-efficient clustered data structures, automatically analyze those clustered data structures, and provide results of the automated analysis in an optimized way to an analyst. The automated analysis of the clustered data structures (also referred to herein as data clusters) may include an automated application of various criteria or rules so as to generate a compact, human-readable analysis of the data clusters. The human-readable analyses (also referred to herein as “summaries” or “conclusions”) of the data clusters may be organized into an interactive user interface so as to enable an analyst to quickly navigate among information associated with various data clusters and efficiently evaluate those data clusters in the context of, for example, a fraud investigation. Embodiments of the present disclosure also relate to automated scoring of the clustered data structures.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: one or more computer readable storage devices configured to store: a plurality of computer executable instructions; a data clustering strategy; and a plurality of data items including at least: a plurality of email data items, each of the plurality of email data items including at least a subject and a sender, each of the plurality of email data items potentially associated with phishing activity; and a plurality of phishing-related data items related to a communications network of an organization, the plurality of phishing-related data items including at least one of: internal Internet Protocol addresses of the communications network, computerized devices of the communications network, users of particular computerized devices, organizational positions associated with users of particular computerized devices, or URLs and/or external domains visited by users of particular computerized devices; and one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute the plurality of computer executable instructions in order to cause the computer system to: access an email data item transmitted to one or more of the users of respective computerized devices within the network of the organization, the email data item including at least a subject and a sender, the email data item potentially associated with phishing activity; designate the accessed email data item as a seed; and generate a data item cluster based on the data clustering strategy by at least: adding the seed to the data item cluster; determining the subject and the sender associated with the seed; identifying one or more of the plurality of email data items having a same subject as the determined subject or a same sender as the determined sender; adding the identified one or more email data items to the data item cluster; parsing one or more URLs from the email data items of the data item cluster; adding the parsed URLs to the data item cluster; identifying one or more users who are both recipients of at least one of the email data items of the data item cluster and visitors of one of the URLs of the data item cluster; adding the identified one or more users, including data related to the one or more users, to the data item cluster; identifying additional one or more data items associated with any data items of the data item cluster; and adding, to the data item cluster, the additional one or more data items.

2. The computer system of claim 1, wherein generating the data item cluster based on the data clustering strategy further comprises: determining any new subjects or new senders associated with email data items of the data item cluster that are different from the determined subjects or the determined senders; identifying a second one or more of the plurality of email data items having a same subject as the determined new subject, or a same sender as the determined new sender; and adding the identified second one or more email data items to the data item cluster.

3. The computer system of claim 1, wherein the identified one or more email data items are added to the data item cluster only if received by one or more computerized devices within the network within a predetermined period of time from a time that the seed was received.

4. The computer system of claim 3, wherein the period of time comprises at least one of a number of hours, a number of days, or a number of weeks.

5. The computer system of claim 3, wherein the predetermined period of time is further determined based on other email data items in the data item cluster.

6. The computer system of claim 1, wherein identifying the one or more users further comprises: scanning communications on the communications network of the organization so as to generate phishing-related data items including URLs visited by particular users; extracting recipients of the email data items of the data item cluster associated with respective parsed URLs; and for any parsed URL matching a URL visited by a particular user, if the extracted recipient of the email data item associated with the parsed URL matches the particular user, then identifying the user.

7. The computer system of claim 6, wherein the communications are continuously scanned via a proxy.

8. The computer system of claim 1, wherein the one or more hardware computer processors are further configured to execute the plurality of computer executable instructions in order to cause the one or more hardware computer processors to: continuously receive email data items from users of respective computing devices of the organization, designate the received email data items as seeds, and generate data items clusters based on the data clustering strategy.

9. The computer system of claim 1, wherein the data related to the one or more users includes an organizational position associated with the user.

10. A computer system comprising: one or more computer readable storage devices configured to store: a plurality of computer executable instructions; a data clustering strategy; and a plurality of data items including at least: a plurality of email data items, each of the plurality of email data items including at least a subject and a sender, each of the plurality of email data items potentially associated with phishing activity; and a plurality of phishing-related data items related to customers of an organization, the plurality of phishing-related data items including indicators of at least one of: customers of the organization or URLs identified as malicious by a third-party service; and one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute the plurality of computer executable instructions in order to cause the computer system to: receive a plurality of email data items from customers of the organization, each of the email data items including at least a subject and a sender, each of the email data items potentially associated with phishing activity; designate each of the received email data items as seeds; and for each of the designated seeds, generate a data item cluster based on the data clustering strategy by at least: adding the seed to the data item cluster; determining the subject and the sender associated with the seed; accessing the one or more computer readable storage devices and identifying one or more of the plurality of email data items having a same subject as the determined subject or a same sender as the determined sender; adding the identified one or more email data items to the data item cluster; parsing one or more URLs from the email data items of the data item cluster; adding the URLs to the data item cluster; in response to determining that the data item cluster includes at least a predetermined threshold quantity of email data items, designating the data item cluster as a campaign cluster; identifying additional one or more data items associated with any data items of the data item cluster; and adding, to the data item cluster, the additional one or more data items.

11. The computer system of claim 10, wherein the one or more hardware computer processors are further configured to execute the plurality of computer executable instructions in order to cause the one or more hardware computer processors to: for each campaign cluster, initiate further automated investigation including at least: comparing URLs included in the campaign cluster with URLs previously identified as malicious by a third-party service; based on the comparing, identify
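The seed-based clustering steps recited in claims 1, 3, 6, and 10 can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `Email`, `Cluster`, `build_cluster`, and `URL_RE`, the 7-day window, the threshold of 3, and the `visits` mapping (user to set of visited URLs, standing in for proxy-scanned traffic) are all assumptions made for illustration. The final claimed step of adding further associated data items is omitted.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical URL pattern; real email parsing would be more careful.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

@dataclass(frozen=True)
class Email:  # hypothetical record for an "email data item"
    id: str
    subject: str
    sender: str
    recipients: tuple
    body: str
    received: datetime

@dataclass
class Cluster:  # hypothetical container for a "data item cluster"
    emails: dict = field(default_factory=dict)
    urls: set = field(default_factory=set)
    users: set = field(default_factory=set)
    is_campaign: bool = False

def build_cluster(seed, emails, visits,
                  window=timedelta(days=7), campaign_threshold=3):
    """Grow a cluster from one seed email (window and threshold are assumed values)."""
    cluster = Cluster()
    cluster.emails[seed.id] = seed  # add the seed to the cluster

    # Add emails sharing the seed's subject or sender, limited to a
    # time window around the seed's receipt (the claim 3 restriction).
    for e in emails:
        shares_key = e.subject == seed.subject or e.sender == seed.sender
        in_window = abs(e.received - seed.received) <= window
        if shares_key and in_window:
            cluster.emails[e.id] = e

    # Parse URLs from each clustered email, then add any recipient who
    # visited a URL from an email they received (the claim 6 matching).
    for e in cluster.emails.values():
        urls = set(URL_RE.findall(e.body))
        cluster.urls |= urls
        for user in e.recipients:
            if visits.get(user, set()) & urls:
                cluster.users.add(user)

    # Mark as a campaign once enough emails are present (claim 10).
    cluster.is_campaign = len(cluster.emails) >= campaign_threshold
    return cluster
```

Claim 2 describes repeating the subject and sender matching for any new subjects or senders found in the cluster. In a sketch like this one, that would amount to looping the first matching step until no new emails are added.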

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • G06Q40/12 (Primary)

    Accounting · CPC title

  • Clustering or classification · CPC title

  • Traffic logging, e.g. anomaly detection · CPC title

  • by monitoring network traffic (monitoring network traffic per se H04L43/00) · CPC title

  • the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11546364B2 cover?
Embodiments of the present disclosure relate to a data analysis system that may automatically generate memory-efficient clustered data structures, automatically analyze those clustered data structures, and provide results of the automated analysis in an optimized way to an analyst. The automated analysis of the clustered data structures (also referred to herein as data clusters) may include an …
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06Q40/12. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 3, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).