Clustering data based on indications of financial malfeasance

US9230280B1 · US · B1

Patent metadata
Field: Value
Publication number: US-9230280-B1
Application number: US-201414278963-A
Country: US
Kind code: B1
Filing date: May 15, 2014
Priority date: Mar 15, 2013
Publication date: Jan 5, 2016
Grant date: Jan 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed to assist in detection of financial malfeasance. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data (such as trades, emails or chat messages) and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster, and the clusters may be displayed and ranked based on their scores. Various embodiments may enable an analyst to review clusters of trades, emails and/or chat messages that are the most likely to reveal financial malfeasance.
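The flow the abstract describes (generate seeds, grow a cluster around each seed, score the clusters, rank them) can be sketched in Python. This is a minimal illustration, not the patented implementation: the `Item` schema, the `flagged` field standing in for a matched risk indicator, and the three strategy functions are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """A data item such as a trade, email, or chat message (hypothetical schema)."""
    item_id: str
    kind: str               # "trade", "email", or "chat"
    person: str             # trader, sender, or author
    flagged: bool = False   # stands in for a matched risk indicator (assumption)


@dataclass
class Cluster:
    seed: Item
    items: list = field(default_factory=list)
    score: float = 0.0


def generate_seeds(items):
    """Assumed seed generation strategy: every flagged trade becomes a seed."""
    return [i for i in items if i.kind == "trade" and i.flagged]


def related_by_person(seed, items):
    """Assumed clustering strategy: pull in items that share the seed's person."""
    return [i for i in items if i.person == seed.person and i.item_id != seed.item_id]


def score_by_message_volume(cluster):
    """Assumed scoring criterion: number of emails and chat messages in the cluster."""
    return float(sum(1 for i in cluster.items if i.kind in ("email", "chat")))


def build_ranked_clusters(items):
    """Seed -> cluster -> score, then rank clusters from highest to lowest score."""
    clusters = []
    for seed in generate_seeds(items):
        cluster = Cluster(seed=seed, items=[seed])
        cluster.items.extend(related_by_person(seed, items))
        cluster.score = score_by_message_volume(cluster)
        clusters.append(cluster)
    return sorted(clusters, key=lambda c: c.score, reverse=True)


# Made-up sample data for illustration only.
DATA = [
    Item("t1", "trade", "alice", flagged=True),
    Item("t2", "trade", "bob", flagged=True),
    Item("t3", "trade", "carol"),
    Item("e1", "email", "alice"),
    Item("c1", "chat", "alice"),
    Item("e2", "email", "bob"),
]

ranked = build_ranked_clusters(DATA)
for c in ranked:
    print(c.seed.item_id, c.score, [i.item_id for i in c.items])
```

In this sketch the cluster seeded by `t1` ranks first because it gathers two messages, while the cluster seeded by `t2` gathers one. The unflagged trade `t3` never becomes a seed.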

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system to assist a human analyst in analyzing large amounts of electronic communications for malfeasance, comprising:

one or more computer readable storage devices configured to store:
  one or more software modules including computer executable instructions, the one or more software modules including a cluster engine module and a workflow engine module;
  a plurality of clustering strategies;
  a plurality of transaction risk indicators;
  a plurality of communication risk indicators; and
  at least one scoring criterion;

one or more cluster data sources configure to store:
  a plurality of transaction data items and properties associated with respective transaction data items, each of the properties including associated property values;
  a plurality of email data items;
  a plurality of person data items; and
  a plurality of recipient data items; and

one or more hardware computer processors in communication with the one or more computer readable storage devices and the one or more cluster data sources, and configured to execute the one or more software modules in order to cause the one or more hardware computer processors to:

designate, by the cluster engine module, one or more seeds by:
  accessing, from the one or more computer readable storage devices, the plurality of transaction risk indicators and at least one transaction data item of the plurality of transaction data items;
  comparing the plurality of transaction risk indicators to the at least one transaction data item and associated properties; and
  based at least on the comparison and in response to determining the at least one transaction data item is related to at least one transaction risk indicator, designating the at least one transaction data item as a first seed;
  determining a subset of email data items from the plurality of email data items that are identifiable as likely side conversations, wherein determining the subset of email data items comprises identifying an email data item that has at least one less of a particular participant than a previous email associated with the email data item;
  searching the subset of email data items to identify an initial email data item, distinct from the at least one transaction data item, based at least on a communication risk indicator of the plurality of communication risk indicators and a sender or recipient of the initial email data item corresponding to a person associated with the at least one transaction data item; and
  designating the initial email data item as a second seed;

for each designated first and second seed:
  identify, by the cluster engine module, one or more first data items determined to be associated with the first seed based at least in part on a first clustering strategy of the plurality of clustering strategies, wherein the first clustering strategy queries the one or more cluster data sources to determine at least one of: a person data item from the plurality of person data items associated with the first seed of the at least one transaction data item, or an email data item of the plurality of email data items associated with the person data item;
  identify, by the cluster engine module, one or more second data items determined to be associated with the second seed based at least in part on a second clustering strategy of the plurality of clustering strategies, wherein the second clustering strategy queries the one or more cluster data sources to determine at least one of: a recipient data item from the plurality of recipient data items associated with the second seed of the initial email data item, a person data item from the plurality of person data items associated with at least one of the recipient data item or the sender of the initial email data item, or a transaction data item of the plurality of transaction data items associated with the person data item;
  generate, by the cluster engine module, a cluster based at least on the first and second seed, wherein generating the cluster comprises: adding the first and second seed to the cluster; adding the one or more first data items to the cluster; adding the one or more second data items to the cluster; storing the generated cluster in the one or more computer readable storage devices; and
  determine, by the cluster engine module, a score for the generated cluster, wherein determining the score for the generated cluster comprises: accessing, from the one or more computer readable storage devices, the at least one scoring criterion; and generating a cluster score for the generated cluster by assessing the generated cluster based at least on the accessed at least one scoring criterion; and

cause presentation, by the workflow engine module, of at least one generated cluster and the determined score for the at least one generated cluster in a user interface of a client computing device.

2. The computer system of claim 1, wherein at least some of the plurality of transaction data items represent trades and the first seed is a seed trade.

3. The computer system of claim 2, wherein the first clustering strategy further queries the one or more cluster data sources to determine data items matching one or more properties values of the first seed.

4. The computer system of claim 3, wherein, the first cluster strategy matches at least two property values selected from the group of: a time the seed trade was executed; a trader executing the seed trade; or a traded financial product of the seed trade.

5. The computer system of claim 3, wherein the at least one scoring criterion comprises a first scoring criterion, and the first scoring criterion scores the cluster based at least on a volume of email data items added to the cluster that match a first common property value of the seed trade, wherein the first common property value of the seed trade comprises a member selected from the group of: a time the seed trade was executed; a trader executing the seed trade; or a traded financial product of the seed trade.

6. The computer system of claim 3, wherein the at least one scoring criterion comprises a first scoring criterion, and the first scoring criterion scores the cluster based at least on the seed trade matching a prohibited time for trading by a trader executing the seed trade.

7. The computer system of claim 1, wherein the at least one scoring criterion comprises a first scoring criterion, and the first scoring criterion scores the cluster based at least on a volume of email data items added to the cluster.

8. The computer system of claim 1, wherein the one or more hardware computer processors in communication with the one or more computer readable storage devices are further configured to execute the one or more software modules in order to cause the computer system to: compile each cluster into a collection of scored clusters sorted by the cluster scores for respective scored cluster; and provide at least a portion of the collection of scored clusters for display in the user interface, wherein the user interface allows for inspection of each cluster by a user.

9. The computer system of claim 1, wherein the at least one scoring criterion measures indications of possible financial malfeasance of financial traders associated with data items in the at least one generated cluster.

10. A computer system to assist a human analyst in analyzing large amounts of electronic communications for malfeasance, comprising: one or more computer readable storage devices configured to store: one or more software modules including computer executable instructions, the one or more software modules including a cluster engine module and a workflow engine module; a plurality of clustering
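Claim 1 identifies likely side conversations as emails that have at least one fewer particular participant than the previous email associated with them. The Python sketch below shows one possible reading of that test. It assumes each email records its sender, its recipients, and the identifier of the email it follows; the `Email` class, the `previous_id` field, and the sample thread are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Email:
    """Hypothetical email record; field names are assumptions for this sketch."""
    email_id: str
    sender: str
    recipients: frozenset
    previous_id: Optional[str] = None  # the email this one follows, if any

    @property
    def participants(self):
        # Everyone on the message: the sender plus all recipients.
        return self.recipients | {self.sender}


def likely_side_conversations(emails):
    """Return (email_id, dropped participants) for each email that lost at
    least one participant relative to the previous email it is linked to."""
    by_id = {e.email_id: e for e in emails}
    result = []
    for e in emails:
        prev = by_id.get(e.previous_id) if e.previous_id else None
        if prev is None:
            continue  # no previous email to compare against
        dropped = prev.participants - e.participants
        if dropped:
            result.append((e.email_id, sorted(dropped)))
    return result


# Made-up thread for illustration only.
THREAD = [
    Email("m1", "alice", frozenset({"bob", "carol", "dave"})),
    Email("m2", "bob", frozenset({"alice", "carol", "dave"}), previous_id="m1"),
    Email("m3", "bob", frozenset({"alice"}), previous_id="m2"),
    Email("m4", "alice", frozenset({"bob", "erin"}), previous_id="m3"),
]

print(likely_side_conversations(THREAD))
```

Here only `m3` is flagged, because it drops `carol` and `dave` from the exchange. `m4` adds a participant without dropping one, so under this reading it is not treated as a side conversation.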

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • G06Q40/00 (Primary)

    Finance; Insurance; Tax strategies; Processing of corporate or income taxes · CPC title

  • Product, service or business identity fraud · CPC title

  • Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9230280B1 cover?
In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed to assist in detection of financial malfeasance. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules,…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06Q40/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 5, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).