Analyzing distributed datasets

US9960975B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9960975-B1
Application number  US-201414533905-A
Country             US
Kind code           B1
Filing date         Nov 5, 2014
Priority date       Nov 5, 2014
Publication date    May 1, 2018
Grant date          May 1, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for analyzing a dataset may be provided. For example, a configuration file may be accessed. The dataset may be analyzed based on a condition identified in the configuration file. A report may be generated and transmitted based on the analysis. Another report generated based on an analysis of another dataset according to another configuration file may be accessed. The dataset may be further analyzed based on this report to determine if a reported observation may also be associated with the dataset. If so, a confirmation may be generated and transmitted.
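The flow in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `Config` class, its fields (`config_id`, `field_name`, `min_count`), the "value appears at least N times" condition, and the dictionary shapes of the report and confirmation are all hypothetical. Public-key protection of reports and confirmations is not modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for the configuration file."""
    config_id: str
    field_name: str   # data field of the dataset to analyze
    min_count: int    # assumed condition: a value occurs at least this often


def analyze(config, dataset):
    """Analyze a local dataset; return a report if the condition is met, else None."""
    counts = Counter(
        row[config.field_name] for row in dataset if config.field_name in row
    )
    hits = {value: n for value, n in counts.items() if n >= config.min_count}
    if not hits:
        return None
    return {
        "config_id": config.config_id,
        "field_name": config.field_name,
        "summary": hits,
    }


def confirm(report, dataset):
    """Check whether another node's reported observation also appears locally."""
    local_values = {row.get(report["field_name"]) for row in dataset}
    shared = sorted(value for value in report["summary"] if value in local_values)
    if not shared:
        return None
    return {"config_id": report["config_id"], "confirmed": shared}


# Two nodes with made-up log rows (documentation-range IP addresses).
config = Config("cfg-1", "ip", 3)
node_a = [{"ip": "203.0.113.7"}] * 3 + [{"ip": "198.51.100.2"}]
node_b = [{"ip": "203.0.113.7"}, {"ip": "192.0.2.9"}]

report = analyze(config, node_a)        # node A meets the condition
confirmation = confirm(report, node_b)  # node B sees the same address
```

Here node A reports the address that crosses the assumed frequency condition, and node B confirms it because the same address appears in its own rows.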

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising:

    distributing, by a computer system, a configuration file to a plurality of nodes forming a point-to-point network, the plurality of nodes comprising a plurality of corresponding local datasets and a plurality of corresponding local agents, the plurality of corresponding local agents configured to analyze the plurality of corresponding local datasets based at least in part on the configuration file, the configuration file identifying data fields of a dataset to be analyzed according to a condition and an action to be performed if the condition is met, the computer system maintaining a privileged node of the plurality of nodes, the configuration file indicating that confirmations are to be protected with a public key associated with the privileged node and independently indicating whether reports are to be protected with the public key;

    receiving, by the computer system, a report from a node of the plurality of nodes, the report based at least in part on an analysis of the configuration file by a corresponding local agent of the node, the report indicating that analyzed data fields from a corresponding local dataset of the node meet the condition, the report comprising a summary of the analyzed data fields, the report received based at least in part on the action, the report protected using the public key in accordance with the indication in the configuration file;

    receiving, by the computer system, a confirmation from each of a set of nodes of the plurality of nodes, the confirmation indicating that a local dataset of the node comprises data associated with the summary, the confirmation received based at least in part on the action, individual confirmations protected using the public key in accordance with the indication in the configuration file such that non-privileged nodes of the plurality of nodes are unable to access protected content of the confirmation;

    comparing, by the computer system, a count of the set of nodes from which confirmations were received to a threshold; and

    determining, by the computer system, that the node is associated with a network-based issue based at least in part on the count being lower than the threshold.

2. The computer-implemented method of claim 1, wherein the plurality of corresponding local agents comprise a plurality of autonomous agents, wherein a first autonomous agent of a first node analyzes a first local dataset of the first node based at least in part on the configuration file and independently of an analysis of a second local dataset of a second node by a second autonomous agent of the second node.

3. The computer-implemented method of claim 1, wherein the plurality of nodes comprise web servers, wherein the plurality of corresponding local datasets comprise local logs associated with client access to the web servers, wherein the data fields identified in the configuration file comprise Internet protocol (IP) address fields in the local logs, wherein the condition comprises a frequency of IP addresses, and wherein the action comprises generating a locality-sensitive hash of traffic associated with the IP addresses and transmitting the locality-sensitive hash to the remaining nodes.

4. The computer-implemented method of claim 1, wherein the node comprises a web server, and wherein determining that the node is associated with a network-based issue comprises determining that the web server is under a network attack based at least in part on determining that the count is lower than the threshold.

5. One or more non-transitory computer-readable storage media storing computer-executable instructions that, when executed by a computing node configure the computing node to perform operations comprising:

    accessing a configuration file comprising a condition, the configuration file comprising an indication that reports are to be protected with a public key of a privileged computing node communicatively coupled with the computing node;

    determining that a portion of a local dataset meets the condition;

    generating a report based at least in part on meeting the condition, the report protected using the public key of the privileged computing node in accordance with the indication in the configuration file;

    accessing a second configuration file and a second report that is associated with a second computing node, the second report generated based at least in part on the second configuration file and comprising an observation associated with a second local dataset of the second computing node, the observation indicating that a second portion of the second local dataset meets a second condition from the second configuration file, the second configuration file comprising a second indication that messages are to be protected with the public key of the privileged computing node;

    analyzing the local dataset to determine whether the observation associated with the second local dataset is also associated with the local dataset; and

    generating a message indicative of an association between the local dataset and the observation based at least in part on the analyzing, the message protected using the public key of the privileged computing node in accordance with the second indication in the second report.

6. The one or more computer-readable storage media of claim 5, wherein the configuration file comprises a heuristic field and an interpretation field.

7. The one or more computer-readable storage media of claim 6, wherein the interpretation field is used to determine whether the dataset includes a particular field, and wherein the heuristic field is used to determine whether an included particular field meets the condition.

8. The one or more computer-readable storage media of claim 5, wherein the report comprises identification configured to enable the computing node to identify the configuration file.

9. The one or more computer-readable storage media of claim 5, wherein the message comprises identification configured to enable the computing node to identify the second configuration file.

10. The one or more computer-readable storage media of claim 5, wherein the report is unprotected with a public key of a privileged node configured to distribute configuration files, and wherein the message is protected with the public key of the privileged node.

11. The one or more computer-readable storage media of claim 5, wherein analyzing the local dataset to determine whether the observation is associated with the local dataset comprises determining a degree of similarity between data in the local dataset and the observation and declaring that the observation is associated with the local dataset based at least in part on a comparison of the degree of similarity to a threshold.

12. The one or more computer-readable storage media of claim 5, wherein the computer-executable instructions that, when executed by the computing node, further configure the computing node to perform operations comprising: tracking a number of messages generated based at least in part on the configuration file; and determining that a network-based issue exists based at least in part on the number of messages.

13. The one or more computer-readable storage media of claim 12, wherein the message comprises a cumulative count of computing nodes comprising local datasets associated with the observation.

14. A system, comprising: a memory configured to store computer-executable instructions; and a processor configured to access the memory and execute the computer-executable instructions to collectively at least: transmit, to a first system and a second system, a configuration file compr
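Two decision steps in the claims above lend themselves to a small sketch: the similarity test of claim 11 and the confirmation count of claim 1. The Python below is a hedged illustration only. The claims do not name a similarity measure, so Jaccard similarity over sets of values is an assumption, as are the function names, the node identifiers, and the choice to count each confirming node once.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure; the claims name none)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_associated(local_values, observed_values, min_similarity):
    """Claim 11 style test: compare a degree of similarity to a threshold."""
    return jaccard(set(local_values), set(observed_values)) >= min_similarity


def flag_network_issue(confirming_nodes, threshold):
    """Claim 1 style test: flag the reporting node when the count of
    confirming nodes is lower than the threshold.

    Counting each node once is an assumption of this sketch.
    """
    count = len(set(confirming_nodes))
    return count < threshold


# Observation reported by one node, and made-up datasets of three other nodes.
observation = {"203.0.113.7", "203.0.113.8"}
datasets = {
    "node-b": {"203.0.113.7", "203.0.113.8", "192.0.2.1"},
    "node-c": {"198.51.100.4"},
    "node-d": {"192.0.2.9", "192.0.2.10"},
}

confirming = [
    name
    for name, values in datasets.items()
    if is_associated(values, observation, min_similarity=0.5)
]
issue = flag_network_issue(confirming, threshold=2)
```

With these inputs only `node-b` confirms the observation, so the count (1) falls below the threshold (2) and the reporting node is flagged. Under claim 4's reading, that outcome could indicate an attack aimed at that one web server rather than traffic seen across the network.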

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Event detection, e.g. attack signature detection

  • Processing captured monitoring data, e.g. for logfile generation

  • H04L43/06 (Primary): Generation of reports

  • Additional information in the notification, e.g. enhancement of specific meta-data

  • based on generic templates

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9960975B1 cover?
Techniques for analyzing a dataset may be provided. For example, a configuration file may be accessed. The dataset may be analyzed based on a condition identified in the configuration file. A report may be generated and transmitted based on the analysis. Another report generated based on an analysis of another dataset according to another configuration file may be accessed. The dataset may be f…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.
When was this patent published?
Publication date May 1, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).