Who is the assignee on this patent?

Microsoft Technology Licensing Llc

What technology area does this patent fall under?

Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.

When was this patent published?

Publication date Aug 27, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

This page lists 7 related publications: publications linked by citation within our corpus, or others sharing the same primary CPC classification.

Spam classification system based on network flow data

US10397256B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10397256-B2
Application number	US-201615365008-A
Country	US
Kind code	B2
Filing date	Nov 30, 2016
Priority date	Jun 13, 2016
Publication date	Aug 27, 2019
Grant date	Aug 27, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an example embodiment, a computer-implemented method comprises obtaining labels from messages associated with an email service provider, wherein the labels indicate for each message IP how many spam and non-spam messages have been received; obtaining network data features from a cloud service provider; providing the labels and network data features to a machine learning application; generating a prediction model representing an algorithm for determining whether a particular set of network data features are spam or not; applying the prediction model to network data features for an unlabeled message; and generating an output of the prediction model indicating a likelihood that the unlabeled message is spam.
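The abstract's data flow can be illustrated with a small, self-contained Python sketch. It is not the patented implementation. The 0.5 spam-fraction threshold, the feature names (`dst_port=25`, `flags=S`), and the helper names (`label_ips`, `join_on_ip`, `train`, `predict_proba`) are hypothetical, and plain logistic regression trained by stochastic gradient descent stands in for the unspecified machine learning application. The sketch turns per-IP spam/non-spam counts into labels, joins them with network data features on IP address, fits a model, and outputs a spam likelihood for an unlabeled message's features.

```python
import math
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # feature name -> value (absent features are 0)


def label_ips(counts: Dict[str, Tuple[int, int]], threshold: float = 0.5) -> Dict[str, int]:
    """Turn per-IP (spam, non_spam) message counts into binary labels (1 = spamming IP).

    The threshold is a hypothetical choice; the patent does not specify one.
    """
    labels: Dict[str, int] = {}
    for ip, (spam, ham) in counts.items():
        total = spam + ham
        if total > 0:  # skip IPs with no observed messages
            labels[ip] = 1 if spam / total > threshold else 0
    return labels


def join_on_ip(labels: Dict[str, int], features: Dict[str, SparseVec]) -> List[Tuple[SparseVec, int]]:
    """Pair each labeled IP with its network data features; IPs missing from either side are dropped."""
    return [(features[ip], labels[ip]) for ip in sorted(labels) if ip in features]


def sigmoid(z: float) -> float:
    # Split on sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: SparseVec, bias: float, x: SparseVec) -> float:
    """Likelihood that a feature vector belongs to a spamming host."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in x.items())
    return sigmoid(z)


def train(examples: List[Tuple[SparseVec, int]], epochs: int = 200, lr: float = 0.5) -> Tuple[SparseVec, float]:
    """Fit logistic regression with per-example (stochastic) gradient descent."""
    weights: SparseVec = {}
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = predict_proba(weights, bias, x) - y  # gradient of log loss w.r.t. z
            bias -= lr * err
            for name, value in x.items():
                weights[name] = weights.get(name, 0.0) - lr * err * value
    return weights, bias


# Toy data: per-IP counts from the email provider and flow features from the cloud provider.
counts = {"10.0.0.1": (95, 5), "10.0.0.2": (2, 98), "10.0.0.3": (80, 20), "10.0.0.4": (0, 50)}
features = {
    "10.0.0.1": {"dst_port=25": 1.0, "flags=S": 1.0},
    "10.0.0.2": {"dst_port=443": 1.0, "flags=FSPA": 1.0},
    "10.0.0.3": {"dst_port=25": 1.0, "flags=S": 1.0},
    "10.0.0.4": {"dst_port=443": 1.0, "flags=FSPA": 1.0},
}

labels = label_ips(counts)
weights, bias = train(join_on_ip(labels, features))

# Score the features of an unlabeled message's connection.
unlabeled = {"dst_port=25": 1.0, "flags=S": 1.0}
print(f"spam likelihood: {predict_proba(weights, bias, unlabeled):.3f}")
```

Joining on IP address is what lets the two providers share data without exchanging message content: the email side contributes only per-IP counts, and the cloud side contributes only connection metadata.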

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for sharing data between at least an email service provider and a cloud service provider in order to identify network spamming message patterns without accessing spamming message content, the method comprising: obtaining labels from messages associated with an email service provider, wherein the labels indicate for each message IP address how many spam and non-spam messages have been received; obtaining network data features from a cloud service provider; providing the labels and the network data features to a machine learning application, wherein the machine learning application identifies correlations between IP addresses associated with the labels and IP addresses associated with the network data features, the correlations being used to facilitate the machine learning application in generating a prediction model to detect spamming hosts that generate spamming messages; generating the prediction model representing an algorithm for determining whether a particular set of network data features are spam or not; and after an unlabeled message, which has not yet been characterized as spam or not as spam, is generated by a computing device of the cloud service provider and after the unlabeled message is received at a router of the cloud service provider in preparation for transmittal to a recipient computing device, applying the prediction model to the unlabeled message to determine whether the unlabeled message is spam or is not spam, wherein the network data features from the cloud service provider include descriptors of connections between the computing device that generated the unlabeled message and the recipient computing device, the descriptors including information describing a source and destination IP address, source and destination ports, a protocol type, and a union of TCP flags.

2. The computer-implemented method of claim 1, further comprising: generating an output of the prediction model indicating a likelihood that the unlabeled message is spam.

3. The computer-implemented method of claim 1, further comprising: obtain an updated set of labels from messages associated with the email service provider; and retrain the prediction models based upon the updated set of labels.

4. The computer-implemented method of claim 1, wherein a virtual machine residing on the computing device generated the unlabeled message, and wherein the method further comprises: when the unlabeled message is identified as being spam, labeling the virtual machine as spamming.

5. The computer-implemented method of claim 1, wherein the machine learning application is a trained learner having a classification algorithm that is used to predict spam from a sparse matrix created from the network data features.

6. The computer-implemented method of claim 1, wherein the network data features correspond to IPFIX data.

7. The computer-implemented method of claim 1, wherein the network data features comprise email metadata.

8. The computer-implemented method of claim 1, wherein the labels from messages associated with an email service provider are stored as a reputation dataset.

9. A machine-learning server comprising: one or more processor(s); and one or more computer-readable hardware storage device(s) having stored thereon computer-executable instructions that are executable by the one or more processor(s) to cause the machine-learning server to: obtain labels from messages associated with an email service provider, wherein the labels indicate for each message IP address how many spam and non-spam messages have been received; obtain network data features from a cloud service provider; provide the labels and the network data features to a machine learning application, wherein the machine learning application identifies correlations between IP addresses associated with the labels and IP addresses associated with the network data features, the correlations being used to facilitate the machine learning application in generating a prediction model to detect spamming hosts that generate spamming messages; generate the prediction model representing an algorithm for determining whether a particular set of network data features are spam or not; and after an unlabeled message, which has not yet been characterized as spam or not as spam, is generated by a computing device of the cloud service provider and after the unlabeled message is received at a router of the cloud service provider in preparation for transmittal to a recipient computing device, apply the prediction model to the unlabeled message to determine whether the unlabeled message is spam or is not spam, wherein the network data features from the cloud service provider include descriptors of connections between the computing device that generated the unlabeled message and the recipient computing device, the descriptors including information describing a source and destination IP address, source and destination ports, a protocol type, and a union of TCP flags.

10. The machine-learning server of claim 9, wherein execution of the computer-executable instructions further causes the machine-learning server to: generate an output of the prediction model indicating a likelihood that the unlabeled message is spam.

11. The machine-learning server of claim 9, wherein execution of the computer-executable instructions further causes the machine-learning server to: obtain an updated set of labels from messages associated with the email service provider; and retrain the prediction models based upon the updated set of labels.

12. The machine-learning server of claim 9, wherein execution of the computer-executable instructions further causes the machine-learning server to: forward the prediction model to a cloud management application for use in identifying spamming machines on a cloud service.

13. The machine-learning server of claim 9, wherein the machine learning application is a trained learner having a classification algorithm that is used to predict spam from a sparse matrix created from the network data features.

14. The machine-learning server of claim 9, wherein the network data features correspond to IPFIX data.

15. The machine-learning server of claim 9, wherein the network data features comprise email metadata.

16. The machine-learning server of claim 9, wherein the labels from the messages associated with the email service provider are stored as a reputation dataset.

17. The machine-learning server of claim 9, wherein the descriptors are included as flow-based metadata.

18. The machine-learning server of claim 9, wherein the descriptors are included as flow-based metadata, and wherein execution of the computer-executable instructions further causes the machine-learning server to: condense the flow-based metadata into flow records that capture data about the messages.

19. A computer-implemented method for sharing data between different services to identify network spamming patterns, the method comprising: receiving a prediction model representing an algorithm for determining whether a particular set of network data features are spam or not, wherein: the prediction model is generated from labels from messages associated with an email service provider and from network data features from a cloud service provider, the prediction model is generated by a machine learning application that identifies correlations between IP addresses associated with the labels and IP addresses associated with the network data features, and the co
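The connection descriptors recited in claim 1 (source and destination IP address, source and destination ports, protocol type, and a union of TCP flags), the flow records of claim 18, and the sparse matrix of claim 5 can be illustrated together. This Python sketch is an illustration under stated assumptions, not the claimed system: packet metadata arrives as plain tuples, flows are keyed by the 5-tuple, each flow's TCP flags are combined with bitwise OR (the same convention IPFIX uses for tcpControlBits), and the per-source-IP feature names (`dst_port=…`, `proto=…`, `flags=…`, `packets`) and helpers (`condense`, `features_by_source`, `flag_string`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Standard TCP header flag bits (the same bit positions IPFIX uses for tcpControlBits).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
PROTO_TCP = 6  # IANA protocol number for TCP

# (src IP, dst IP, src port, dst port, protocol)
FlowKey = Tuple[str, str, int, int, int]
# FlowKey fields followed by the packet's TCP flags and size in bytes
Packet = Tuple[str, str, int, int, int, int, int]


@dataclass
class FlowRecord:
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0  # bitwise OR ("union") of all flags seen on the flow


def condense(packets: Iterable[Packet]) -> Dict[FlowKey, FlowRecord]:
    """Condense packet-level metadata into one flow record per 5-tuple."""
    flows: Dict[FlowKey, FlowRecord] = defaultdict(FlowRecord)
    for src, dst, sport, dport, proto, flags, size in packets:
        rec = flows[(src, dst, sport, dport, proto)]
        rec.packets += 1
        rec.octets += size
        rec.tcp_flags |= flags
    return dict(flows)


def flag_string(flags: int) -> str:
    """Render a flag bitmask as letters, e.g. SYN|ACK -> 'SA'."""
    names = [("F", FIN), ("S", SYN), ("R", RST), ("P", PSH), ("A", ACK), ("U", URG)]
    return "".join(letter for letter, bit in names if flags & bit) or "none"


def features_by_source(flows: Dict[FlowKey, FlowRecord]) -> Dict[str, Dict[str, float]]:
    """Build one sparse feature vector (feature name -> count) per source IP."""
    vectors: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (src, _dst, _sport, dport, proto), rec in flows.items():
        vec = vectors[src]
        vec[f"dst_port={dport}"] += 1
        vec[f"proto={proto}"] += 1
        vec[f"flags={flag_string(rec.tcp_flags)}"] += 1
        vec["packets"] += rec.packets
    return {ip: dict(vec) for ip, vec in vectors.items()}


packets = [
    ("10.0.0.5", "198.51.100.7", 50000, 25, PROTO_TCP, SYN, 60),
    ("10.0.0.5", "198.51.100.7", 50000, 25, PROTO_TCP, PSH | ACK, 1500),
    ("10.0.0.5", "198.51.100.7", 50000, 25, PROTO_TCP, FIN | ACK, 60),
    ("10.0.0.5", "203.0.113.9", 50001, 25, PROTO_TCP, SYN, 60),
]
flows = condense(packets)
vectors = features_by_source(flows)
print(vectors["10.0.0.5"])
```

Each per-IP dictionary is one row of a sparse matrix; rows keyed by source IP can then be joined with the email provider's per-IP labels.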

Assignees

Microsoft Technology Licensing Llc

Inventors

Classifications

G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
H04L67/10
in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title
H04L63/1425 (Primary)
Traffic logging, e.g. anomaly detection · CPC title
G06N20/00 (Primary)
Machine learning · CPC title
G06N7/005
Physics · mapped topic

Patent family

Related publications grouped by family.

Patent family ID: 60574262

Related patents

A method and system for network access control based on traffic monitoring and vulnerability detection using process related information

Classifier Bypass Based On Message Sender Trust and Verification

Framework for explaining anomalies in accessing web applications

Method and Apparatus for Malware Detection

System and Method for Cloud Based IP Mobile Messaging Spam Detection and Defense

Systems and methods for dynamic cloud-based malware behavior analysis

Spam notification device