Systems and methods for classifying files as specific types of malware
US-10489587-B1 · Nov 26, 2019 · US
US11521108B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11521108-B2 |
| Application number | US-201816049579-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 30, 2018 |
| Priority date | Jul 30, 2018 |
| Publication date | Dec 6, 2022 |
| Grant date | Dec 6, 2022 |
Official abstract text for this publication.
Emails or other communications are labeled with a category label such as “spam” or “good” without using confidential or Personally Identifiable Information (PII). The category label is based on features of the emails such as metadata that do not contain PII. Graphs of inferred relationships between email features and category labels are used to assign labels to emails and to features of the emails. The labeled emails are used as a training dataset for training a machine learning model (“MLM”). The MLM identifies unwanted emails such as spam, bulk email, phishing email, and emails that contain malware.
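The labeling flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature names, sample values, the `expand_labels` function, and the fixed number of rounds are all hypothetical. Labels flow from already-labeled feature nodes to messages, then from newly labeled messages to their remaining unlabeled features.

```python
# Hypothetical feature nodes: (feature_type, value) pairs built from non-PII metadata.
# Some feature nodes start out labeled (assumed here to come from an earlier labeling pass).
feature_labels = {
    ("url_host", "bad.example"): "spam",
}

# Each message is represented only by its non-PII features (all values are made up).
messages = {
    "m1": [("url_host", "bad.example"), ("sender_host", "mx.example.net")],
    "m2": [("sender_host", "mx.example.net")],
    "m3": [("sender_host", "other.example.org")],
}


def expand_labels(messages, feature_labels, rounds=2):
    """Propagate labels between feature nodes and message nodes."""
    feature_labels = dict(feature_labels)  # work on a copy, leave the input untouched
    message_labels = {}
    for _ in range(rounds):
        # Feature -> message: a labeled feature passes its label to the message.
        for message_id, features in messages.items():
            if message_id in message_labels:
                continue
            for feature in features:
                if feature in feature_labels:
                    message_labels[message_id] = feature_labels[feature]
                    break
        # Message -> feature: a labeled message passes its label to unlabeled features.
        for message_id, label in message_labels.items():
            for feature in messages[message_id]:
                feature_labels.setdefault(feature, label)
    return message_labels, feature_labels


message_labels, expanded_features = expand_labels(messages, feature_labels)
print(message_labels)
```

In this toy data, `m2` shares a sender host with `m1`, so it picks up a label in the second round, while `m3` shares nothing with a labeled node and stays unlabeled. The labeled messages would then serve as training examples for a supervised model.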
Opening claim text (preview).
The invention claimed is:
1. A system comprising: one or more processing units; one or more memory units coupled to the one or more processing units; and an expansion graph, stored in the one or more memory units, that comprises: a principal type entity that is an unlabeled communication; a first clustering type entity that represents a first feature of the unlabeled communication other than personally identifiable information (PII) and, wherein the first clustering type entity is labeled with a first communications-category label; a second clustering type entity that represents a second feature of the unlabeled communication other than PII, wherein the second clustering type entity is labeled with a second communications-category label; a third clustering type entity that represents a third feature of the unlabeled communication other than PII; a first directional, derivative edge from the first clustering type entity to the principal type entity; a second directional, derivative edge from the second clustering type entity to the principal type entity; a directional, clustering edge from the principal type entity to the third clustering type entity; and a labeling module, stored in the one or more memory units, that is configured to: assign the first communications-category label from the first clustering type entity to the unlabeled communication based on the first directional, derivative edge from the first clustering type entity to the principal type entity and assign the second communications-category label from the second clustering type entity to the unlabeled communication based on the second directional, derivative edge from the second clustering type entity to the principal type entity, thereby creating a labeled communication, and assign at least one of the first communications-category label or the second communications-category label from the principal type entity to the third clustering type entity based on the directional, clustering edge from the principal type entity to the second clustering type entity.
2. The system of claim 1, further comprising processing the labeled communication by storing the labeled communication or deleting the labeled communication based on the first communications-category label.
3. The system of claim 1, further comprising an expansion module, stored in the one or more memory units, configured to assign the first communications-category label to the third clustering type entity based on one or more of the directional, clustering edges in the expansion graph.
4. The system of claim 1, further comprising a confidence module, stored in the one or more memory units, configured to assign a probability to the first communications-category label based on the expansion graph.
5. The system of claim 1, further comprising a voting module, stored in the one or more memory units, configured to apply a set of voting rules to resolve conflicts between the first communications-category label of the first feature and the second communications-category label of the third feature.
6. The system of claim 1, further comprising a composite key module, stored in the one or more memory units, configured to generate a cluster based on two or more of the first feature, the second feature, or the third feature.
7. The system of claim 1, wherein the second clustering type entity contains other unlabeled communications that cluster together based on the second feature.
8. The system of claim 1, further comprising a voting module, stored in the one or more memory units, configured to select a single label for the second clustering type entity based on votes that include the first communications-category label assigned from the principal type entity and at least two other communications-category labels assigned from different principal type entities.
9. The system of claim 1, wherein the expansion graph further comprises: a fourth clustering type entity that represents a fourth feature of the unlabeled communication other than PII; a directional, clustering edge from the principal type entity to the fourth clustering type entity; a directional, clustering edge from the first clustering type entity to the fourth clustering type entity; and a directional, clustering edge from the second clustering type entity to the fourth clustering type entity.
10. The system of claim 1, wherein the expansion graph further comprises: a directional, clustering edge from the principal type entity to the first clustering type entity; a directional, clustering edge from the principal type entity to the second clustering type entity; a directional, clustering edge from the first clustering type entity to the third clustering type entity.
11. A method comprising; accessing an expansion graph of relationships between a message node representing an unlabeled message and a plurality of feature nodes, wherein the expansion graph is specific to a communications-category label and wherein the plurality of feature nodes comprise at least two of a message hash node, a message sender node, a URL node, or a sender host node; extracting a feature from the unlabeled message; correlating the feature with a one of the plurality of feature nodes in the expansion graph, wherein the one of the plurality of feature nodes has a first category label; assigning the first category label to the unlabeled message based on a directional, derivative edge in the expansion graph from the feature node to the message node thereby creating a labeled message, wherein the directional, derivative edge is associated with a probability and assigning the first category label is based on the probability; assigning a second category label to the unlabeled message based on a second expansion graph and a second feature of the unlabeled message; applying a set of voting rules to resolve a conflict between the first category label and the second category label; creating a training dataset comprising the labeled message; generating a machine learning model by supervised learning using the training dataset; and classifying a new message with the machine learning model.
12. The method of claim 11, wherein the expansion graph is a logical layer that captures clustering and label expansion logic between multiple different types of entities that are clustered.
13. The method of claim 11, wherein the first category label comprises one or more of good message, spam message, phishing message, bulk message, or malware message and further comprising: processing the new message according to the first category label, the processing comprising storing, quarantining, or deleting.
14. The method of claim 11, wherein the expansion graph comprises a directional, clustering edge from the message node to a second feature node, wherein the second feature node receives the first category label from the message node.
15. Computer-readable storage media comprising instructions that when executed cause a computing device to: access an expansion graph of relationships between a message node representing an unlabeled message and a plurality of feature nodes, wherein the expansion graph is specific to a communications-category label and wherein the plurality of feature nodes comprise at least two of a message hash node, a message sender node, a URL node, or a sender host node; extract a feature from the unlabeled message; correlate the feature with a one of the plurality of feature nodes in the expansion graph, wherein the one of the plurality of feature nodes has a first category label; assign the first category label to the unlabeled message b
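The voting step recited in claims 5, 8, and 11 can be illustrated with a small sketch. The claims do not spell out the voting rules themselves, so everything here is an assumption made for illustration: the `resolve` function, the probability threshold, the summed-probability rule, and the tie-breaking precedence order are hypothetical. Only the five category names come from claim 13.

```python
from collections import defaultdict

# Hypothetical tie-breaking order (most severe first); not taken from the patent.
PRECEDENCE = ["malware", "phishing", "spam", "bulk", "good"]


def resolve(votes, min_probability=0.5):
    """Pick one label from conflicting (label, probability) votes.

    Each vote stands for a label proposed through one expansion graph edge,
    together with the probability associated with that edge. Labels must
    appear in PRECEDENCE, otherwise list.index raises ValueError.
    """
    totals = defaultdict(float)
    for label, probability in votes:
        # Assumed rule 1: ignore low-confidence votes.
        if probability >= min_probability:
            totals[label] += probability
    if not totals:
        return None  # no confident vote, so the message stays unlabeled
    # Assumed rule 2: highest summed probability wins; ties go to the label
    # that appears earlier in PRECEDENCE.
    return max(totals, key=lambda label: (totals[label], -PRECEDENCE.index(label)))


winner = resolve([("spam", 0.9), ("good", 0.6)])
print(winner)
```

A message that receives `None` would simply be left out of the training dataset in this sketch, which keeps low-confidence labels from reaching the supervised learning step.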
Mailbox-related aspects, e.g. synchronisation of mailboxes · CPC title
Computer-aided management of electronic mailing [e-mailing] · CPC title
Machine learning · CPC title