Privacy-preserving labeling and classification of email

US11521108B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11521108-B2
Application number: US-201816049579-A
Country: US
Kind code: B2
Filing date: Jul 30, 2018
Priority date: Jul 30, 2018
Publication date: Dec 6, 2022
Grant date: Dec 6, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Emails or other communications are labeled with a category label such as “spam” or “good” without using confidential or Personally Identifiable Information (PII). The category label is based on features of the emails such as metadata that do not contain PII. Graphs of inferred relationships between email features and category labels are used to assign labels to emails and to features of the emails. The labeled emails are used as a training dataset for training a machine learning model (“MLM”). The MLM model identifies unwanted emails such as spam, bulk email, phishing email, and emails that contain malware.
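The two labeling directions the abstract describes (from labeled features to an email, and from a labeled email back to its features) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the feature keys in `NON_PII_KEYS`, the helper functions, and the sample values do not come from the patent, and the rule of expanding only an unambiguous label is a simplification.

```python
# Hypothetical non-PII feature keys; the patent text does not enumerate these names.
NON_PII_KEYS = ("sender_host", "url_domain", "body_hash")


def extract_features(metadata):
    """Return (kind, value) pairs for the non-PII fields present in the metadata."""
    return [(key, metadata[key]) for key in NON_PII_KEYS if key in metadata]


def label_email(metadata, feature_labels):
    """Feature -> email: inherit the labels of any already-labeled features."""
    return {
        feature_labels[feature]
        for feature in extract_features(metadata)
        if feature in feature_labels
    }


def expand_to_features(metadata, email_labels, feature_labels):
    """Email -> feature: pass an unambiguous email label to unlabeled features."""
    if len(email_labels) != 1:
        return feature_labels  # missing or conflicting labels are left alone here
    (label,) = email_labels
    updated = dict(feature_labels)
    for feature in extract_features(metadata):
        updated.setdefault(feature, label)  # never overwrite an existing label
    return updated


# Illustrative data: one known-bad sender host, one incoming email.
feature_labels = {("sender_host", "bulk.example.test"): "spam"}
email = {
    "sender_host": "bulk.example.test",
    "body_hash": "9f2c",
    "recipient": "alice@example.test",  # PII-like field, ignored by design
}

labels = label_email(email, feature_labels)
feature_labels = expand_to_features(email, labels, feature_labels)
```

In this sketch the recipient address never enters the label table, because only the keys listed in `NON_PII_KEYS` are ever read from the metadata.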

First claim

Opening claim text (preview).

The invention claimed is:

1. A system comprising: one or more processing units; one or more memory units coupled to the one or more processing units; and an expansion graph, stored in the one or more memory units, that comprises: a principal type entity that is an unlabeled communication; a first clustering type entity that represents a first feature of the unlabeled communication other than personally identifiable information (PII) and, wherein the first clustering type entity is labeled with a first communications-category label; a second clustering type entity that represents a second feature of the unlabeled communication other than PII, wherein the second clustering type entity is labeled with a second communications-category label; a third clustering type entity that represents a third feature of the unlabeled communication other than PII; a first directional, derivative edge from the first clustering type entity to the principal type entity; a second directional, derivative edge from the second clustering type entity to the principal type entity; a directional, clustering edge from the principal type entity to the third clustering type entity; and a labeling module, stored in the one or more memory units, that is configured to: assign the first communications-category label from the first clustering type entity to the unlabeled communication based on the first directional, derivative edge from the first clustering type entity to the principal type entity and assign the second communications-category label from the second clustering type entity to the unlabeled communication based on the second directional, derivative edge from the second clustering type entity to the principal type entity, thereby creating a labeled communication, and assign at least one of the first communications-category label or the second communications-category label from the principal type entity to the third clustering type entity based on the directional, clustering edge from the principal type entity to the second clustering type entity.

2. The system of claim 1, further comprising processing the labeled communication by storing the labeled communication or deleting the labeled communication based on the first communications-category label.

3. The system of claim 1, further comprising an expansion module, stored in the one or more memory units, configured to assign the first communications-category label to the third clustering type entity based on one or more of the directional, clustering edges in the expansion graph.

4. The system of claim 1, further comprising a confidence module, stored in the one or more memory units, configured to assign a probability to the first communications-category label based on the expansion graph.

5. The system of claim 1, further comprising a voting module, stored in the one or more memory units, configured to apply a set of voting rules to resolve conflicts between the first communications-category label of the first feature and the second communications-category label of the third feature.

6. The system of claim 1, further comprising a composite key module, stored in the one or more memory units, configured to generate a cluster based on two or more of the first feature, the second feature, or the third feature.

7. The system of claim 1, wherein the second clustering type entity contains other unlabeled communications that cluster together based on the second feature.

8. The system of claim 1, further comprising a voting module, stored in the one or more memory units, configured to select a single label for the second clustering type entity based on votes that include the first communications-category label assigned from the principal type entity and at least two other communications-category labels assigned from different principal type entities.

9. The system of claim 1, wherein the expansion graph further comprises: a fourth clustering type entity that represents a fourth feature of the unlabeled communication other than PII; a directional, clustering edge from the principal type entity to the fourth clustering type entity; a directional, clustering edge from the first clustering type entity to the fourth clustering type entity; and a directional, clustering edge from the second clustering type entity to the fourth clustering type entity.

10. The system of claim 1, wherein the expansion graph further comprises: a directional, clustering edge from the principal type entity to the first clustering type entity; a directional, clustering edge from the principal type entity to the second clustering type entity; a directional, clustering edge from the first clustering type entity to the third clustering type entity.

11. A method comprising; accessing an expansion graph of relationships between a message node representing an unlabeled message and a plurality of feature nodes, wherein the expansion graph is specific to a communications-category label and wherein the plurality of feature nodes comprise at least two of a message hash node, a message sender node, a URL node, or a sender host node; extracting a feature from the unlabeled message; correlating the feature with a one of the plurality of feature nodes in the expansion graph, wherein the one of the plurality of feature nodes has a first category label; assigning the first category label to the unlabeled message based on a directional, derivative edge in the expansion graph from the feature node to the message node thereby creating a labeled message, wherein the directional, derivative edge is associated with a probability and assigning the first category label is based on the probability; assigning a second category label to the unlabeled message based on a second expansion graph and a second feature of the unlabeled message; applying a set of voting rules to resolve a conflict between the first category label and the second category label; creating a training dataset comprising the labeled message; generating a machine learning model by supervised learning using the training dataset; and classifying a new message with the machine learning model.

12. The method of claim 11, wherein the expansion graph is a logical layer that captures clustering and label expansion logic between multiple different types of entities that are clustered.

13. The method of claim 11, wherein the first category label comprises one or more of good message, spam message, phishing message, bulk message, or malware message and further comprising: processing the new message according to the first category label, the processing comprising storing, quarantining, or deleting.

14. The method of claim 11, wherein the expansion graph comprises a directional, clustering edge from the message node to a second feature node, wherein the second feature node receives the first category label from the message node.

15. Computer-readable storage media comprising instructions that when executed cause a computing device to: access an expansion graph of relationships between a message node representing an unlabeled message and a plurality of feature nodes, wherein the expansion graph is specific to a communications-category label and wherein the plurality of feature nodes comprise at least two of a message hash node, a message sender node, a URL node, or a sender host node; extract a feature from the unlabeled message; correlate the feature with a one of the plurality of feature nodes in the expansion graph, wherein the one of the plurality of feature nodes has a first category label; assign the first category label to the unlabeled message b…
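To make the claim language more concrete, here is a minimal sketch of an expansion graph with directional derivative edges (feature node to message node, each carrying a probability) and directional clustering edges (message node to feature node), plus a voting step for conflicting labels. It is an illustration under stated assumptions, not the patented implementation: the `ExpansionGraph` class, the node naming scheme, and the voting rule (sum the probabilities per label, highest total wins, ties broken alphabetically) are all hypothetical, and a single graph is used here although claim 11 describes one graph per category label.

```python
from collections import defaultdict


class ExpansionGraph:
    """Toy expansion graph; node names and structure are illustrative assumptions."""

    def __init__(self):
        self.node_labels = {}                # feature node -> category label
        self.derivative = defaultdict(list)  # message -> [(feature, probability)]
        self.clustering = defaultdict(list)  # message -> [feature]

    def add_derivative_edge(self, feature, message, probability=1.0):
        """Directional edge: a feature's label may flow to the message."""
        self.derivative[message].append((feature, probability))

    def add_clustering_edge(self, message, feature):
        """Directional edge: the message's label may flow to the feature."""
        self.clustering[message].append(feature)

    def votes_for(self, message):
        """Sum edge probabilities per label over the labeled source features."""
        totals = defaultdict(float)
        for feature, probability in self.derivative[message]:
            label = self.node_labels.get(feature)
            if label is not None:
                totals[label] += probability
        return dict(totals)


def resolve(votes):
    """Hypothetical voting rule: highest total wins, ties go to the alphabetically first label."""
    if not votes:
        return None
    return max(sorted(votes), key=votes.get)


def label_and_expand(graph, message):
    """Label the message, then push that label along its clustering edges."""
    label = resolve(graph.votes_for(message))
    if label is not None:
        for feature in graph.clustering[message]:
            graph.node_labels.setdefault(feature, label)  # keep existing labels
    return label


# Illustrative data: two labeled features disagree about one message.
graph = ExpansionGraph()
graph.node_labels.update({"hash:9f2c": "spam", "host:bulk.example.test": "bulk"})
graph.add_derivative_edge("hash:9f2c", "msg-1", probability=0.9)
graph.add_derivative_edge("host:bulk.example.test", "msg-1", probability=0.6)
graph.add_clustering_edge("msg-1", "url:promo.example.test")

winner = label_and_expand(graph, "msg-1")
```

Messages labeled this way could then be collected into a training dataset for a supervised classifier, as the method claim goes on to describe; that step is omitted from the sketch.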

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Mailbox-related aspects, e.g. synchronisation of mailboxes · CPC title

  • G06Q10/107 (Primary)

    Computer-aided management of electronic mailing [e-mailing] · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11521108B2 cover?
Emails or other communications are labeled with a category label such as “spam” or “good” without using confidential or Personally Identifiable Information (PII). The category label is based on features of the emails such as metadata that do not contain PII. Graphs of inferred relationships between email features and category labels are used to assign labels to emails and to features of the ema…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06Q10/107. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 6, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).