Sensitive data detection in communication data

US2020336501A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2020336501-A1
Application number  US-201916389557-A
Country             US
Kind code           A1
Filing date         Apr 19, 2019
Priority date       Apr 19, 2019
Publication date    Oct 22, 2020
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The described technologies leverage a trained evaluation function to analyze an email message to determine if a password is included in the text of the email message. The text of the email message may be vectorized using a character lookup table including vector values for each ASCII character. The trained evaluation function analyzes the vectorized text to determine if a password is included in the text of the email message. An email message found to include a password may be placed in a quarantine storage to at least temporarily prevent the email message from being disseminated to a recipient.
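The vectorization step in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract says only that the lookup table holds vector values for each ASCII character, so the five features used here (digit, uppercase, lowercase, punctuation, normalized code point) and the names `char_features`, `LOOKUP_TABLE`, `UNKNOWN`, `vectorize_token`, and `vectorize_text` are all hypothetical. Tokens are split on whitespace to match the space-delimited tokens the claims describe.

```python
import string

# HYPOTHETICAL features: the abstract says only that the table holds
# vector values for each ASCII character, not what those values encode.
def char_features(c: str) -> list[float]:
    return [
        1.0 if c.isdigit() else 0.0,              # digit
        1.0 if c.isupper() else 0.0,              # uppercase letter
        1.0 if c.islower() else 0.0,              # lowercase letter
        1.0 if c in string.punctuation else 0.0,  # punctuation/symbol
        ord(c) / 127.0,                           # normalized ASCII code
    ]

# Character lookup table: one vector per printable ASCII character (32..126).
LOOKUP_TABLE: dict[str, list[float]] = {
    chr(code): char_features(chr(code)) for code in range(32, 127)
}

# Assumed fallback for characters outside the table (e.g. non-ASCII input).
UNKNOWN: list[float] = [0.0] * 5

def vectorize_token(token: str) -> list[list[float]]:
    """Map each character of one token to its lookup-table vector."""
    return [LOOKUP_TABLE.get(c, UNKNOWN) for c in token]

def vectorize_text(text: str) -> list[list[list[float]]]:
    """Split on whitespace and build one vector list per token."""
    return [vectorize_token(tok) for tok in text.split()]

if __name__ == "__main__":
    for vec_list in vectorize_text("my pass is Hunter2!"):
        print(len(vec_list), vec_list[0])
```

For example, `vectorize_text("my pass is Hunter2!")` yields four vector lists of lengths 2, 4, 2, and 8, one vector per character. Mapping unknown characters to an all-zero vector is a choice made for this sketch only.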

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method, comprising: identifying communication data to analyze for sensitive data; evaluating the communication data to identify at least one data token included in the communication data, the at least one data token including a plurality of characters; converting each of the plurality of characters included in the at least one data token to a vector comprising vector values obtained from a character lookup table; generating a vector list including each vector representing each of the plurality of characters included in the at least one data token; and evaluating the vector list using a trained evaluation function to determine if the at least one data token comprises sensitive data, when the at least one data token comprises sensitive data, associating the communication data in a quarantine storage, when the at least one data token is devoid of sensitive data, allowing dissemination of the communication data.

2. The computer implemented method according to claim 1, wherein the communication data is associated with an email, text message, or other electronic message and the evaluating is to determine if the at least one data token comprises password data.

3. The computer implemented method according to claim 1, wherein trained evaluation function is based on a recursive machine learning model trained using a data set including known space delimited passwords and a data set including space delimited non-password characters.

4. The computer implemented method according to claim 1, wherein the trained evaluation function is based on a recursive machine learning model trained using vectorized characters generated from the character lookup table.

5. The computer implemented method according to claim 4, wherein the character lookup table comprises a plurality of characters, each character of the plurality of characters including a vector with associated values that represent one or more features or characteristics associated with the character.

6. The computer implemented method according to claim 1, wherein the character lookup table comprises the plurality of characters, each character the plurality of characters including a vector with associated values that represent one or more features or characteristics associated with the character.

7. The computer implemented method according to claim 1, wherein when the evaluating determines that the at least one data token comprises sensitive data, and prior to associating the communication data in a quarantine storage, evaluating a predetermined number of characters preceding and/or following the at least one data token to identify a context associated with the at least one data token, and determining the at least one data token comprises sensitive data when the context supports the determination that the at least one data token comprises sensitive data.

8. A computing device, comprising: a processor; a computer-readable storage medium in communication with the processor, the computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by the processor, cause the processor to: receive an electronic message comprising a plurality of character sets, the character sets delimited by white space; convert each of the character sets of the plurality of character sets to a vector list to generate a plurality of vector lists, each vector list of the plurality of vector lists comprising a plurality of vectors each comprising vector values, the vector values obtained from a character lookup table, the character lookup table comprising characters that each include vector values that each represent at least one characteristic associated with a unique character; evaluate each vector list of the plurality of vector lists using a trained evaluation function to determine that at least one of the character sets includes password data; and cause a computer to generate and display a user interface including an indication that the electronic message comprises the password data.

9. The computing device according to claim 8, wherein the electronic message is an email message or a text message.

10. The computing device according to claim 8, wherein the computer-executable instructions, when executed by the processor, cause the processor to, subsequent to determining that at least one of the character sets includes password data, evaluate a predetermined number of characters preceding and/or following the character set including password data to identify a context associated with the character set including password data, and determine the character set includes the password data when the context supports the determination that the character set includes the password data using the trained evaluation function.

11. The computing device according to claim 8, wherein trained evaluation function is based on a recursive machine learning model trained using a data set including known space delimited passwords and a data set including space delimited non-password characters.

12. The computing device according to claim 8, wherein the trained evaluation function is based on a recursive machine learning model trained using vectorized characters generated from the character lookup table.

13. A computing device, comprising: a processor; a computer-readable storage medium in communication with the processor, the computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by the processor, cause the processor to: identify communication data to analyze for one or more passwords; evaluate the communication data to identify at least one data token included in the communication data, the at least one data token including a plurality of characters; convert each of the plurality of characters included in the at least one data token to a vector comprising vector values obtained from a character lookup table; generate a vector list including each vector representing each of the plurality of characters included in the at least one data token; and evaluate the vector list using a trained evaluation function to determine if the at least one data token comprises a password, when the at least one data token comprises a password, inform a user that the communication data includes the password and prevent dissemination of the communication data, when the at least one data token is devoid of a password, allow dissemination of the communication data.

14. The computing device according to claim 13, wherein the communication data is associated with an email, text message, or other electronic message.

15. The computing device according to claim 13, wherein trained evaluation function is based on a recursive machine learning model trained using a data set including known space delimited passwords and a data set including space delimited non-password characters.

16. The computing device according to claim 13, wherein the trained evaluation function is based on a recursive machine learning model trained using vectorized characters generated from the character lookup table.

17. The computing device according to claim 16, wherein the character lookup table comprises a plurality of characters, each character of the plurality of characters including a vector with associated values that represent one or more characteristics associated with the character.

18. The computing device according to claim 13, wherein the character lookup table comprises th
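The flow of claim 1 combined with the context check of claim 7 (tokenize on whitespace, vectorize each token through a character lookup table, evaluate, check the surrounding characters, then quarantine or allow dissemination) can be sketched end to end as below. This is a hedged illustration, not the claimed system. The claims call for a trained, recursive machine learning model whose details are not given on this page, so `toy_evaluation_function` is a hypothetical rule-based stand-in (8 or more characters mixing at least three character classes), not a model. The four-feature `LOOKUP` table, the hint words in `CONTEXT_HINTS`, the 30-character window, and the `Mailbox` quarantine store are likewise illustrative assumptions.

```python
import re
import string
from dataclasses import dataclass, field

# HYPOTHETICAL lookup table: 4 features per printable ASCII character
# (digit, uppercase, lowercase, punctuation).
LOOKUP = {
    chr(i): [float(chr(i).isdigit()), float(chr(i).isupper()),
             float(chr(i).islower()), float(chr(i) in string.punctuation)]
    for i in range(32, 127)
}
UNKNOWN = [0.0, 0.0, 0.0, 0.0]  # assumed vector for non-ASCII characters

def toy_evaluation_function(vector_list: list[list[float]]) -> bool:
    """Stand-in for the trained model (NOT a model): flag tokens of 8+
    characters that mix at least three character classes."""
    if len(vector_list) < 8:
        return False
    classes_present = sum(
        1 for i in range(4) if any(v[i] for v in vector_list)
    )
    return classes_present >= 3

CONTEXT_HINTS = ("password", "passcode", "pwd")  # hypothetical hint words

def context_supports(text: str, start: int, end: int, window: int = 30) -> bool:
    """Claim 7-style check on a fixed number of characters around the token."""
    before = text[max(0, start - window):start].lower()
    after = text[end:end + window].lower()
    return any(h in before or h in after for h in CONTEXT_HINTS)

@dataclass
class Mailbox:
    quarantine: list[str] = field(default_factory=list)  # held messages
    delivered: list[str] = field(default_factory=list)   # disseminated

def process_message(text: str, box: Mailbox, window: int = 30) -> bool:
    """Quarantine the message if any token is judged a password; return
    True when quarantined, False when delivered."""
    for match in re.finditer(r"\S+", text):  # whitespace-delimited tokens
        vectors = [LOOKUP.get(c, UNKNOWN) for c in match.group()]
        if toy_evaluation_function(vectors) and context_supports(
            text, match.start(), match.end(), window
        ):
            box.quarantine.append(text)
            return True
    box.delivered.append(text)
    return False
```

In this sketch a message containing `Tr0ub4dor&3` shortly after the word "password" is quarantined, while an order number of similar shape (`Xy7$kLm9Q` in "Order ID Xy7$kLm9Q shipped") passes the toy scorer but fails the context check and is delivered. That second case illustrates the kind of false positive the claim 7 context step appears intended to filter out.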

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • H04L51/212 (primary)

    using filtering or selective blocking · CPC title

  • Machine learning · CPC title

  • Vectors, bitmaps or matrices · CPC title

  • during internet communication, e.g. revealing personal data from cookies · CPC title

  • Event detection, e.g. attack signature detection · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020336501A1 cover?
It covers vectorizing the text of an email or other electronic message character by character with a character lookup table, running a trained evaluation function over the vectorized text to detect passwords or other sensitive data, and quarantining a message found to contain such data so that it is at least temporarily prevented from reaching the recipient.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification H04L51/212. Mapped technology areas include Electricity.
When was this patent published?
The A1 publication is dated October 22, 2020. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).