Cleartext password detection using machine learning

US10762192B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10762192-B2
Application number: US-201816108750-A
Country: US
Kind code: B2
Filing date: Aug 22, 2018
Priority date: Aug 22, 2018
Publication date: Sep 1, 2020
Grant date: Sep 1, 2020

Abstract

Official abstract text for this publication.

Cleartext passwords represent a security risk. An unencrypted password can be exploited to gain access to a system and/or perform unauthorized functions. This disclosure describes how to detect cleartext passwords in a generalized manner using predictive text classifiers (e.g. Word2Vec). Using a corpus of text, an artificial intelligence model can be built by training a predictive text classifier to identify password anomalies (e.g., areas of text that occur with low statistical probability). Source program code, configuration files, log files, and other types of text can be automatically scanned for cleartext passwords without having to rely on password lists or other limited and/or labor intensive mechanisms, thus improving system security and reducing the chances of data exfiltration and unauthorized actions.
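
The core idea in the abstract, flagging text that a trained predictive model finds statistically unlikely, can be illustrated with a very small sketch. The example below is not the patented implementation and does not use Word2Vec; it swaps in a toy character-level n-gram model with add-one smoothing as a stand-in. The class name `CharNgramModel`, the sample corpus, and the scored strings are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharNgramModel:
    """Toy stand-in for a predictive text model (assumption: not Word2Vec).

    Predicts the next character from the previous `order` characters.
    """

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)  # context -> counts of next character
        self.alphabet = set()

    def train(self, corpus):
        for text in corpus:
            padded = "\x02" * self.order + text  # pad so the first characters have a context
            self.alphabet.update(text)
            for i in range(self.order, len(padded)):
                context = padded[i - self.order:i]
                self.counts[context][padded[i]] += 1

    def prob(self, context, char):
        # Add-one smoothing: unseen continuations get a small, non-zero probability.
        seen = self.counts.get(context, Counter())
        vocab = len(self.alphabet) + 1
        return (seen[char] + 1) / (sum(seen.values()) + vocab)

    def mean_log_prob(self, text):
        # Average per-character log-probability; lower means "less expected".
        padded = "\x02" * self.order + text
        total = 0.0
        for i in range(self.order, len(padded)):
            total += math.log(self.prob(padded[i - self.order:i], padded[i]))
        return total / max(len(text), 1)


# Hypothetical corpus of ordinary configuration lines.
corpus = [
    "timeout = 30",
    "retries = 3",
    "host = localhost",
    "port = 8080",
    "log_level = info",
    "cache_enabled = true",
    "max_connections = 100",
]

model = CharNgramModel(order=2)
model.train(corpus)

ordinary = model.mean_log_prob("timeout = 100")
suspicious = model.mean_log_prob("Xq7#pL9!vZ2$")
print(f"ordinary: {ordinary:.2f}  suspicious: {suspicious:.2f}")
```

With this corpus the password-like string scores lower than the ordinary line, which is the "low statistical probability" signal the abstract describes. A real system would need a far larger corpus and a stronger model.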

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: a computer system accessing uncategorized textual data; the computer system feeding the uncategorized textual data to a predictive text classifier, the predictive text classifier built using a corpus of sample textual data; determining, at the computer system by the predictive text classifier, whether the uncategorized textual data has one or more password anomalies based on the predictive text classifier using an initial group of one or more characters of the uncategorized textual data to predict a subsequent group of one or more characters that follow the initial group; and outputting, by the computer system, result data indicating whether the uncategorized textual data was determined to have one or more password anomalies.

2. The method of claim 1, wherein determining whether the uncategorized textual data has one or more password anomalies comprises: determining a predictive probability, based on the predictive text classifier, of the subsequent group of characters following the initial group of characters; and determining that a password anomaly exists if the predictive probability is below a particular threshold probability.

3. The method of claim 1, wherein the subsequent group of characters immediately follow the initial group of characters.

4. The method of claim 1, wherein the corpus of sample textual data comprises a plurality of files including text data that are of a same type, and wherein the uncategorized textual data is of the same type.

5. The method of claim 4, wherein the corpus of sample textual data includes programming source code in a particular language, and wherein the uncategorized textual data also includes programming source code in the particular language.

6. The method of claim 1, wherein the operations further comprise: identifying, within the uncategorized textual data, one or more locations within the uncategorized textual data at which one or more password anomalies are present; and providing location information indicating the one or more locations.

7. The method of claim 1, further comprising: automatically determining, by the computer system, whether the uncategorized textual data contains textual data of a same type as textual data in the corpus.

8. The method of claim 1, wherein the predictive text classifier is a Word2Vec classifier.

9. The method of claim 1, wherein the subsequent group of characters comprises at least four characters.

10. The method of claim 1, wherein accessing the uncategorized textual data comprises accessing a file and extracting the uncategorized textual data from one or more first portions of the file, while not extracting data from one or more other portions of the file.

11. A non-transitory computer-readable medium having stored thereon instructions that are executable by a computer system to cause the computer system to perform operations comprising: accessing uncategorized textual data; feeding the uncategorized textual data to a predictive text classifier, the predictive text classifier built using a corpus of sample textual data; determining whether the uncategorized textual data has one or more password anomalies based on the predictive text classifier using an initial group of one or more characters of the uncategorized textual data to predict a subsequent group of one or more characters that follow the initial group; and outputting result data indicating whether the uncategorized textual data was determined to have one or more password anomalies.

12. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise: tokenizing the uncategorized textual data, based on one or more provided parameters, to produce a plurality of tokenized character sequences.

13. The non-transitory computer-readable medium of claim 12, wherein the predictive text classifier analyzes each of the plurality of tokenized character sequences to determine whether that tokenized character sequence includes a password anomaly.

14. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise: identifying, within the uncategorized textual data, one or more locations within the uncategorized textual data at which one or more password anomalies are present; and providing location information indicating the one or more locations.

15. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise: determining whether the uncategorized textual data has one or more password anomalies based on the predictive text classifier using a further subsequent group of one or more characters of the uncategorized textual data to predict the subsequent group of one or more characters that follow the initial group.

16. The non-transitory computer-readable medium of claim 15, wherein the further subsequent group of characters do not immediately follow the subsequent group of characters.

17. A computer system, comprising: a processor; a display; and a non-transitory computer-readable medium having stored thereon instructions that are executable by the processor to cause the computer system to perform operations comprising: accessing uncategorized textual data; feeding the uncategorized textual data to a predictive text classifier, the predictive text classifier built using a corpus of sample textual data; determining whether the uncategorized textual data has one or more password anomalies based on the predictive text classifier using an initial group of one or more characters of the uncategorized textual data to predict a subsequent group of one or more characters that follow the initial group; and outputting result data indicating whether the uncategorized textual data was determined to have one or more password anomalies.

18. The computer system of claim 17, wherein the operations further comprise: determining a predictive probability, based on the predictive text classifier, of the subsequent group of characters following the initial group of characters; and determining that a password anomaly exists if the predictive probability is below a particular threshold probability.

19. The computer system of claim 17, wherein the corpus of sample textual data comprises a plurality of files including text data that are of a same type, and wherein the uncategorized textual data is of the same type.

20. The computer system of claim 17, wherein the operations further comprise: identifying, within the uncategorized textual data, one or more locations within the uncategorized textual data at which one or more password anomalies are present; and providing location information indicating the one or more locations.
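
Several dependent claims describe a pipeline: tokenize the text according to provided parameters (claim 12), predict each following group from the group before it, treat a predictive probability below a threshold as an anomaly (claim 2), consider only groups of at least four characters (claim 9), and report where anomalies occur (claims 6, 14, 20). The sketch below strings those steps together under loud assumptions: the "classifier" is a simple previous-token frequency table rather than Word2Vec, and the function names, the `pattern`, `threshold`, and `min_len` parameters, and the sample data are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text, pattern=r"\S+"):
    """Split text into (line, column, token) triples; `pattern` is the provided parameter."""
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(pattern, line):
            tokens.append((line_no, match.start() + 1, match.group()))
    return tokens


def with_context(tokens):
    """Pair each token with the token before it on the same line (the 'initial group')."""
    prev_line, prev_tok = None, "<start>"
    for line_no, col, tok in tokens:
        if line_no != prev_line:
            prev_tok = "<start>"  # context does not carry across lines
        yield line_no, col, prev_tok, tok
        prev_line, prev_tok = line_no, tok


def train(corpus_text, pattern=r"\S+"):
    """Count which token follows which; a toy stand-in for a trained classifier."""
    following = defaultdict(Counter)
    for _, _, context, tok in with_context(tokenize(corpus_text, pattern)):
        following[context][tok] += 1
    return following


def find_anomalies(text, following, threshold=0.05, min_len=4, pattern=r"\S+"):
    """Return location records for tokens whose predictive probability is below `threshold`."""
    hits = []
    for line_no, col, context, tok in with_context(tokenize(text, pattern)):
        if len(tok) < min_len:  # only score groups of at least `min_len` characters
            continue
        seen = following.get(context)
        total = sum(seen.values()) if seen else 0
        prob = seen[tok] / total if total else 0.0
        if prob < threshold:
            hits.append({"line": line_no, "column": col, "text": tok, "probability": prob})
    return hits


# Hypothetical corpus and scan target of the same file type (compare claim 4).
corpus = """\
db_host = localhost
db_port = 5432
db_user = app
db_host = localhost
db_port = 5432
db_user = app
"""

sample = """\
db_host = localhost
db_port = 5432
db_user = tr0ub4dor&3
"""

hits = find_anomalies(sample, train(corpus))
for hit in hits:
    print(hit)
```

This toy table flags any value it has never seen after a given token, so it would also report harmless new values such as an unfamiliar hostname. A model that generalizes from a large corpus is what would make the threshold test meaningful in practice.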

Assignees

Paypal Inc

Inventors

Classifications

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • G06F21/46 (Primary)

    by designing passwords or checking the strength of passwords · CPC title

  • Assessing vulnerabilities and evaluating computer system security · CPC title

  • using vector based model · CPC title

  • Clustering; Classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10762192B2 cover?
Cleartext passwords represent a security risk. An unencrypted password can be exploited to gain access to a system and/or perform unauthorized functions. This disclosure describes how to detect cleartext passwords in a generalized manner using predictive text classifiers (e.g. Word2Vec). Using a corpus of text, an artificial intelligence model can be built by training a predictive text classifi…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Published Sep 1, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).