Data structures for categorizing and filtering content
US-2017193108-A1 · Jul 6, 2017 · US
US10762192B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10762192-B2 |
| Application number | US-201816108750-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 22, 2018 |
| Priority date | Aug 22, 2018 |
| Publication date | Sep 1, 2020 |
| Grant date | Sep 1, 2020 |
Official abstract text for this publication.
Cleartext passwords represent a security risk. An unencrypted password can be exploited to gain access to a system and/or perform unauthorized functions. This disclosure describes how to detect cleartext passwords in a generalized manner using predictive text classifiers (e.g., Word2Vec). Using a corpus of text, an artificial intelligence model can be built by training a predictive text classifier to identify password anomalies (e.g., areas of text that occur with low statistical probability). Source program code, configuration files, log files, and other types of text can be automatically scanned for cleartext passwords without having to rely on password lists or other limited and/or labor-intensive mechanisms, thus improving system security and reducing the chances of data exfiltration and unauthorized actions.
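The idea of flagging text that occurs with low statistical probability can be illustrated with a minimal sketch. It is not the patented implementation: it swaps in a toy character-bigram count model for the Word2Vec-style classifier named in the abstract, and the names `CharBigramModel`, `mean_log_prob`, `is_anomalous`, the sample corpus, and the threshold value are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class CharBigramModel:
    """Toy stand-in for a predictive text classifier (assumption, not Word2Vec).

    Estimates P(next char | previous char) from a corpus of sample text.
    """

    def __init__(self, corpus: str, alpha: float = 1.0):
        self.alpha = alpha  # additive smoothing strength
        self.counts = defaultdict(Counter)
        for prev, nxt in zip(corpus, corpus[1:]):
            self.counts[prev][nxt] += 1
        # +1 reserves probability mass for characters never seen in the corpus
        self.vocab_size = len(set(corpus)) + 1

    def prob(self, prev: str, nxt: str) -> float:
        """Smoothed probability that `nxt` follows `prev`."""
        seen = self.counts.get(prev, Counter())
        total = sum(seen.values())
        return (seen[nxt] + self.alpha) / (total + self.alpha * self.vocab_size)

    def mean_log_prob(self, token: str) -> float:
        """Average log-probability of each character given the one before it."""
        pairs = list(zip(token, token[1:]))
        if not pairs:
            return 0.0
        return sum(math.log(self.prob(p, n)) for p, n in pairs) / len(pairs)


def is_anomalous(model: CharBigramModel, token: str, threshold: float) -> bool:
    """Flag a token whose average predictive log-probability is below threshold."""
    return model.mean_log_prob(token) < threshold


# Hypothetical corpus of "normal" text of one type (here, code-like lines).
corpus = "password = get_secret()\nuser = get_user()\n" * 50
model = CharBigramModel(corpus)

for candidate in ("get_user", "xQ7#kP2z"):
    print(candidate, round(model.mean_log_prob(candidate), 2),
          is_anomalous(model, candidate, threshold=-2.0))
```

In this sketch the threshold of -2.0 is arbitrary. A real deployment would presumably tune it against text of the same type as the training corpus, since a cutoff that is too strict misses passwords and one that is too loose flags ordinary identifiers.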
Opening claim text (preview).
What is claimed is:
1. A method, comprising: a computer system accessing uncategorized textual data; the computer system feeding the uncategorized textual data to a predictive text classifier, the predictive text classifier built using a corpus of sample textual data; determining, at the computer system by the predictive text classifier, whether the uncategorized textual data has one or more password anomalies based on the predictive text classifier using an initial group of one or more characters of the uncategorized textual data to predict a subsequent group of one or more characters that follow the initial group; and outputting, by the computer system, result data indicating whether the uncategorized textual data was determined to have one or more password anomalies.
2. The method of claim 1, wherein determining whether the uncategorized textual data has one or more password anomalies comprises: determining a predictive probability, based on the predictive text classifier, of the subsequent group of characters following the initial group of characters; and determining that a password anomaly exists if the predictive probability is below a particular threshold probability.
3. The method of claim 1, wherein the subsequent group of characters immediately follow the initial group of characters.
4. The method of claim 1, wherein the corpus of sample textual data comprises a plurality of files including text data that are of a same type, and wherein the uncategorized textual data is of the same type.
5. The method of claim 4, wherein the corpus of sample textual data includes programming source code in a particular language, and wherein the uncategorized textual data also includes programming source code in the particular language.
6. The method of claim 1, wherein the operations further comprise: identifying, within the uncategorized textual data, one or more locations within the uncategorized textual data at which one or more password anomalies are present; and providing location information indicating the one or more locations.
7. The method of claim 1, further comprising: automatically determining, by the computer system, whether the uncategorized textual data contains textual data of a same type as textual data in the corpus.
8. The method of claim 1, wherein the predictive text classifier is a Word2Vec classifier.
9. The method of claim 1, wherein the subsequent group of characters comprises at least four characters.
10. The method of claim 1, wherein accessing the uncategorized textual data comprises accessing a file and extracting the uncategorized textual data from one or more first portions of the file, while not extracting data from one or more other portions of the file.
11. A non-transitory computer-readable medium having stored thereon instructions that are executable by a computer system to cause the computer system to perform operations comprising: accessing uncategorized textual data; feeding the uncategorized textual data to a predictive text classifier, the predictive text classifier built using a corpus of sample textual data; determining whether the uncategorized textual data has one or more password anomalies based on the predictive text classifier using an initial group of one or more characters of the uncategorized textual data to predict a subsequent group of one or more characters that follow the initial group; and outputting result data indicating whether the uncategorized textual data was determined to have one or more password anomalies.
12. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise: tokenizing the uncategorized textual data, based on one or more provided parameters, to produce a plurality of tokenized character sequences.
13. The non-transitory computer-readable medium of claim 12, wherein the predictive text classifier analyzes each of the plurality of tokenized character sequences to determine whether that tokenized character sequence includes a password anomaly.
14. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise: identifying, within the uncategorized textual data, one or more locations within the uncategorized textual data at which one or more password anomalies are present; and providing location information indicating the one or more locations.
15. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise: determining whether the uncategorized textual data has one or more password anomalies based on the predictive text classifier using a further subsequent group of one or more characters of the uncategorized textual data to predict the subsequent group of one or more characters that follow the initial group.
16. The non-transitory computer-readable medium of claim 15, wherein the further subsequent group of characters do not immediately follow the subsequent group of characters.
17. A computer system, comprising: a processor; a display; and a non-transitory computer-readable medium having stored thereon instructions that are executable by the processor to cause the computer system to perform operations comprising: accessing uncategorized textual data; feeding the uncategorized textual data to a predictive text classifier, the predictive text classifier built using a corpus of sample textual data; determining whether the uncategorized textual data has one or more password anomalies based on the predictive text classifier using an initial group of one or more characters of the uncategorized textual data to predict a subsequent group of one or more characters that follow the initial group; and outputting result data indicating whether the uncategorized textual data was determined to have one or more password anomalies.
18. The computer system of claim 17, wherein the operations further comprise: determining a predictive probability, based on the predictive text classifier, of the subsequent group of characters following the initial group of characters; and determining that a password anomaly exists if the predictive probability is below a particular threshold probability.
19. The computer system of claim 17, wherein the corpus of sample textual data comprises a plurality of files including text data that are of a same type, and wherein the uncategorized textual data is of the same type.
20. The computer system of claim 17, wherein the operations further comprise: identifying, within the uncategorized textual data, one or more locations within the uncategorized textual data at which one or more password anomalies are present; and providing location information indicating the one or more locations.
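The tokenizing and location-reporting steps recited in the dependent claims (tokenizing based on provided parameters, analyzing each tokenized sequence, and providing location information) might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `Finding`, `tokenize`, `scan`, the `pattern` and `min_length` parameters, and especially the `stand_in_classifier` placeholder are hypothetical. The placeholder is a crude heuristic standing in for a trained predictive text classifier so that the example stays self-contained.

```python
import re
from typing import Callable, Iterator, List, NamedTuple, Tuple


class Finding(NamedTuple):
    line: int    # 1-based line number
    column: int  # 0-based character offset within the line
    token: str   # the tokenized character sequence that was flagged


def tokenize(text: str, pattern: str = r"\S+",
             min_length: int = 4) -> Iterator[Tuple[int, int, str]]:
    """Yield (line, column, token) for each regex match of at least min_length.

    `pattern` and `min_length` play the role of the provided tokenizer
    parameters; both are assumptions of this sketch.
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(pattern, line):
            if len(match.group()) >= min_length:
                yield line_no, match.start(), match.group()


def scan(text: str, is_anomalous: Callable[[str], bool],
         **tokenizer_params) -> List[Finding]:
    """Run the classifier on every token and record where anomalies occur."""
    return [Finding(line, col, tok)
            for line, col, tok in tokenize(text, **tokenizer_params)
            if is_anomalous(tok)]


def stand_in_classifier(token: str) -> bool:
    """Placeholder only: flags tokens mixing digits with both letter cases.

    A real system would call a trained predictive model here instead.
    """
    return (any(c.isdigit() for c in token)
            and any(c.islower() for c in token)
            and any(c.isupper() for c in token))


# Hypothetical configuration-file contents.
sample = 'db_user = "admin"\ndb_pass = "xQ7#kP2zLm"\n'
findings = scan(sample, stand_in_classifier, pattern=r'[^\s"=]+')
for f in findings:
    print(f"line {f.line}, column {f.column}: {f.token}")
```

Passing the classifier in as a callable keeps tokenization and location tracking independent of the model, so the same scanner could be reused with classifiers trained on different file types.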