US2023214591A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023214591-A1 |
| Application number | US-202117565770-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 30, 2021 |
| Priority date | Dec 30, 2021 |
| Publication date | Jul 6, 2023 |
| Grant date | — |
Official abstract text for this publication.
The present disclosure relates to one or more processors that are communicative with one or more computer-readable media and are configured to automatically generate a sensitive text detector including a regular expression or a keyword. A set of text inputs, including sensitive text, is received. The sensitive text is extracted from the set of text inputs. Based on the extracted sensitive text, one or both of the regular expression and the keyword are generated. The generated regular expression and/or keyword are then used to generate a sensitive text detector for sensitive text detection.
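The flow summarized in the abstract (receive labeled inputs, extract the sensitive text, derive a regular expression and a keyword, and assemble a detector) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the publication: the `STOP_WORDS` list, the character-class generalization in `generalize`, the frequency-based `pick_keyword`, and the `Detector` class that requires the keyword as context before returning matches.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "is", "a", "an", "my", "your", "of", "to", "and", "for"}


def generalize(sample):
    """Turn one sensitive string into a crude regex: digits -> \\d, letters -> [A-Za-z]."""
    parts = []
    for ch in sample:
        if ch.isdigit():
            parts.append(r"\d")
        elif ch.isalpha():
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))  # keep punctuation as a literal
    return "".join(parts)


def pick_keyword(inputs):
    """Return the most frequent non-stop word found around the sensitive text."""
    counts = Counter()
    for text, secret in inputs:
        context = text.replace(secret, " ")  # drop the sensitive text itself
        for word in re.findall(r"[A-Za-z]+", context.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(1)[0][0]


@dataclass
class Detector:
    pattern: re.Pattern
    keyword: str

    def find(self, text):
        # Assumed policy: the keyword must appear as context before regex matches count.
        if self.keyword not in text.lower():
            return []
        return self.pattern.findall(text)


def build_detector(inputs):
    """inputs: list of (text, sensitive_text) pairs; the first pair seeds the regex."""
    pattern = re.compile(generalize(inputs[0][1]))
    return Detector(pattern, pick_keyword(inputs))


inputs = [
    ("My SSN is 123-45-6789.", "123-45-6789"),
    ("Please confirm SSN 987-65-4321 today", "987-65-4321"),
]
detector = build_detector(inputs)
print(detector.keyword)
print(detector.find("Customer SSN: 555-12-3456"))
```

Combining the two signals is one possible design choice: the regular expression captures the shape of the sensitive value, while the keyword supplies context that can reduce false positives on similarly shaped strings.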
Opening claim text (preview).
1. A computer-implemented method of generating a sensitive text detector, comprising: receiving a set of text inputs comprising sensitive text; extracting the sensitive text from the set of text inputs; and generating, based on the extracted sensitive text, one or more of: a regular expression; and a keyword; and generating the sensitive text detector based on the generated one or more of the regular expression and the keyword.
2. The method of claim 1, wherein generating the regular expression comprises: converting the extracted sensitive text into one or more regular expressions; generating a population comprising the one or more regular expressions; evolving the population by: transforming at least one of the one or more regular expressions; adding the at least one transformed regular expression to the population; and determining a fitness score for each regular expression in the population; iterating the evolution of the population until a predetermined condition is met; and after iterating the evolution of the population, generating, based on each fitness score, the regular expression.
3. The method of claim 2, wherein determining the fitness score comprises: for each of multiple training samples in a training set of training samples, each training sample either comprising sensitive text or not comprising sensitive text: identifying text within the training sample based on the regular expression; and determining the fitness score based on the identified text.
4. The method of claim 3, wherein determining the fitness score based on the extracted text comprises: determining one or more of: for each training sample comprising sensitive text, a degree of similarity between the identified text and the sensitive text; and for each training sample not comprising sensitive text, an amount of the identified text.
5. The method of claim 3, wherein the fitness score satisfies the following formula: $f(r) = f_s(r) + f_{\text{char}}(r) + L_{\text{score}}(r)$, wherein $f_s(r)$ is based on a degree of similarity between all of the identified text and the sensitive text, $f_{\text{char}}(r)$ is based on a degree of similarity between a portion of the identified text and the sensitive text, and $L_{\text{score}}(r)$ is based on a length of the identified text relative to a length of the sensitive text.
6. The method of claim 2, wherein generating the regular expression comprises: determining that, among the fitness scores, at least one of the fitness scores is in a steady state; and in response to determining that the at least one of the fitness scores is in the steady state, generating the regular expression associated with the at least one of the fitness scores in the steady state.
7. The method of claim 2, wherein transforming at least one of the one or more regular expressions comprises one or more of: randomly modifying a portion of the at least one of the one or more regular expressions; and exchanging a portion of a first one of the one or more regular expressions with a portion of a second one of the one or more regular expressions.
8. The method of claim 1, further comprising: inputting text to the sensitive text detector, wherein the inputted text comprises sensitive text associated with one or more regular expressions; and using the sensitive text detector to extract the sensitive text from the inputted text, based on the one or more regular expressions corresponding to the generated regular expression.
9. The method of claim 1, wherein: extracting the sensitive text from the set of text inputs comprises: extracting a range of text based on a location of the sensitive text within the set of text inputs; and the generating comprises: identifying one or more candidate keywords within the range of text; and generating, based on the one or more candidate keywords, the keyword.
10. The method of claim 9, wherein extracting the range of text comprises: extracting a combination of the sensitive text and text that is one or more of: a preset distance before the sensitive text; and a preset distance after the sensitive text.
11. The method of claim 9, wherein identifying the one or more candidate keywords comprises: filtering the range of text by removing one or more words from the range of text; and identifying the one or more candidate keywords within the filtered range of text.
12. The method of claim 11, wherein filtering the range of text comprises: comparing each word in the range of text to each of multiple words in a list of stop words; and based on the comparison, removing from the range of text any word contained in the list of stop words.
13. The method of claim 9, wherein generating, based on the one or more candidate keywords, the keyword comprises: calculating one or more of: a co-occurrence in the range of text of each candidate keyword with at least one other candidate keyword; and a number of instances of each candidate keyword in the range of text.
14. The method of claim 1, further comprising: inputting text to the sensitive text detector, wherein the inputted text comprises sensitive text; and using the sensitive text detector to extract the sensitive text from the inputted text, based on one or more words in the sensitive text corresponding to the generated keyword.
15. The method of claim 1, further comprising: applying a check function to the generated regular expression or the generated keyword.
16. A non-transitory computer-readable medium having stored thereon computer program code configured, when executed by one or more processors, to cause the one or more processors to perform a method comprising: receiving a set of text inputs comprising sensitive text; extracting the sensitive text from the set of text inputs; and generating, based on the extracted sensitive text, one or more of: a regular expression; and a keyword; and generating a sensitive text detector based on the generated one or more of the regular expression and the keyword.
17. A computing device for generating a sensitive text detector, comprising: one or more processors configured to: receive a set of text inputs comprising sensitive text; extract the sensitive text from the set of text inputs; and generate, based on the extracted sensitive text, one or more of: a regular expression; and a keyword; and generate the sensitive text detector based on the generated one or more of the regular expression and the keyword.
18. The computing device of claim 17, wherein: the one or more processors are further configured to: receive text input, wherein the text input comprises sensitive text associated with one or more regular expressions; and use the sensitive text detector to extract the sensitive text from the text input, based on the one or more regular expressions corresponding to the generated regular expression.
19. The computing device of claim 17, wherein: the one or more processors are further configured to: receive text input, wherein the inputted text comprises sensitive text; and use the sensitive text detector to extract the sensitive text from the text input, based on one or more words in the sensitive text corresponding to the generated keyword.
20. The computing device of claim 17, wherein: the one or more processors are further configured to: convert the extracted sensitive text into one or more regular expressions; generate a population comprising the one or more regular expressi
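The evolutionary loop recited in claims 2 through 7 can be illustrated with a small Python sketch. It is one possible reading, not the publication's implementation. The token alphabet `TOKENS`, the conversion in `to_tokens`, the concrete definitions of the three fitness terms, the negative-sample penalty, and the `patience` steady-state rule are all assumptions chosen for illustration; the claims only state what each term is "based on".

```python
import random
import re
from difflib import SequenceMatcher

# Hypothetical mutation alphabet: each entry is a self-contained regex token.
TOKENS = [r"\d", "[a-z]", "[A-Z]", r"\w", "-", "@", r"\."]


def to_tokens(sample):
    """Convert sensitive text into a list of regex tokens (a seed individual)."""
    out = []
    for ch in sample:
        if ch.isdigit():
            out.append(r"\d")
        elif ch.islower():
            out.append("[a-z]")
        elif ch.isupper():
            out.append("[A-Z]")
        else:
            out.append(re.escape(ch))
    return out


def identified(tokens, text):
    """Text identified in a sample by the candidate regex ('' if no match)."""
    match = re.search("".join(tokens), text)
    return match.group(0) if match else ""


def fitness(tokens, samples):
    """Assumed scoring: f_s + f_char + L_score on positives, a penalty on negatives."""
    score = 0.0
    for text, secret in samples:
        found = identified(tokens, text)
        if secret is None:  # sample without sensitive text: penalize identified text
            score -= len(found) / max(len(text), 1)
        else:
            f_s = 1.0 if found == secret else 0.0                  # whole-text similarity
            f_char = SequenceMatcher(None, found, secret).ratio()  # partial similarity
            l_score = min(len(found), len(secret)) / max(len(found), len(secret), 1)
            score += f_s + f_char + l_score
    return score


def mutate(tokens, rng):
    """Randomly modify one portion of a regular expression."""
    child = list(tokens)
    child[rng.randrange(len(child))] = rng.choice(TOKENS)
    return child


def crossover(a, b, rng):
    """Exchange a portion of one regular expression with a portion of another."""
    limit = min(len(a), len(b))
    if limit < 2:
        return list(a)
    cut = rng.randrange(1, limit)
    return a[:cut] + b[cut:]


def evolve(secrets, samples, generations=30, size=20, patience=5, seed=0):
    rng = random.Random(seed)
    population = [to_tokens(s) for s in secrets]
    best, stale = float("-inf"), 0
    for _ in range(generations):
        parent, mate = rng.choice(population), rng.choice(population)
        population.append(mutate(parent, rng))
        population.append(crossover(parent, mate, rng))
        population.sort(key=lambda t: fitness(t, samples), reverse=True)
        population = population[:size]
        top = fitness(population[0], samples)
        stale = stale + 1 if top <= best else 0
        best = max(best, top)
        if stale >= patience:  # best score no longer improves: treat as steady state
            break
    return "".join(population[0])


samples = [
    ("My SSN is 123-45-6789.", "123-45-6789"),
    ("Please confirm SSN 987-65-4321 today", "987-65-4321"),
    ("Call me at noon tomorrow", None),
]
pattern = evolve(["123-45-6789", "987-65-4321"], samples)
print(pattern)
```

Representing each candidate as a list of tokens rather than a raw string keeps every mutation and crossover syntactically valid, so no individual needs to be discarded for failing to compile.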
Recognition of textual entities · CPC title
Physics · mapped topic
Machine learning · CPC title
Physics · mapped topic
Matching criteria, e.g. proximity measures · CPC title