Identifying homogenous clusters
US-2020004870-A1 · Jan 2, 2020 · US
US11520831B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11520831-B2 |
| Application number | US-202016896895-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 9, 2020 |
| Priority date | Jun 9, 2020 |
| Publication date | Dec 6, 2022 |
| Grant date | Dec 6, 2022 |
Official abstract text for this publication.
A regular expression that is able to be used to identify an item as belonging to a specific group among a plurality of different groups is determined. The regular expression is tested against a sampling of items known to belong to the specific group to determine a true positive metric. The regular expression is tested against a sampling of items known to belong to other groups among the plurality of different groups outside the specific group to determine a false positive metric. An accuracy metric of the determined regular expression is calculated based at least in part on the true positive metric and the false positive metric. The accuracy metric is provided for use in evaluating the regular expression.
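The two testing steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the helper name `count_matches`, the sample command lines, and the candidate pattern are all hypothetical, and treating a "match" as `re.search` finding the pattern anywhere in the item's text is an assumption.

```python
import re
from typing import Iterable


def count_matches(pattern: str, items: Iterable[str]) -> int:
    """Count the items whose text the pattern matches anywhere (re.search)."""
    compiled = re.compile(pattern)
    return sum(1 for item in items if compiled.search(item))


# Hypothetical process command lines, invented for illustration only.
# Sample of items known to belong to the specific group.
in_group_sample = [
    "/usr/bin/java -jar /opt/tomcat/bin/bootstrap.jar start",
    "/usr/bin/java -jar /opt/tomcat/bin/bootstrap.jar stop",
    "/opt/jdk/bin/java -Dcatalina.home=/opt/tomcat org.apache.catalina.startup.Bootstrap",
]
# Sample of items known to belong to other groups.
other_groups_sample = [
    "/usr/sbin/nginx -g daemon off;",
    "/usr/bin/java -jar /opt/kafka/libs/kafka.jar",
    "/usr/bin/java -jar /opt/tomcat-exporter/exporter.jar",
]

# Hypothetical candidate regular expression for the specific group.
candidate = r"java\b.*tomcat"

# Matches inside the group count toward the true positive metric;
# matches outside the group count toward the false positive metric.
true_positives = count_matches(candidate, in_group_sample)
false_positives = count_matches(candidate, other_groups_sample)
```

With these invented samples, the pattern matches every in-group item and also matches one item from another group, which is the kind of over-matching the false positive metric is meant to expose.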
Opening claim text (preview).
What is claimed is:

1. A method, comprising: determining a regular expression that is used to identify an item as belonging to a specific group among a plurality of different groups; testing the regular expression against a sampling of items known to belong to the specific group to determine a true positive metric; testing the regular expression against a sampling of items known to belong to other groups among the plurality of different groups outside the specific group to determine a false positive metric; calculating an accuracy metric of the determined regular expression based at least in part on the true positive metric and the false positive metric, wherein calculating the accuracy metric of the determined regular expression includes calculating a quotient comprising a numerator portion that is based at least in part on the true positive metric and a denominator portion that is based at least in part on the false positive metric, wherein the denominator portion equals a sum of the false positive metric and a count of items in the specific group; and providing the accuracy metric for use in evaluating the regular expression.

2. The method of claim 1, wherein determining the regular expression includes automatically generating the regular expression based on text data associated with the specific group.

3. The method of claim 1, wherein the specific group comprises a plurality of software processes.

4. The method of claim 3, wherein the regular expression is used to determine a database field corresponding to the specific group with which to populate a configuration management database.

5. The method of claim 1, wherein testing the regular expression against the sampling of items known to belong to the other groups among the plurality of different groups outside the specific group includes applying the regular expression to text data associated with the sampling of items.

6. The method of claim 5, wherein the text data includes commands for starting software processes.

7. The method of claim 5, wherein the text data includes parameters that specify configuration information for software processes.

8. The method of claim 1, wherein the true positive metric corresponds to a number of items that the regular expression positively matches in the specific group.

9. The method of claim 1, wherein the false positive metric corresponds to a number of items that the regular expression positively matches in the other groups among the plurality of different groups outside the specific group.

10. The method of claim 1, wherein the numerator portion equals the true positive metric.

11. The method of claim 1, wherein providing the accuracy metric for use in evaluating the regular expression includes transmitting the accuracy metric to a user via a network.

12. The method of claim 1, wherein providing the accuracy metric for use in evaluating the regular expression includes providing a user with an option to manually adjust the regular expression.

13. The method of claim 1, further comprising recalculating the accuracy metric in response to a determination that the accuracy metric falls below a specified threshold.

14. The method of claim 1, further comprising providing a suggestion to a user to manually adjust the regular expression in response to a determination that the accuracy metric falls below a specified threshold.

15. The method of claim 1, wherein items belonging to the specific group and the other groups among the plurality of different groups outside the specific group have been grouped using data clustering.

16. The method of claim 15, wherein the data clustering is associated with density-based spatial clustering of applications with noise.

17. A system, comprising: one or more processors configured to: determine a regular expression that is used to identify an item as belonging to a specific group among a plurality of different groups; test the regular expression against a sampling of items known to belong to the specific group to determine a true positive metric; test the regular expression against a sampling of items known to belong to other groups among the plurality of different groups outside the specific group to determine a false positive metric; calculate an accuracy metric of the determined regular expression based at least in part on the true positive metric and the false positive metric, wherein the one or more processors are configured to calculate the accuracy metric of the determined regular expression including by being configured to calculate a quotient comprising a numerator portion that is based at least in part on the true positive metric and a denominator portion that is based at least in part on the false positive metric, wherein the denominator portion equals a sum of the false positive metric and a count of items in the specific group; and provide the accuracy metric for use in evaluating the regular expression; and a memory coupled with the one or more processors and configured to provide the one or more processors with instructions.

18. A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for: determining a regular expression that is used to identify an item as belonging to a specific group among a plurality of different groups; testing the regular expression against a sampling of items known to belong to the specific group to determine a true positive metric; testing the regular expression against a sampling of items known to belong to other groups among the plurality of different groups outside the specific group to determine a false positive metric; calculating an accuracy metric of the determined regular expression based at least in part on the true positive metric and the false positive metric, wherein calculating the accuracy metric of the determined regular expression includes calculating a quotient comprising a numerator portion that is based at least in part on the true positive metric and a denominator portion that is based at least in part on the false positive metric, wherein the denominator portion equals a sum of the false positive metric and a count of items in the specific group; and providing the accuracy metric for use in evaluating the regular expression.

19. The computer program product of claim 18, wherein determining the regular expression includes automatically generating the regular expression based on text data associated with the specific group.

20. The computer program product of claim 18, wherein the specific group comprises a plurality of software processes.
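The quotient recited in the independent claims, read together with claim 10 (numerator equals the true positive metric), can be sketched as follows. This is one illustrative reading under stated assumptions: the function names `accuracy_metric` and `needs_review`, the `group_size` parameter, and the 0.8 threshold are hypothetical and do not come from the patent, and the claims do not say how "a count of items in the specific group" is obtained, so the caller supplies it here.

```python
def accuracy_metric(true_positives: int, false_positives: int, group_size: int) -> float:
    """Quotient of true positives over (false positives + items in the group).

    Numerator: the true positive metric (claim 10).
    Denominator: false positive metric plus the count of items in the
    specific group (claims 1, 17, 18).
    """
    denominator = false_positives + group_size
    if denominator == 0:
        raise ValueError("group_size and false_positives cannot both be zero")
    return true_positives / denominator


def needs_review(score: float, threshold: float = 0.8) -> bool:
    """Flag a score below the threshold. The 0.8 default is a hypothetical value."""
    return score < threshold


# Invented counts for illustration: 45 of 50 in-group items matched,
# and 5 items from other groups matched as well.
score = accuracy_metric(true_positives=45, false_positives=5, group_size=50)  # 45 / 55
flag = needs_review(score)
```

Under this reading the score reaches 1.0 only when the pattern matches every item in the group and nothing outside it; missed in-group items lower the numerator, and matches in other groups raise the denominator. A low score could then drive the recalculation or manual-adjustment suggestion described in claims 13 and 14.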
by using string matching techniques · CPC title
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
Clustering techniques · CPC title
Clustering; Classification · CPC title
Physics · mapped topic