Determining subsets of accounts using a model of transactions
US-2020394658-A1 · Dec 17, 2020 · US
US11625438B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11625438-B2 |
| Application number | US-202016826562-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 23, 2020 |
| Priority date | Mar 23, 2020 |
| Publication date | Apr 11, 2023 |
| Grant date | Apr 11, 2023 |
Official abstract text for this publication.
An apparatus includes a processing device configured to obtain first and second sets of data records, each data record comprising a string associated with an attribute. The processing device is also configured to generate a similarity matrix, wherein entries of the similarity matrix comprise values characterizing similarity between respective pairs of the strings comprising a first string from a data record in the first set and a second string from a data record in the second set. The processing device is further configured to construct a graph network based on the similarity matrix comprising edges connecting pairs of the data records based on values of entries in the similarity matrix, perform a clustering operation on the graph network to identify clusters, and to initiate remedial action responsive to identifying a given cluster comprising at least one data record from each of the first and second sets of data records.
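The pipeline in the abstract (similarity matrix, thresholded graph, clustering) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names, the normalized Levenshtein score, the 0.8 cutoff, and the sample addresses are all assumptions, and plain connected components stand in for the community detection step (claim 10 names Louvain).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similarity_matrix(first, second):
    """Rows index strings from the first set, columns the second set."""
    return [[similarity(a, b) for b in second] for a in first]


def threshold(matrix, cutoff):
    """Biadjacency matrix: 1 where similarity >= cutoff, else 0."""
    return [[1 if v >= cutoff else 0 for v in row] for row in matrix]


def clusters(biadjacency):
    """Connected components of the bipartite graph (union-find).

    Nodes are ("A", i) for the first set and ("B", j) for the second.
    This is a simple stand-in for a community detection algorithm.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, row in enumerate(biadjacency):
        find(("A", i))
        for j, flag in enumerate(row):
            find(("B", j))
            if flag:
                parent[find(("A", i))] = find(("B", j))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def cross_set_clusters(cluster_list):
    """Keep only clusters containing records from both sets."""
    return [c for c in cluster_list if {src for src, _ in c} == {"A", "B"}]


# Hypothetical sample data: mailing-address strings from two sources.
first_set = ["12 Main Street", "400 Oak Avenue"]
second_set = ["12 Main Stret", "77 Pine Road"]

biadjacency = threshold(similarity_matrix(first_set, second_set), cutoff=0.8)
flagged = cross_set_clusters(clusters(biadjacency))
```

A cluster that survives `cross_set_clusters` corresponds to the abstract's "given cluster comprising at least one data record from each of the first and second sets", which is the condition for initiating remedial action.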
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; the at least one processing device being configured to perform steps of: obtaining two or more sets of data records, each of at least a subset of the data records in each of the two or more sets of data records comprising at least a first string associated with a first attribute and a second string associated with a second attribute; generating at least one similarity matrix, wherein entries of the at least one similarity matrix comprise values characterizing similarity between respective pairs of string values associated with at least one of the first attribute and the second attribute, each pair of strings comprising a first string value from one of the data records in a first one of the two or more sets of data records and a second string value from one of the data records in a second one of the two or more sets of data records; constructing at least one graph network based at least in part on the at least one similarity matrix, the at least one graph network comprising a first graph network for the first attribute and a second graph network for the second attribute, each of the first graph network and the second graph network comprising edges connecting pairs of the data records in the two or more sets of data records based at least in part on values of entries in the at least one similarity matrix, at least one of the edges connecting a first data record in the first set of data records with a second data record in the second set of data records; performing at least one clustering operation on the at least one graph network to identify a first set of one or more clusters of the data records in the first graph network for the first attribute and a second set of one or more clusters of the data records in the second graph network for the second attribute; and initiating at least one remedial action responsive to identifying at least one data record that is in a first cluster with a first subset of the data records in the two or more sets of data records for the first attribute and is in a second cluster with a second subset of the data records in the two or more sets of data records for the second attribute, the second subset of the data records being different than the first subset of the data records.
2. The apparatus of claim 1 wherein the first set of data records is independent of the second set of data records.
3. The apparatus of claim 1 wherein the first set of data records is obtained from a first data source in an information processing system and the second set of data records is obtained from a second data source in the information processing system.
4. The apparatus of claim 1 wherein generating the at least one similarity matrix comprises performing string similarity calculations for the pairs of the strings.
5. The apparatus of claim 4 wherein the string similarity calculations comprise one or more edit distance calculations.
6. The apparatus of claim 5 wherein the one or more edit distance calculations comprises at least one of a Levenshtein edit distance calculation and a Jaro-Winkler edit distance calculation.
7. The apparatus of claim 1 wherein the at least one processing device is further configured to perform the step of applying a thresholding filter to values in the entries of the at least one similarity matrix to create at least one biadjacency matrix, and wherein constructing the at least one graph network is based at least in part on the at least one biadjacency matrix.
8. The apparatus of claim 7 wherein applying the thresholding filter comprises setting entries of the at least one similarity matrix with values below a designated threshold to a first value and setting entries of the at least one similarity matrix with values at or above the designated threshold to a second value.
9. The apparatus of claim 8 wherein constructing the at least one graph network comprises connecting pairs of the data records in the two or more sets of data records having entries in the at least one biadjacency matrix with the second value, and refraining from connecting pairs of the data records in the two or more sets of data records having entries in the at least one biadjacency matrix with the first value.
10. The apparatus of claim 1 wherein performing the at least one clustering operation comprises performing community detection on the at least one graph network using a community detection algorithm, the community detection algorithm comprising a Louvain community detection algorithm.
11. The apparatus of claim 1 wherein the two or more sets of data records are associated with a plurality of assets of an information technology infrastructure, the plurality of assets comprising at least one of physical and virtual computing resources in the information technology infrastructure, and wherein initiating the at least one remedial action comprises at least one of: applying one or more security hardening procedures to one or more of the plurality of assets associated with the data records in the given cluster; and modifying a configuration of one or more of the plurality of assets associated with the data records in the given cluster.
12. The apparatus of claim 1 wherein the two or more sets of data records are associated with a plurality of users of an information technology infrastructure, and wherein initiating the at least one remedial action in the enterprise system comprises at least one of: blocking access, by one or more of the plurality of users associated with the data records in the given cluster, to one or more of a plurality of assets of the information technology infrastructure, the plurality of assets comprising at least one of physical and virtual computing resources; and monitoring subsequent access, by one or more of the plurality of users associated with the data records in the given cluster, to one or more of the plurality of assets.
13. The apparatus of claim 1 wherein: generating the at least one similarity matrix comprises generating a first similarity matrix for the first strings associated with the first attribute, generating a second similarity matrix for the second strings associated with the second attribute, applying a first thresholding filter to values in entries of the first similarity matrix to generate a first biadjacency matrix, and applying a second thresholding filter to values in entries of the second similarity matrix to generate a second biadjacency matrix; and constructing the at least one graph network comprises constructing a first graph network based at least in part on the first biadjacency matrix and constructing a second graph network based at least in part on the second biadjacency matrix.
14. The apparatus of claim 1 wherein the first attribute comprises a mailing address and the second attribute comprises a name.
15. The apparatus of claim 1 wherein performing the at least one clustering operation comprises determining a degree of connectivity of a given one of the clusters in the first set of clusters and the second set of clusters, the degree of connectivity of the given cluster being based at least in part on similarity of string values for the at least one data record from the first set of data records and the at least one data record from the second set of data records that are part of the given cluster.
16. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed
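
The trigger condition at the end of claim 1 (a record clustered with one subset of records for the first attribute and with a different subset for the second attribute) can be illustrated as follows. This is one possible reading sketched under assumptions: the function name, the record identifiers, and the choice to ignore records that have no cluster-mates for either attribute are hypothetical and not taken from the patent text.

```python
def records_with_divergent_clusters(clusters_attr1, clusters_attr2):
    """Return ids of records whose cluster-mates differ between attributes.

    Each argument is a list of sets of record ids, one set per cluster.
    Assumption: a record alone in its cluster for either attribute is
    not flagged, since it is not grouped "with" any other records there.
    """
    def mates(cluster_list):
        out = {}
        for cluster in cluster_list:
            for rec in cluster:
                out[rec] = frozenset(cluster) - {rec}
        return out

    m1 = mates(clusters_attr1)
    m2 = mates(clusters_attr2)
    return sorted(
        rec
        for rec in m1.keys() & m2.keys()
        if m1[rec] and m2[rec] and m1[rec] != m2[rec]
    )


# Hypothetical clusters: "a1" comes from one data set, "b1"/"b2" from another.
address_clusters = [{"a1", "b1"}, {"b2"}]   # first attribute (e.g. mailing address)
name_clusters = [{"a1", "b2"}, {"b1"}]      # second attribute (e.g. name)

suspicious = records_with_divergent_clusters(address_clusters, name_clusters)
```

Here `a1` shares an address cluster with `b1` but a name cluster with `b2`, so it is the only record returned; in the claim's terms, that is the record whose identification would prompt a remedial action.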
to a system of files or objects, e.g. local or distributed file system or database · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
based on location or geographical consideration · CPC title
Personal security, identity or safety · CPC title
by using string matching techniques · CPC title