Monitoring information processing systems utilizing co-clustering of strings in different sets of data records

US11625438B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11625438-B2
Application number	US-202016826562-A
Country	US
Kind code	B2
Filing date	Mar 23, 2020
Priority date	Mar 23, 2020
Publication date	Apr 11, 2023
Grant date	Apr 11, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus includes a processing device configured to obtain first and second sets of data records, each data record comprising a string associated with an attribute. The processing device is also configured to generate a similarity matrix, wherein entries of the similarity matrix comprise values characterizing similarity between respective pairs of the strings comprising a first string from a data record in the first set and a second string from a data record in the second set. The processing device is further configured to construct a graph network based on the similarity matrix comprising edges connecting pairs of the data records based on values of entries in the similarity matrix, perform a clustering operation on the graph network to identify clusters, and to initiate remedial action responsive to identifying a given cluster comprising at least one data record from each of the first and second sets of data records.
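The pipeline in the abstract (similarity matrix, graph, clustering, cross-set check) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: `difflib.SequenceMatcher` stands in for the string similarity measure, connected components stand in for the clustering operation, and the function names, the 0.8 threshold, and the sample address strings are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity_matrix(first, second):
    """Rows index strings in `first`, columns index strings in `second`."""
    return [[SequenceMatcher(None, a.lower(), b.lower()).ratio() for b in second]
            for a in first]


def cross_set_clusters(first, second, threshold=0.8):
    """Return clusters that contain at least one record from each set."""
    sim = similarity_matrix(first, second)
    nodes = ([("first", i) for i in range(len(first))]
             + [("second", j) for j in range(len(second))])
    # Graph network: add an edge wherever the similarity reaches the threshold.
    adj = {node: set() for node in nodes}
    for i, row in enumerate(sim):
        for j, value in enumerate(row):
            if value >= threshold:
                adj[("first", i)].add(("second", j))
                adj[("second", j)].add(("first", i))
    # Assumption: connected components replace a real clustering algorithm.
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        clusters.append(component)
    # Keep only clusters that span both sets; these would trigger remedial action.
    return [c for c in clusters if {name for name, _ in c} == {"first", "second"}]


# Hypothetical sample data: one near-duplicate address appears in both sets.
first = ["123 Main Street", "9 Elm Road"]
second = ["123 Main Stret", "77 Ocean Avenue"]
flagged = cross_set_clusters(first, second)
```

Here `flagged` holds the single cluster linking record 0 of each set. What the remedial action actually is falls outside this sketch.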

First claim

Opening claim text (preview).

What is claimed is:
1. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; the at least one processing device being configured to perform steps of: obtaining two or more sets of data records, each of at least a subset of the data records in each of the two or more sets of data records comprising at least a first string associated with a first attribute and a second string associated with a second attribute; generating at least one similarity matrix, wherein entries of the at least one similarity matrix comprise values characterizing similarity between respective pairs of string values associated with at least one of the first attribute and the second attribute, each pair of strings comprising a first string value from one of the data records in a first one of the two or more sets of data records and a second string value from one of the data records in a second one of the two or more sets of data records; constructing at least one graph network based at least in part on the at least one similarity matrix, the at least one graph network comprising a first graph network for the first attribute and a second graph network for the second attribute, each of the first graph network and the second graph network comprising edges connecting pairs of the data records in the two or more sets of data records based at least in part on values of entries in the at least one similarity matrix, at least one of the edges connecting a first data record in the first set of data records with a second data record in the second set of data records; performing at least one clustering operation on the at least one graph network to identify a first set of one or more clusters of the data records in the first graph network for the first attribute and a second set of one or more clusters of the data records in the second graph network for the second attribute; and initiating at least one remedial action responsive to identifying at least one data record that is in a first cluster with a first subset of the data records in the two or more sets of data records for the first attribute and is in a second cluster with a second subset of the data records in the two or more sets of data records for the second attribute, the second subset of the data records being different than the first subset of the data records.
2. The apparatus of claim 1 wherein the first set of data records is independent of the second set of data records.
3. The apparatus of claim 1 wherein the first set of data records is obtained from a first data source in an information processing system and the second set of data records is obtained from a second data source in the information processing system.
4. The apparatus of claim 1 wherein generating the at least one similarity matrix comprises performing string similarity calculations for the pairs of the strings.
5. The apparatus of claim 4 wherein the string similarity calculations comprise one or more edit distance calculations.
6. The apparatus of claim 5 wherein the one or more edit distance calculations comprises at least one of a Levenshtein edit distance calculation and a Jaro-Winkler edit distance calculation.
7. The apparatus of claim 1 wherein the at least one processing device is further configured to perform the step of applying a thresholding filter to values in the entries of the at least one similarity matrix to create at least one biadjacency matrix, and wherein constructing the at least one graph network is based at least in part on the at least one biadjacency matrix.
8. The apparatus of claim 7 wherein applying the thresholding filter comprises setting entries of the at least one similarity matrix with values below a designated threshold to a first value and setting entries of the at least one similarity matrix with values at or above the designated threshold to a second value.
9. The apparatus of claim 8 wherein constructing the at least one graph network comprises connecting pairs of the data records in the two or more sets of data records having entries in the at least one biadjacency matrix with the second value, and refraining from connecting pairs of the data records in the two or more sets of data records having entries in the at least one biadjacency matrix with the first value.
10. The apparatus of claim 1 wherein performing the at least one clustering operation comprises performing community detection on the at least one graph network using a community detection algorithm, the community detection algorithm comprising a Louvain community detection algorithm.
11. The apparatus of claim 1 wherein the two or more sets of data records are associated with a plurality of assets of an information technology infrastructure, the plurality of assets comprising at least one of physical and virtual computing resources in the information technology infrastructure, and wherein initiating the at least one remedial action comprises at least one of: applying one or more security hardening procedures to one or more of the plurality of assets associated with the data records in the given cluster; and modifying a configuration of one or more of the plurality of assets associated with the data records in the given cluster.
12. The apparatus of claim 1 wherein the two or more sets of data records are associated with a plurality of users of an information technology infrastructure, and wherein initiating the at least one remedial action in the enterprise system comprises at least one of: blocking access, by one or more of the plurality of users associated with the data records in the given cluster, to one or more of a plurality of assets of the information technology infrastructure, the plurality of assets comprising at least one of physical and virtual computing resources; and monitoring subsequent access, by one or more of the plurality of users associated with the data records in the given cluster, to one or more of the plurality of assets.
13. The apparatus of claim 1 wherein: generating the at least one similarity matrix comprises generating a first similarity matrix for the first strings associated with the first attribute, generating a second similarity matrix for the second strings associated with the second attribute, applying a first thresholding filter to values in entries of the first similarity matrix to generate a first biadjacency matrix, and applying a second thresholding filter to values in entries of the second similarity matrix to generate a second biadjacency matrix; and constructing the at least one graph network comprises constructing a first graph network based at least in part on the first biadjacency matrix and constructing a second graph network based at least in part on the second biadjacency matrix.
14. The apparatus of claim 1 wherein the first attribute comprises a mailing address and the second attribute comprises a name.
15. The apparatus of claim 1 wherein performing the at least one clustering operation comprises determining a degree of connectivity of a given one of the clusters in the first set of clusters and the second set of clusters, the degree of connectivity of the given cluster being based at least in part on similarity of string values for the at least one data record from the first set of data records and the at least one data record from the second set of data records that are part of the given cluster.
16. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed
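As a rough illustration of how claims 1, 5 to 9, and 13 to 14 fit together, the sketch below computes a Levenshtein-based similarity, thresholds it into a 0/1 biadjacency matrix, builds one graph per attribute, and flags records whose name cluster differs from their address cluster. Several parts are assumptions made for the example and are not taken from the patent: the normalization of edit distance into a similarity score, the 0.8 threshold, the use of connected components (union-find) in place of Louvain community detection, all function names, and the sample records.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def biadjacency(first, second, threshold):
    """Thresholding filter: 1 at or above the threshold, 0 below it."""
    return [[1 if similarity(a, b) >= threshold else 0 for b in second]
            for a in first]


def clusters_by_record(first, second, threshold):
    """Map each record id to the frozenset of record ids in its cluster."""
    matrix = biadjacency(first, second, threshold)
    ids = ([("A", i) for i in range(len(first))]
           + [("B", j) for j in range(len(second))])
    parent = {r: r for r in ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    # Connect only the pairs whose biadjacency entry is 1.
    for i, row in enumerate(matrix):
        for j, bit in enumerate(row):
            if bit:
                parent[find(("A", i))] = find(("B", j))
    groups = {}
    for r in ids:
        groups.setdefault(find(r), set()).add(r)
    return {r: frozenset(groups[find(r)]) for r in ids}


def conflicting_records(set_a, set_b, threshold=0.8):
    """Records are (name, address) pairs; flag ids whose two clusters differ."""
    by_name = clusters_by_record([r[0] for r in set_a],
                                 [r[0] for r in set_b], threshold)
    by_address = clusters_by_record([r[1] for r in set_a],
                                    [r[1] for r in set_b], threshold)
    return sorted(r for r in by_name if by_name[r] != by_address[r])


# Hypothetical records: the second pair shares an address but not a name.
set_a = [("Jane Doe", "12 Oak Lane"), ("Raj Patel", "400 Pine Court")]
set_b = [("Jane Doe", "12 Oak Lane"), ("Ann Smith", "400 Pine Court")]
flagged = conflicting_records(set_a, set_b)
```

In this sample, record 1 of each set is flagged because the two records cluster together by address but not by name. Treating "clusters differ" as the trigger is one reading of claim 1, chosen here for simplicity.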

Assignees

Dell Products Lp

Inventors

Fauber Benjamin

Classifications

G06F21/6218
to a system of files or objects, e.g. local or distributed file system or database · CPC title
G06F16/9024
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
G06Q30/0205
based on location or geographical consideration · CPC title
G06Q50/265 (Primary)
Personal security, identity or safety · CPC title
G06F16/90344
by using string matching techniques · CPC title

Patent family

Related publications grouped by family.

View patent family 77747928

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11625438B2 cover?: An apparatus includes a processing device configured to obtain first and second sets of data records, each data record comprising a string associated with an attribute. The processing device is also configured to generate a similarity matrix, wherein entries of the similarity matrix comprise values characterizing similarity between respective pairs of the strings comprising a first string from …
Who is the assignee on this patent?: Dell Products Lp
What technology area does this patent fall under?: Primary CPC classification G06Q50/265. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 11, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Determining subsets of accounts using a model of transactions

Graph analysis of time-series cluster data

Unsupervised encoder-decoder neural network security event detection

Determining a degree of similarity of a subset of tabular data arrangements to subsets of graph data arrangements at ingestion into a data-driven collaborative dataset platform

Process traces clustering: a heterogeneous information network approach
