System, method, and computer program product for early detection of a merchant data breach through machine-learning analysis
US-2021279731-A1 · Sep 9, 2021 · US
US11838301B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11838301-B2 |
| Application number | US-202117243201-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 28, 2021 |
| Priority date | Apr 28, 2021 |
| Publication date | Dec 5, 2023 |
| Grant date | Dec 5, 2023 |
Official abstract text for this publication.
The disclosure herein describes a system and method for predictive identification of breached entities. Identification number and expiration date pairs associated with compromised records in a source file are analyzed to identify a set of candidate entities having records at least partially matching the source file data pairs having events occurring during a selected time period. Probability vectors are calculated for records associated with each identified entity. A divergence value is calculated which represents a distance between probability distribution vectors for each entity and probability distribution vectors for the source file. A predicted breached entity is identified based on the divergence values. The predicted breached entity is notified of the predicted breach. The notification can include an identification of the breached entity, identification of breached records, predicted time of breach, and/or a recommendation to take action to mitigate the predicted breach.
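The abstract's core computation can be illustrated with a minimal sketch. It builds one probability vector per record set, measures a divergence against the source-file vector, and picks the entity with the smallest divergence. The record does not say which attributes the probability vectors are built from, so the normalized histogram over a categorical attribute (here, expiration year) is an assumption, as are all function names and the `eps` smoothing constant. The Bhattacharyya and Kullback-Leibler measures are the two named in the claims.

```python
import math
from collections import Counter


def probability_vector(values, categories):
    """Normalized histogram of `values` over a fixed category order.

    Assumption: the patent's "probability distribution vector" is modeled
    here as relative frequencies of one categorical attribute.
    """
    counts = Counter(values)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no values fall in the given categories")
    return [counts[c] / total for c in categories]


def bhattacharyya_distance(p, q):
    """-ln(BC), where BC = sum_i sqrt(p_i * q_i) is the Bhattacharyya coefficient."""
    bc = min(1.0, sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))
    return math.inf if bc == 0 else -math.log(bc)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i); q_i is floored at eps (assumption)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def predict_breached_entity(source_vec, entity_vecs, divergence=bhattacharyya_distance):
    """Return (entity with the smallest divergence from the source, all scores)."""
    scores = {name: divergence(source_vec, vec) for name, vec in entity_vecs.items()}
    return min(scores, key=scores.get), scores
```

For example, if the source file and a hypothetical `merchant_a` both have expiration years distributed as 50% / 25% / 25% over three categories, while `merchant_b` is concentrated in the last category, `predict_breached_entity` returns `merchant_a`, because its divergence from the source is zero.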
Opening claim text (preview).
What is claimed is:
1. A system for predictive detection of breached entities associated with compromised records from a breach, the system comprising: at least one processor; and at least one memory comprising computer program code that, when executed by the at least one processor, is operable to: identify a plurality of matching records each having an identification number and expiration date pair matching an identification number and an expiration date pair of one of a plurality of compromised records associated with a source file; select a set of candidate entities each having at least one event associated with at least one of the plurality of matching records during a time period; generate a first set of probability distribution vectors for the plurality of compromised records associated with the source file; generate a second set of probability distribution vectors for a first plurality of records stored by a first entity selected from the set of candidate entities during a first time period; generate a third set of probability distribution vectors for a second plurality of records stored by a second entity selected from the set of candidate entities during the first time period; calculate a first divergence value representing a distance between the first set of probability distribution vectors and the second set of probability distribution vectors; calculate a second divergence value representing a distance between the first set of probability distribution vectors and the third set of probability distribution vectors; select the first entity as a predicted breached entity on condition the first divergence value is less than the second divergence value; and select the second entity as the predicted breached entity on condition the second divergence value is less than the first divergence value.
2. The system of claim 1, further comprising: for each entity in the set of candidate entities, calculate a divergence value representing a distance between a set of probability distribution vectors for a plurality of records stored by the entity during the time period and a set of probability distribution vectors for the plurality of compromised records; compare the calculated divergence values to a threshold value; identify a set of predicted breached entities from the set of candidate entities based on the comparison; and notify, via a communications interface device, at least one entity within the set of predicted breached entities of the breach and the time period, the notification comprising a recommendation to label the plurality of records stored by the at least one entity as compromised.
3. The system of claim 1, further comprising: generate a fourth set of probability distribution vectors for the first plurality of records stored by the first entity selected from the set of candidate entities during a second time period; and calculate a third divergence value representing a distance between the first set of probability distribution vectors and the fourth set of probability distribution vectors for the second time period.
4. The system of claim 1, further comprising: calculate a first Bhattacharyya divergence value representing a distance between a first set of probability distribution vectors and a second set of probability distribution vectors for a selected time period; and calculate a second Bhattacharyya divergence value representing a distance between the first set of probability distribution vectors and a third set of probability distribution vectors for the selected time period.
5. The system of claim 1, further comprising: calculate a first Kullback-Leibler (KL) divergence value representing a distance between the first set of probability distribution vectors and the second set of probability distribution vectors for a selected time period; and calculate a second KL divergence value representing a distance between the first set of probability distribution vectors and the third set of probability distribution vectors for the selected time period.
6. The system of claim 1, further comprising: a machine learning algorithm that dynamically identifies the set of candidate entities from a plurality of possible entities and a set of possible time periods for potential occurrence of a breach.
7. The system of claim 1, further comprising: output, via a communications interface device, to at least one remote computing device, a notification of predicted breach, the notification of predicted breach comprising an identification of at least one predicted breached entity, a predicted time period of occurrence of the breach and a set of identification numbers associated with at least one compromised record associated with at least one event during the predicted time period.
8. A computerized method for predictive detection of breached entities associated with compromised records from a breach, the method comprising: identifying a plurality of matching records each having an identification number and expiration date pair matching an identification number and an expiration date pair of one of a plurality of compromised records associated with a source file; selecting a set of candidate entities each having at least one event associated with at least one of the plurality of matching records during a time period; generating a first set of probability distribution vectors for the plurality of compromised records associated with the source file; generating a second set of probability distribution vectors for a first plurality of records stored by a first entity selected from the set of candidate entities during a first time period; generating a third set of probability distribution vectors for a second plurality of records stored by a second entity selected from the set of candidate entities during the first time period; calculating a first divergence value representing a distance between the first set of probability distribution vectors and the second set of probability distribution vectors; calculating a second divergence value representing a distance between the first set of probability distribution vectors and the third set of probability distribution vectors; selecting the first entity as a predicted breached entity on condition the first divergence value is less than the second divergence value; and selecting the second entity as the predicted breached entity on condition the second divergence value is less than the first divergence value.
9. The computerized method of claim 8, further comprising: calculating a divergence value representing a distance between a set of probability distribution vectors for a plurality of records stored by each entity in the set of candidate entities during the time period and a set of probability distribution vectors for the plurality of compromised records; comparing the calculated divergence values to a threshold value; identifying a set of predicted breached entities from the set of candidate entities based on the comparison; and notifying, via a communications interface device, at least one entity within the set of predicted breached entities of the breach and the time period, the notification comprising a recommendation to label the plurality of records stored by the at least one entity as compromised.
10. The computerized method of claim 8, further comprising: generating a fourth set of probability distribution vectors for the first plurality of records stored by the first entity selected from the set of candidate entities during a second time period; and calculating a third divergence value representing a distance between the first set of probability distribution vectors and the fourth set of probability distribution v
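The record-matching, candidate-selection, and threshold steps recited in claims 1, 2, 8, and 9 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. The `Event` fields, the function names, the inclusive date window, and the reading of the threshold comparison as "divergence at or below the threshold" are all hypothetical. The claims say only that divergence values are compared to a threshold, and the "at or below" direction is chosen here to agree with claim 1, where the smaller divergence selects the entity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    entity: str        # hypothetical merchant identifier
    id_number: str     # identification number of the record
    expiration: str    # expiration date, e.g. "2026-08"
    occurred_on: date  # date of the event


def select_candidates(events, compromised_pairs, start, end):
    """Group matching events by entity.

    An event matches when its (id_number, expiration) pair appears in the
    source file and it occurred within [start, end] (inclusive, assumed).
    """
    pairs = set(compromised_pairs)
    candidates = {}
    for ev in events:
        in_window = start <= ev.occurred_on <= end
        if in_window and (ev.id_number, ev.expiration) in pairs:
            candidates.setdefault(ev.entity, []).append(ev)
    return candidates


def predicted_breached(divergences, threshold):
    """Entities whose divergence is at or below `threshold`, closest first."""
    flagged = (name for name, d in divergences.items() if d <= threshold)
    return sorted(flagged, key=divergences.get)
```

Here `divergences` is assumed to be a mapping from entity name to a divergence value computed separately, for example with a Bhattacharyya or KL measure as in claims 4 and 5.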
Event detection, e.g. attack signature detection · CPC title
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title
Approximate or statistical queries · CPC title
Entity profiles · CPC title
Grouping of entities · CPC title