Incremental time window procedure for selecting training samples for a supervised learning algorithm

US11216751B2 · US · B2

Patent metadata
Publication number: US-11216751-B2
Application number: US-201916657450-A
Country: US
Kind code: B2
Filing date: Oct 18, 2019
Priority date: Oct 18, 2019
Publication date: Jan 4, 2022
Grant date: Jan 4, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are system, method, and computer program product embodiments for generating labels for training a machine learning model using an incremental time window process. The described process may be used in a recurrence detection system. A dataset may be analyzed using incremental split dates to divide the dataset into an analysis portion and a holdout portion. The analysis portion may be analyzed to determine input features related to a predicted recurrence in the dataset. The holdout portion may be tested against the analysis portion and the input features to generate a label. The label may indicate whether or not the holdout portion confirms the prediction. The testing of the holdout portion against the analysis portion may be repeated by incrementally using different split dates and multiple separate analysis portions and holdout portions to generate multiple labels and corresponding input features.
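
The incremental split-date idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `incremental_labels`, the one-period step between split dates, the single "next date" prediction, and the `tolerance_days` parameter are all assumptions made for illustration. Each split date divides the dates into an analysis portion (before the split) and a holdout portion (on or after it), and the label records whether the holdout portion confirms the date predicted from the analysis portion.

```python
from datetime import date, timedelta


def incremental_labels(dates, period_days, tolerance_days=3):
    """Generate one (features, label) record per incremental split date.

    Hypothetical sketch: the split date advances by one recurrence period
    per step, and the label is 1 when the holdout portion contains a date
    within `tolerance_days` of the next date predicted from the analysis
    portion.
    """
    dates = sorted(dates)
    period = timedelta(days=period_days)
    tolerance = timedelta(days=tolerance_days)

    records = []
    split = dates[0] + period  # first split: one period after the first point
    while split <= dates[-1]:
        analysis = [d for d in dates if d < split]   # analysis portion
        holdout = [d for d in dates if d >= split]   # holdout portion
        predicted = analysis[-1] + period            # naive next-date prediction
        label = int(any(abs(d - predicted) <= tolerance for d in holdout))
        records.append(
            {"split_date": split, "n_analysis": len(analysis), "label": label}
        )
        split += period  # move the window forward incrementally
    return records


monthly = [date(2021, 1, 1), date(2021, 1, 31), date(2021, 3, 2), date(2021, 4, 1)]
for record in incremental_labels(monthly, period_days=30):
    print(record)
```

Each record pairs a simple input feature from the analysis portion (here only a count) with its label, so a single history yields several training samples instead of one.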

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: aggregating a dataset of data points; performing a cadence analysis on the dataset to determine a recurrence period of the data points; generating a first label using a first subset of the dataset having data points within a first multiple of the recurrence period; generating a second label using a second subset of the dataset having a number of data points within a second multiple of the recurrence period, wherein the second subset includes the number of data points from the first subset within the first multiple of the recurrence period, and wherein the number of data points is based on a matching criteria that comprises a number of predictions and a date tolerance, and wherein the number of data points within the second multiple of the recurrence period are within the date tolerance and are equal to the number of predictions; training a machine learning model using the first label and the second label; receiving a set of transactions associated with an account and a merchant; and generating, by the trained machine learning model, a predicted recurrence within the set of transactions, wherein generating the predicted recurrence further comprises: determining a vector strength, a coverage, and a redundancy, wherein the vector strength, the coverage, and the redundancy are phase variables determined based on a mapping of transaction dates in a phase space associated with the set of transactions and characterizing a recurrence period within the set of transactions, and wherein the vector strength includes a first value reflecting a level of recurrence of the set of transactions within the recurrence period, the coverage includes a second value reflecting a number of recurrence periods that include no transactions from the set of transactions, and the redundancy includes a third value reflecting a number of recurrence periods that include at least one transaction from the set of transactions, wherein the set of transactions are plotted on the phase space for multiple recurrence periods with associated phase variables for each of the multiple recurrence periods and wherein the multiple recurrence periods includes the recurrence period; and generating a probability of the merchant having the predicted recurrence within the set of transactions for the account, for the recurrence period, using the trained machine learning model based on an evaluation of at least one of the first value, the second value and the third value.

2. The computer-implemented method of claim 1, wherein generating the first label further comprises: designating the first subset as an analysis portion; designating the data points having a time value exceeding the first multiple of the recurrence period as a holdout portion; and testing the holdout portion against the analysis portion to generate the first label.

3. The computer-implemented method of claim 2, further comprising: identifying a holdout multiple of the recurrence period; and determining a delta between the holdout multiple and a data point of the dataset having a most recent time value to generate a holdout date.

4. The computer-implemented method of claim 3, further comprising: determining that a third multiple of the recurrence period exceeds the holdout date; and in response to the determining, ceasing label generation.

5. The computer-implemented method of claim 1, wherein the second multiple of the recurrence period is incrementally selected relative to the first multiple of the recurrence period.

6. The computer-implemented method of claim 1, further comprising: ceasing label generation in response to determining that no data points of the dataset have a time value between the second multiple of the recurrence period and a subsequent multiple of the recurrence period.

7. The computer-implemented method of claim 1, further comprising: generating a third label using a third subset of the data having data points within a third multiple of the recurrence period, wherein the third subset includes the data points from the first subset and the data points from the second subset.

8. A system, comprising: a memory; and at least one processor coupled to the memory and configured to: aggregate a dataset of data points; analyze the dataset to determine a recurrence period of the data points; group the data points of the dataset into a first analysis subset and a first holdout subset, wherein the first analysis subset includes data points having a time value within the recurrence period; generate a first label by testing the first holdout subset against the first analysis subset; group the data points of the dataset into a second analysis subset and a second holdout subset, wherein the second analysis subset includes first data points from the first analysis subset and second data points having a time value within a multiple of the recurrence period, wherein the data points is based on a matching criteria that comprises a number of predictions and a date tolerance, and wherein the data points within the second analysis subset are within the date tolerance and are equal to the number of predictions; generate a second label by testing the second holdout subset against the second analysis subset; train a machine learning model using the first label and the second label; receive a set of transactions associated with an account and a merchant; and generate, by the trained machine learning model, a predicted recurrence within the set of transactions, wherein to generate the predicted recurrence further comprises: determining a vector strength, a coverage, and a redundancy, wherein the vector strength, the coverage, and the redundancy are phase variables determined based on a mapping of transaction dates in a phase space associated with the set of transactions and characterizing a recurrence period within the set of transactions, and wherein the vector strength includes a first value reflecting a level of recurrence of the set of transactions within the recurrence period, the coverage includes a second value reflecting a number of recurrence periods that include no transactions from the set of transactions, and the redundancy includes a third value reflecting a number of recurrence periods that include at least one transaction from the set of transactions, wherein the set of transactions are plotted on the phase space for multiple recurrence periods with associated phase variables for each of the multiple recurrence periods and wherein the multiple recurrence periods includes the recurrence period; and generating a probability of the merchant having the predicted recurrence within the set of transactions for the account, for the recurrence period, using the trained machine learning model based on an evaluation of at least one of the first value, the second value and the third value.

9. The system of claim 8, wherein the difference in data points between the second analysis subset and the first analysis subset is the difference in data points between the first holdout subset and the second holdout subset.

10. The system of claim 8, wherein the at least one processor is further configured to: cease label generation in response to determining that no data points of the dataset have a time value between the multiple of the recurrence period and a subsequent multiple of the recurrence period.

11. The system of claim 8, wherein the multiple of the recurrence period is incrementally selected relative to the recurrence period.

12. The system of claim 9, wherein to generate the first label, the at least one processor is further configured to: identify a se
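
The phase variables named in the claims can be sketched in Python under stated assumptions. The claim preview gives no formulas, so this sketch uses the standard circular-statistics definition of vector strength (the length of the mean unit phasor) and reports raw counts of empty and occupied periods as stand-ins for the values that coverage and redundancy reflect. The function name `phase_variables`, the use of day offsets, and the choice to start period bins at the first transaction are all hypothetical.

```python
import cmath
import math


def phase_variables(days, period):
    """Compute illustrative phase variables for one candidate period.

    days:   transaction times as day offsets (numbers)
    period: candidate recurrence period in days

    Assumption: vector strength is the magnitude of the mean unit phasor,
    where each transaction's phase is its position within the period.
    """
    # Map each transaction onto the unit circle: one full turn per period.
    phasors = [cmath.exp(2j * math.pi * (d % period) / period) for d in days]
    vector_strength = abs(sum(phasors)) / len(phasors)

    # Bin transactions into consecutive periods starting at the first one.
    start = min(days)
    n_periods = int((max(days) - start) // period) + 1
    occupied = {int((d - start) // period) for d in days}

    return {
        "vector_strength": vector_strength,            # near 1.0: well aligned
        "empty_periods": n_periods - len(occupied),    # periods with no transactions
        "occupied_periods": len(occupied),             # periods with at least one
    }


# Evaluate the same transactions against several candidate periods.
transactions = [0, 30, 60, 90]
for candidate in (7, 30):
    print(candidate, phase_variables(transactions, candidate))
```

Transactions that recur every 30 days land at the same phase for a 30-day period, giving a vector strength near 1, while a mismatched 7-day period scatters them around the circle and gives a much lower value. Values like these could then serve as input features to the trained model.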

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11216751B2 cover?
Disclosed herein are system, method, and computer program product embodiments for generating labels for training a machine learning model using an incremental time window process. The described process may be used in a recurrence detection system. A dataset may be analyzed using incremental split dates to divide the dataset into an analysis portion and a holdout portion. The analysis portion may…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 4, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).