Generating synthetic data using reject inference processes for modifying lead scoring models

US2020027157A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2020027157-A1
Application number  US-201816037700-A
Country             US
Kind code           A1
Filing date         Jul 17, 2018
Priority date       Jul 17, 2018
Publication date    Jan 23, 2020
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and non-transitory computer readable storage media are disclosed for using reject inference to generate synthetic data for modifying lead scoring models. For example, the disclosed system identifies an original dataset corresponding to an output of a lead scoring model that generates scores for a plurality of prospects to indicate a likelihood of success of prospects of the plurality of prospects. In one or more embodiments, the disclosed system selects a reject inference model by performing simulations on historical prospect data associated with the original dataset. Additionally, the disclosed system uses the selected reject inference model to generate an imputed dataset by generating synthetic outcome data representing simulated outcomes of rejected prospects in the original dataset. The disclosed system then uses the imputed dataset to modify the lead scoring model by modifying at least one parameter of the lead scoring model using the synthetic outcome data.
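The abstract does not spell out how synthetic outcomes are produced. As a minimal sketch, the two reject inference models named in the claims (simple augmentation and fuzzy augmentation) are shown below as they are commonly defined in credit-scoring practice, which may differ from the patent's embodiments. The names `Row`, `simple_augmentation`, `fuzzy_augmentation`, `build_imputed_dataset`, and the 0.5 cutoff are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Row:
    """One training row (hypothetical structure, not from the patent)."""
    features: Tuple[float, ...]
    label: int               # 1 = success, 0 = failure
    weight: float = 1.0      # sample weight used when refitting the model
    synthetic: bool = False  # True when the outcome was imputed


def simple_augmentation(rejected: Sequence[Tuple[float, ...]],
                        scores: Sequence[float],
                        cutoff: float = 0.5) -> List[Row]:
    """Hard-label each rejected prospect by thresholding its model score.

    Assumption: the cutoff is a free parameter of this sketch.
    """
    return [Row(f, int(s >= cutoff), 1.0, True)
            for f, s in zip(rejected, scores)]


def fuzzy_augmentation(rejected: Sequence[Tuple[float, ...]],
                       scores: Sequence[float]) -> List[Row]:
    """Duplicate each rejected prospect into two weighted rows.

    One row is labeled success with weight p, the other failure with
    weight 1 - p, where p is the model score for that prospect.
    """
    rows: List[Row] = []
    for f, s in zip(rejected, scores):
        rows.append(Row(f, 1, s, True))
        rows.append(Row(f, 0, 1.0 - s, True))
    return rows


def build_imputed_dataset(accepted: Sequence[Row],
                          synthetic: Sequence[Row]) -> List[Row]:
    """Combine known outcomes with synthetic outcomes for refitting."""
    return list(accepted) + list(synthetic)
```

Under these assumptions, the imputed dataset would then be passed, with its sample weights, to whatever training routine refits the lead scoring model's parameters.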

First claim

Opening claim text (preview).

What is claimed is:

1. In a digital medium environment for classifying lead prospects, a computer-implemented method for using reject inference to generate synthetic data for modify lead scoring models comprising: identifying, by at least one processor, an original dataset corresponding to an output of a lead scoring model that generates scores for a plurality of prospects, the scores indicating a likelihood of success of prospects of the plurality of prospects; a step for generating an imputed dataset by selecting a reject inference model from a plurality of reject inference models and generating outcome data by performing a plurality of simulations; and updating the lead scoring model using the imputed dataset by modifying at least one parameter of the lead scoring model based on synthetic outcome data of the imputed dataset.

2. The computer-implemented method as recited in claim 1, wherein the step for generating the imputed dataset by selecting a reject inference model from a plurality of reject inference models and generating outcome data by performing a plurality of simulations comprises selecting a simple augmentation model for augmenting the original dataset.

3. The computer-implemented method as recited in claim 1, wherein the step for generating the imputed dataset by selecting a reject inference model from a plurality of reject inference models and generating outcome data by performing a plurality of simulations comprises selecting a fuzzy augmentation model for augmenting the original dataset.

4. The computer-implemented method as recited in claim 1, further comprising identifying a plurality of characteristics of the original dataset, the plurality of characteristics comprising a split effectiveness of the lead scoring model for the original dataset, a success rate of the original dataset, and a size of a set of known labels in the original dataset.

5. A non-transitory computer readable storage medium comprising instructions that, when executed by at least one processor, cause a computer system to: identify an original dataset corresponding to an output of a lead scoring model that generates scores for a plurality of prospects, the scores indicating a likelihood of success of prospects of the plurality of prospects; generate, based on the original dataset, an imputed dataset using a reject inference model on a subset of the plurality of prospects to generate synthetic outcome data for the subset; and update the lead scoring model using the imputed dataset by modifying at least one parameter of the lead scoring model based on the synthetic outcome data.

6. The non-transitory computer readable storage medium as recited in claim 5, further comprising instructions that, when executed by the at least one processor, cause the computer system to select the reject inference model from a plurality of reject inference models by performing a plurality of simulations using the plurality of reject inference models on historical data associated with the original dataset.

7. The non-transitory computer readable storage medium as recited in claim 6, wherein the plurality of reject inference models comprises a simple augmentation model and a fuzzy augmentation model.

8. The non-transitory computer readable storage medium as recited in claim 5, further comprising instructions that, when executed by the at least one processor, cause the computer system to: identify a characteristic of the original dataset based on the plurality of prospects in the original dataset; determine that the characteristic of the original dataset does not meet a characteristic threshold indicating whether to use the original dataset or to generate the synthetic outcome data; and generate the synthetic outcome data in response to determining that the characteristic of the original dataset does not meet the characteristic threshold.

9. The non-transitory computer readable storage medium as recited in claim 8, wherein the characteristic comprises a split effectiveness of the lead scoring model for the original dataset, a success rate of the original dataset, or a size of a set of known labels in the original dataset.

10. The non-transitory computer readable storage medium as recited in claim 8, further comprising instructions that, when executed by the at least one processor, cause the computer system to: compare a plurality of characteristics of the original dataset to a plurality of characteristic thresholds; and generate the synthetic outcome data in response to determining that the plurality of characteristics of the original dataset do not meet the plurality of characteristic thresholds.

11. The non-transitory computer readable storage medium as recited in claim 10, further comprising instructions that, when executed by the at least one processor, cause the computer system to determine the plurality of characteristic thresholds based on historical data associated with the original dataset.

12. The non-transitory computer readable storage medium as recited in claim 5, further comprising instructions that, when executed by the at least one processor, cause the computer system to determine a plurality of features of the plurality of prospects for generating the synthetic outcome data, wherein determining the plurality of features comprises: performing a plurality of simulations on historical data associated with the original dataset using variable combinations of the plurality of features; and selecting a set of features based on a performance of the variable combinations of the plurality of features in the plurality of simulations.

13. The non-transitory computer readable storage medium as recited in claim 5, further comprising instructions that, when executed by the at least one processor, cause the computer system to score a plurality of new prospects using the updated lead scoring model based on the synthetic outcome data.

14. In a digital medium environment for classifying lead prospects, a system for using reject inference to generate synthetic data for modify lead scoring models comprising: at least one processor; and a non-transitory computer memory comprising: an original dataset comprising data for a plurality of prospects; and instructions that, when executed by the at least one processor, cause the system to: identify an output of a lead scoring model that generates scores for a plurality of prospects, the scores indicating a likelihood of success of each prospect of the plurality of prospects; select a reject inference model from a plurality of reject inference models based on a plurality of simulations performed on historical prospect data associated with the original dataset using the plurality of reject inference models; generate an imputed dataset using the selected reject inference model on a subset of the plurality of prospects corresponding to rejected prospects to generate synthetic outcome data representing simulated outcomes of the subset of the plurality of prospects; and modify the lead scoring model based on the synthetic outcome data of the imputed dataset by modifying at least one parameter of the lead scoring model.

15. The system as recited in claim 14, further comprising instructions that, when executed by the at least one processor, cause the system to: identify a plurality of characteristics of the original dataset based on the plurality of prospects in the original dataset; determine that the plurality of characteristics of the original dataset does not meet a plurality of characteristic thresholds indicating whether to use the original d…
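Claims 6 and 14 describe selecting a reject inference model by running simulations on historical data. One way this could work, sketched below under stated assumptions, is to take historical prospects whose outcomes are all known, pretend that those scoring below a simulated acceptance cutoff were rejected, impute their outcomes with each candidate, and keep the candidate whose imputations are closest to the real outcomes. The Brier score as the comparison metric, the two stand-in candidates, and every function name here are assumptions for illustration; the claim preview does not specify them.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A candidate maps the scores of simulated rejects to imputed success
# probabilities (hypothetical interface, not from the patent).
Candidate = Callable[[Sequence[float]], List[float]]


def hard_cutoff(scores: Sequence[float], cutoff: float = 0.5) -> List[float]:
    """Stand-in for simple augmentation: impute 1.0 or 0.0 by threshold."""
    return [1.0 if s >= cutoff else 0.0 for s in scores]


def soft_scores(scores: Sequence[float]) -> List[float]:
    """Stand-in for fuzzy augmentation: impute the score itself."""
    return [float(s) for s in scores]


def brier(pred: Sequence[float], actual: Sequence[int]) -> float:
    """Mean squared error between imputed probabilities and outcomes."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def select_reject_inference_model(
    candidates: Dict[str, Candidate],
    hist_scores: Sequence[float],
    hist_outcomes: Sequence[int],
    cutoffs: Sequence[float],
) -> Tuple[str, Dict[str, float]]:
    """Run one simulation per cutoff and return the best candidate name
    together with each candidate's average error."""
    totals = {name: 0.0 for name in candidates}
    runs = 0
    for cutoff in cutoffs:
        # Simulated rejects: historical prospects scoring below the cutoff.
        hidden = [(s, y) for s, y in zip(hist_scores, hist_outcomes)
                  if s < cutoff]
        if not hidden:
            continue  # nothing was rejected in this simulation
        runs += 1
        scores = [s for s, _ in hidden]
        actual = [y for _, y in hidden]
        for name, impute in candidates.items():
            totals[name] += brier(impute(scores), actual)
    if runs == 0:
        raise ValueError("no simulation produced any rejected prospects")
    averages = {name: total / runs for name, total in totals.items()}
    best = min(averages, key=averages.get)
    return best, averages
```

A call such as `select_reject_inference_model({"hard": hard_cutoff, "soft": soft_scores}, scores, outcomes, [0.5])` returns the name of the lower-error candidate along with the per-candidate averages, so the choice can be inspected rather than taken on faith.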

Assignees

Adobe Inc

Inventors

Classifications

  • Ensemble learning

  • Fuzzy inferencing

  • Market modelling; Market analysis; Collecting market data

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Probabilistic graphical models, e.g. probabilistic networks

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020027157A1 cover?
Methods, systems, and non-transitory computer readable storage media are disclosed for using reject inference to generate synthetic data for modifying lead scoring models. For example, the disclosed system identifies an original dataset corresponding to an output of a lead scoring model that generates scores for a plurality of prospects to indicate a likelihood of success of prospects of the pl…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0201. Mapped technology areas include Physics.
When was this patent published?
Published Jan 23, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).