US10311467B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10311467-B2 |
| Application number | US-201514667338-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 24, 2015 |
| Priority date | Mar 24, 2015 |
| Publication date | Jun 4, 2019 |
| Grant date | Jun 4, 2019 |
Official abstract text for this publication.
Systems and methods for selecting optimal policies that maximize expected return subject to given risk tolerance and confidence levels. In particular, methods and systems for selecting an optimal ad recommendation policy—based on user data, a set of ad recommendation policies, and risk thresholds—by sampling the user data and estimating gradients. The system or methods utilize the estimated gradients to select a good ad recommendation policy (an ad recommendation policy with high expected return) subject to the risk tolerance and confidence levels. To assist in selecting a risk-sensitive ad recommendation policy, a gradient-based algorithm is disclosed to find a near-optimal policy for conditional-value-at-risk (CVaR) risk-sensitive optimization.
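As an illustration of the risk measure named in the abstract, conditional value at risk (CVaR) at level alpha can be estimated from a sample as the mean of the worst (1 - alpha) tail. The following is a minimal Python sketch and is not taken from the patent. The function name `empirical_var_cvar` and the convention that larger values are worse (losses rather than returns) are assumptions made for this example. It evaluates the standard Rockafellar-Uryasev expression, nu + E[(L - nu)^+] / (1 - alpha), at the empirical value at risk.

```python
import numpy as np

def empirical_var_cvar(losses, alpha):
    """Return (VaR, CVaR) at level alpha for a sample of losses.

    Assumption for this sketch: larger values are worse.
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    # Empirical VaR: smallest sample value with at least an alpha
    # fraction of the sample at or below it.
    k = int(np.ceil(alpha * len(losses))) - 1
    var = losses[k]
    # Rockafellar-Uryasev expression evaluated at nu = VaR.
    excess = np.maximum(losses - var, 0.0)
    cvar = var + excess.mean() / (1.0 - alpha)
    return var, cvar

if __name__ == "__main__":
    var, cvar = empirical_var_cvar([1, 2, 3, 4, 5, 6, 7, 8], alpha=0.75)
    print(var, cvar)
```

In the usage example, the worst quarter of the eight sample losses is {7, 8}, so the estimate is the mean of those two values.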
Opening claim text (preview).
We claim:
1. In a digital medium environment for identifying and deploying digital advertising campaigns across a plurality of client devices, where campaigns can be altered, removed, or replaced on demand, a computer-implemented method of using risk-sensitive, lifetime value optimization based on a conditional value at risk measure to improve accuracy, efficiency, and stability in selecting and executing ad recommendation policies comprising: identifying, at a server, a risk-tolerance value indicating a measure of permissible variance for ad recommendation policies in a digital content campaign, the risk-tolerance value corresponding to a conditional value at risk; identifying a set of user data indicating prior interactions by user client devices in relation to digital advertisements of one or more digital content campaigns; determining, by the server, an optimized ad recommendation policy subject to the risk-tolerance value by converging a policy parameter and a risk parameter, the risk parameter indicating policy conditional value at risk, to identify the optimized ad recommendation policy such that the optimized ad recommendation policy satisfies the conditional value at risk by: determining a gradient for the policy parameter by sampling the set of user data; determining a gradient for the risk parameter by sampling the set of user data; using the determined gradient for the risk parameter to select an updated risk parameter that indicates updated policy conditional value at risk; and using the determined gradient for the policy parameter to identify an updated policy parameter; and in response to determining the optimized ad recommendation policy, executing the digital content campaign subject to the risk-tolerance value corresponding to the conditional value at risk by providing digital advertisements to client devices in accordance with the optimized ad recommendation policy.
2. The method as recited in claim 1, wherein determining, by the server, the optimized ad recommendation policy subject to the risk-tolerance value comprises converging the policy parameter, the risk parameter, and a constraint parameter to identify the optimized ad recommendation policy that is subject to the risk-tolerance value by: determining a gradient for the constraint parameter by sampling the set of user data; and using the determined gradient for the constraint parameter to select an updated constraint parameter.
3. The method as recited in claim 2, wherein converging the policy parameter, the risk parameter, and the constraint parameter comprises identifying the updated risk parameter according to a first time scale, identifying the updated policy parameter according to a second time scale, and identifying the updated constraint parameter according to a third time scale.
4. The method as recited in claim 3, wherein: the first time scale is faster than the second time scale; and the second time scale is faster than the third time scale.
5. The method as recited in claim 1, wherein determining the gradient for the policy parameter by sampling the set of user data comprises generating one or more trajectories by following the policy parameter during sampling and using the one or more trajectories to determine the gradient for the policy parameter.
6. The method as recited in claim 1, wherein: the conditional value at risk comprises a threshold measure of a mean of an alpha-tail distribution; and determining the optimized ad recommendation policy subject to the risk-tolerance value comprises converging the policy parameter and the risk parameter such that an alpha-tail distribution corresponding to the optimized ad recommendation policy satisfies the threshold measure of the mean of the alpha-tail distribution.
7. The method as recited in claim 6, further comprising: receiving a confidence level for the risk-tolerance value; determining the threshold measure of the mean of the alpha-tail distribution based on the risk-tolerance value and the confidence level; and determining that the optimized ad recommendation policy satisfies the risk-tolerance value and the confidence level by determining that the optimized ad recommendation policy satisfies the threshold measure of the mean of the alpha-tail distribution.
8. The method as recited in claim 1, further comprising: selecting the policy parameter from a set of ad recommendation policies; and selecting the updated risk parameter from the set of ad recommendation policies.
9. The method as recited in claim 3, further comprising applying projection operators when determining gradients for the risk parameter, the policy parameter, and the constraint parameter to ensure convergence to the optimal ad recommendation policy.
10. The method as recited in claim 1, wherein the risk-tolerance value comprises a threshold click-thru rate.
11. The method as recited in claim 1, wherein the optimized ad recommendation policy returns an advertisement to present on a webpage to a user with a given set of characteristics.
12. The method as recited in claim 1, wherein the optimized ad recommendation policy projects a click-thru rate less than an ad recommendation policy optimized for life-time value without regard to risk.
13. In a digital medium environment for identifying and deploying policies, where policies can be altered, removed, or replaced on demand, a method of using risk-sensitive, lifetime value optimization based on a conditional value at risk measure to improve accuracy, efficiency, and stability in selecting and executing a policy comprising: identifying, by one or more processors, a risk-tolerance value indicating a measure of permissible variance for policies and a confidence level, the risk-tolerance value and the confidence level corresponding to a conditional value at risk indicating a threshold measure of a mean of an alpha-tail distribution; identifying a set of sample data comprising prior interactions by user client devices; receiving a set of policies, each policy being defined by one or more policy parameters; and determining, by the one or more processors, an optimized policy that is subject to the risk-tolerance value within the confidence level using a trajectory-based policy gradient algorithm by converging a policy parameter, a risk parameter indicating policy conditional value at risk, and a constraint parameter by sampling the sample data to identify the optimized policy that satisfies the threshold measure of the mean of the alpha-tail distribution; and in response to determining the optimized policy, executing the optimized policy subject to the risk-tolerance value and the confidence level corresponding to the conditional value at risk such that digital content is provided to client devices in accordance with the optimized policy.
14. The method as recited in claim 13, wherein determining, by the one or more processors, the optimized policy that is subject to the risk-tolerance value within the confidence level using the trajectory-based policy gradient algorithm comprises, for each of the policy parameter, the risk parameter, and the constraint parameter repeatedly: generating one or more trajectories of a given parameter of the policy parameter, the risk parameter, or the constraint parameter by sampling the set of sample data; estimating, by the one or more processors, a gradient for the given parameter based on the generated one or more trajectories of the given parameter; and using the estimated gradient for the given parameter to update the given parameter.
15. The method as recited in claim 14, further comprising a
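The parameter updates recited in claims 1 to 4 and 9 (gradients estimated from sampled trajectories, three time scales, projection operators) can be illustrated with a small Python sketch. This is a hedged illustration under stated assumptions, not the patented implementation: it assumes a Lagrangian of the form E[D] + lam * (nu + E[(D - nu)^+] / (1 - alpha) - beta) with D a trajectory cost, and the two-ad toy environment, the step-size exponents, the clipping bounds, and all function names are hypothetical.

```python
import numpy as np

def step_sizes(k):
    # Hypothetical schedules; a larger step means a faster time scale:
    # risk parameter fastest, policy parameter next, constraint slowest.
    return {"risk": (k + 1) ** -0.55,
            "policy": (k + 1) ** -0.8,
            "constraint": (k + 1) ** -1.0}

def sample_trajectories(theta, rng, n=200):
    """Toy one-step trajectories: pick one of two ads, observe a cost."""
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    actions = rng.choice(2, size=n, p=probs)
    # Made-up costs: ad 0 is cheaper on average but has a wide spread.
    costs = np.where(actions == 0,
                     rng.normal(1.0, 2.0, n), rng.normal(1.5, 0.2, n))
    scores = np.eye(2)[actions] - probs  # gradient of log softmax prob.
    return costs, scores

def lagrangian_gradients(costs, scores, nu, lam, alpha, beta):
    """Sample-based gradients for policy (theta), risk (nu), constraint (lam)."""
    excess = np.maximum(costs - nu, 0.0)
    g_nu = lam * (1.0 - (costs >= nu).mean() / (1.0 - alpha))
    weights = costs + lam / (1.0 - alpha) * excess
    g_theta = (scores * weights[:, None]).mean(axis=0)
    g_lam = nu - beta + excess.mean() / (1.0 - alpha)
    return g_theta, g_nu, g_lam

def projected_update(theta, nu, lam, grads, k):
    g_theta, g_nu, g_lam = grads
    a = step_sizes(k)
    # np.clip plays the role of the projection operators (bounds assumed).
    nu = float(np.clip(nu - a["risk"] * g_nu, -10.0, 10.0))
    theta = np.clip(theta - a["policy"] * g_theta, -5.0, 5.0)
    lam = float(np.clip(lam + a["constraint"] * g_lam, 0.0, 100.0))
    return theta, nu, lam

def optimize(alpha=0.9, beta=2.0, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    theta, nu, lam = np.zeros(2), 0.0, 0.0
    for k in range(iters):
        costs, scores = sample_trajectories(theta, rng)
        grads = lagrangian_gradients(costs, scores, nu, lam, alpha, beta)
        theta, nu, lam = projected_update(theta, nu, lam, grads, k)
    return theta, nu, lam

if __name__ == "__main__":
    print(optimize())
```

The policy and risk parameters descend the Lagrangian while the constraint parameter ascends it, which is one common way to handle a CVaR constraint; the sketch makes no claim about convergence on real campaign data.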
based on user profile or attribute · CPC title
Optimization · CPC title
Traffic · CPC title