What technology area does this patent fall under?

Primary CPC classification G06Q30/0242. Mapped technology areas include Physics.

When was this patent published?

Publication date May 26, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automated System for Safe Policy Improvement

US2016148246A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2016148246-A1
Application number	US-201414551898-A
Country	US
Kind code	A1
Filing date	Nov 24, 2014
Priority date	Nov 24, 2014
Publication date	May 26, 2016
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Risk quantification, policy search, and automated safe policy deployment techniques are described. In one or more implementations, techniques are utilized to determine safety of a policy, such as to express a level of confidence that a new policy will exhibit an increased measure of performance (e.g., interactions or conversions) over a currently deployed policy. In order to make this determination, reinforcement learning and concentration inequalities are utilized, which generate and bound confidence values regarding the measurement of performance of the policy and thus provide a statistical guarantee of this performance. These techniques are usable to quantify risk in deployment of a policy, select a policy for deployment based on estimated performance and a confidence level in this estimate (e.g., which may include use of a policy space to reduce an amount of data processed), used to create a new policy through iteration in which parameters of a policy are iteratively adjusted and an effect of those adjustments are evaluated, and so forth.
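The safety test described in the abstract can be illustrated with a minimal sketch. The abstract does not name a specific concentration inequality, so this example substitutes Hoeffding's inequality as a stand-in; the function names, the `samples` input (hypothetical per-interaction performance estimates for the new policy, assumed independent and bounded in `[0, upper]`), and the default confidence level are all assumptions made for illustration, not the patent's method.

```python
import math
from typing import Sequence


def hoeffding_lower_bound(samples: Sequence[float], delta: float, upper: float) -> float:
    """Lower bound on the true mean that holds with probability >= 1 - delta.

    Assumes the samples are independent and each lies in [0, upper].
    Hoeffding's inequality is used here only as an illustrative stand-in.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    mean = sum(samples) / n
    # Width of the one-sided confidence interval shrinks as 1 / sqrt(n).
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def is_safe_to_deploy(samples: Sequence[float], baseline: float,
                      delta: float = 0.05, upper: float = 1.0) -> bool:
    """Deploy only if the high-confidence lower bound reaches the baseline."""
    return hoeffding_lower_bound(samples, delta, upper) >= baseline


if __name__ == "__main__":
    # Hypothetical data: half of 2000 interactions converted.
    estimates = [1.0] * 1000 + [0.0] * 1000
    print(is_safe_to_deploy(estimates, baseline=0.45))
    print(is_safe_to_deploy(estimates, baseline=0.49))
```

The point of the sketch is that the decision depends on the lower bound rather than on the sample mean: the same mean passes the test with many samples and fails it with few, because the bound widens as the sample count drops.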

First claim

Opening claim text (preview).

What is claimed is:
1. In a digital medium environment for identifying and deploying potential digital advertising campaigns, where campaigns can be altered, removed, or replaced on demand, a method for optimizing campaign selection in the digital medium environment, the method comprising: controlling replacement of one or more deployed polices of a content provider that are used to select advertisements with at least one of a plurality of policies, the controlling including: iteratively collecting deployment data that describes deployment of the one or more deployed policies; iteratively adjusting one or more parameters to generate new policies that are usable to select the advertisements; applying reinforcement learning and a concentration inequality on deployment data that describes the deployment of the one or more deployed policies using the new policies having the adjusted one or more parameters to estimate values of a measure of performance of the new policies and calculate one or more statistical guarantees of the estimated values; and causing deployment of one or more of the new policies responsive to determining that the one or more statistical guarantees express at least a confidence level that the estimated values of the measure of performance at least correspond to a threshold based at least in part on a measure of performance of the one or more deployed policies.
2. A method as described in claim 1, wherein: each of the plurality of policies is expressed using a high-dimensional vector; and the determining includes computing a direction in a policy space that is expected to point towards a safe region.
3. A method as described in claim 2, wherein the determining includes searching the policy space as constrained to line searches of the high-dimensional vectors of the plurality of policies that correspond to the direction.
4. A method as described in claim 2, wherein the direction is a generalized natural policy gradient.
5. A method as described in claim 1, wherein the threshold is based at least in part on the measured performance of the deployed policy and a set margin.
6. A method as described in claim 1, wherein the concentration inequality is configured to move estimated values above a defined threshold to lie on the defined threshold.
7. A method as described in claim 1, wherein the concentration inequality is configured to be independent of a range of random variables of the estimated values.
8. A method as described in claim 1, wherein the concentration inequality is configured to collapse tails of random variable distributions of the estimated values, normalize the random variable distributions, and then generate a lower-bound from which a lower-bound on a uniform mean of original random variables of the estimated values is extracted.
9. A method as described in claim 1, wherein each said policy is configured for use by the content provider to select advertisements for inclusion with content based at least in part based on characteristics associated with a request to access the content.
10. A method as described in claim 9, wherein the characteristics associated with the request include characteristics of a user or device that initiated the request or characteristics of the request itself.
11. A method as described in claim 9, wherein the characteristics are expressed using a feature vector.
12. A method as described in claim 1, wherein received deployment data does not describe deployment of the new policies.
13. A system comprising: one or more computing devices configured to perform operations including selecting at least one of a plurality to policies to replace one or more deployed policies of a content provider that are used to select advertisements to be included with content, the selecting including: iteratively adjusting a plurality of high-dimensional vectors that express respective ones of the plurality of policies; computing a direction in a policy space of the plurality of policies that is expected to point towards a region that is expected to be safe as including the policies that have a measure of performance that is greater than a threshold measure of performance and within a defined level of confidence; and selecting at least one of the plurality of policies for deployment responsive to a determination that the at least one policy has high-dimensional vectors that correspond to the direction and that exhibits the measure of performance that is greater than a threshold measure of performance and within a defined level of confidence.
14. A system as described in claim 13, wherein the selecting includes searching the plurality of policies as constrained to line searches of the high-dimensional vectors of the plurality of policies that correspond to the direction.
15. A system as described in claim 13, wherein the direction is a generalized natural policy gradient.
16. A system as described in claim 13, wherein the measure of performance is computed through use of reinforcement learning and concentration inequalities on deployment data generated by the one or more deployed policies.
17. A system as described in claim 13, wherein the selecting includes searching the plurality of policies as constrained to line searches of the high-dimensional vectors of the plurality of policies that correspond to the direction.
18. A system as described in claim 13, wherein the measure of performance is computed through use of reinforcement learning and concentration inequalities on deployment data generated by the one or more deployed policies.
19. A content provider comprising one or more computing devices configured to perform operations including: deploying a policy to select advertisements to be included with content based on one or more characteristics associated with a request for the content; and replacing the deployed policy with another policy that is generated through: iteratively adjusting a high-dimensional vector that expresses the other policy; computing a direction in a policy space of a plurality of said other policies that is expected to point towards a region that is expected to be safe as including the said other policies that have a measure of performance that is greater than a threshold measure of performance and within a defined level of confidence; and selecting the other policy for deployment responsive to a determination that the adjusted high-dimensional vector of the other policy corresponds to the direction and that exhibits the measure of performance that is greater than a threshold measure of performance and within a defined level of confidence.
20. A content provider as described in claim 19, wherein the measure of performance is computed through use of reinforcement learning and concentration inequalities on deployment data generated by the one or more deployed policies.
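To make the search described in claims 2, 3, 5, and 6 more concrete, the following is a rough sketch under strong simplifying assumptions, not the claimed implementation. It assumes a context-free softmax policy over ads (the claims describe richer, feature-based policies), logged tuples of `(ad shown, reward, probability under the deployed policy)`, importance-weighted estimates capped at a hypothetical `cap` value, and a Hoeffding bound with a union-bound correction across candidates. Every name, default, and the choice of inequality is an assumption made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (index of ad shown, reward in [0, 1], probability the deployed policy gave that ad)
Logged = Tuple[int, float, float]


def softmax(theta: Sequence[float]) -> List[float]:
    """Convert a parameter vector into ad-selection probabilities."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def capped_estimates(theta: Sequence[float], data: Sequence[Logged], cap: float) -> List[float]:
    """Importance-weighted rewards; values above `cap` are moved down onto `cap`."""
    probs = softmax(theta)
    return [min(probs[a] / p_b * r, cap) for a, r, p_b in data]


def lower_bound(values: Sequence[float], cap: float, delta: float) -> float:
    """Hoeffding lower bound on the mean of values in [0, cap] (illustrative stand-in)."""
    n = len(values)
    return sum(values) / n - cap * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def safe_line_search(theta: Sequence[float], direction: Sequence[float],
                     data: Sequence[Logged], baseline: float, margin: float,
                     steps: Sequence[float], cap: float = 5.0,
                     delta: float = 0.05) -> Optional[List[float]]:
    """Return the candidate on the line theta + alpha * direction with the best
    lower bound that reaches baseline + margin, or None if no candidate does."""
    threshold = baseline + margin
    # Union bound: split delta across candidates tested on the same data.
    delta_each = delta / len(steps)
    best, best_lb = None, threshold
    for alpha in steps:
        candidate = [t + alpha * d for t, d in zip(theta, direction)]
        lb = lower_bound(capped_estimates(candidate, data, cap), cap, delta_each)
        if lb >= best_lb:
            best, best_lb = candidate, lb
    return best


if __name__ == "__main__":
    # Hypothetical log: two ads shown uniformly; only ad 1 ever converts.
    log = [(0, 0.0, 0.5)] * 5000 + [(1, 1.0, 0.5)] * 5000
    print(safe_line_search([0.0, 0.0], [-1.0, 1.0], log,
                           baseline=0.5, margin=0.05, steps=[0.5, 1.0, 2.0]))
```

Capping only lowers non-negative estimates, so a lower bound on the capped mean remains a valid (if conservative) lower bound on the uncapped mean. Returning `None` when no candidate clears the threshold corresponds to leaving the deployed policy in place.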

Assignees

Adobe Systems Inc

Inventors

Classifications

G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06Q30/0247
Calculate past, present or future revenues · CPC title
G06Q30/0242 (Primary)
Determining effectiveness of advertisements · CPC title
G06N3/006
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
G06Q30/02 (Primary)
Marketing; Price estimation or determination; Fundraising · CPC title

Patent family

Related publications grouped by family.

View patent family 54064698

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016148246A1 cover?: Risk quantification, policy search, and automated safe policy deployment techniques are described. In one or more implementations, techniques are utilized to determine safety of a policy, such as to express a level of confidence that a new policy will exhibit an increased measure of performance (e.g., interactions or conversions) over a currently deployed policy. In order to make this determina…
Who is the assignee on this patent?: Adobe Systems Inc