Utilizing additive decomposition for universal off-policy evaluation of digital content slate recommendations

US12462290B2 · US · B2

Patent metadata
Publication number: US-12462290-B2
Application number: US-202318530907-A
Country: US
Kind code: B2
Filing date: Dec 6, 2023
Priority date: Dec 6, 2023
Publication date: Nov 4, 2025
Grant date: Nov 4, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems, non-transitory computer-readable media, and methods for performing off-policy evaluations of slate recommendation policies through additive decomposition. In particular, in one or more embodiments, the disclosed systems receive historical data corresponding to digital slate recommendations performed by a first slate recommendation policy, with each slate recommendation comprising a plurality of digital slot recommendations. Additionally, in some embodiments, the disclosed systems generate a second slate action using a second slate recommendation policy conditioned on user context. Further, in some embodiments, the disclosed systems generate a plurality of importance weights by summing a plurality of slot-level density ratios generated by comparing the slate actions of the second slate recommendation policy to the slate actions of the first slate recommendation policy. In some embodiments, the disclosed systems apply the plurality of importance weights to generate a predicted reward distribution for evaluation of the second slate recommendation policy.
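The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes each slot-level density ratio is the second (evaluated) policy's probability divided by the first (logging) policy's probability for the same slot-level action, which is the usual importance-sampling convention; the abstract does not state the direction. The function names, the `offset` parameter, and the toy probabilities are hypothetical and are not taken from the patent.

```python
from typing import Sequence


def slot_density_ratio(p_first: float, p_second: float) -> float:
    """Density ratio for one slot.

    Assumption: ratio = P_second(action | context) / P_first(action | context).
    """
    if p_first <= 0.0:
        raise ValueError("first-policy probability must be positive")
    return p_second / p_first


def importance_weight(
    p_first_slots: Sequence[float],
    p_second_slots: Sequence[float],
    offset: float = 0.0,
) -> float:
    """Importance weight for one logged slate: the SUM of its slot-level ratios.

    `offset` is a hypothetical knob (default 0.0) for variants that add a
    constant to the sum; the source text only says the ratios are summed.
    """
    if len(p_first_slots) != len(p_second_slots):
        raise ValueError("both policies must cover the same slots")
    ratios = [
        slot_density_ratio(p_first, p_second)
        for p_first, p_second in zip(p_first_slots, p_second_slots)
    ]
    return sum(ratios) + offset


# Toy slate with three slots (made-up probabilities).
first_policy_probs = [0.5, 0.25, 0.5]
second_policy_probs = [0.25, 0.5, 0.5]
weight = importance_weight(first_policy_probs, second_policy_probs)
print(weight)  # 0.5 + 2.0 + 1.0 = 3.5
```

Summing per-slot ratios, rather than multiplying them, keeps the weight from shrinking or exploding as the number of slots grows, which is the practical appeal of an additive decomposition.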

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving historical slate data comprising observed rewards from selecting slate actions for a plurality of digital slots of a digital slate utilizing a first slate recommendation policy; generating, for a second slate recommendation policy, a plurality of importance weights from the historical slate data by summing slot-level density ratios between the first slate recommendation policy and the second slate recommendation policy for the slate actions; and generating a predicted reward distribution for the second slate recommendation policy by applying the plurality of importance weights to the historical slate data for the first slate recommendation policy.

2. The computer-implemented method of claim 1, wherein generating the plurality of importance weights comprises: determining, for a slate action, a first slot-level density ratio between the first slate recommendation policy and the second slate recommendation policy for a first slot of the digital slate; determining, for the slate action, a second slot-level density ratio between the first slate recommendation policy and the second slate recommendation policy for a second slot of the digital slate; and summing the first slot-level density ratio and the second slot-level density ratio to determine an importance weight for the slate action.

3. The computer-implemented method of claim 2, wherein generating the plurality of importance weights comprises summing, for an additional slate action, slot-level density ratios for the plurality of digital slots to generate an additional importance weight.

4. The computer-implemented method of claim 1, wherein generating the slot-level density ratios comprises: determining a first slot-level probability of selecting a first slot-level action utilizing the first slate recommendation policy; and determining a second slot-level probability of selecting the first slot-level action utilizing the second slate recommendation policy.

5. The computer-implemented method of claim 4, wherein generating the slot-level density ratios comprises generating a first slot-level density ratio from the first slot-level probability and the second slot-level probability.

6. The computer-implemented method of claim 4, wherein receiving the historical slate data further comprises receiving a client device context analyzed by the first slate recommendation policy in selecting the slate actions, and further comprising: determining, from the historical slate data, a client device context embedding from a plurality of client device context embeddings utilized to select the first slot-level action; and determining the second slot-level probability of selecting the first slot-level action utilizing the second slate recommendation policy in light of the client device context embedding.

7. The computer-implemented method of claim 6, further comprising determining the first slot-level probability of selecting the first slot-level action utilizing the first slate recommendation policy in light of the client device context.

8. The computer-implemented method of claim 7, wherein generating the predicted reward distribution for the second slate recommendation policy comprises generating a cumulative distribution function by applying the plurality of importance weights to the observed rewards from the historical slate data.

9. A system comprising: one or more memory devices comprising historical slate data comprising observed rewards from selecting slate actions for a plurality of digital slots of a digital slate utilizing a first digital policy; and one or more processors configured to cause the system to: generate, for a second digital policy, a plurality of importance weights from the historical slate data by: summing, for a first slate action, a first plurality of slot-level density ratios for the plurality of digital slots to generate a first importance weight; and summing, for a second slate action, a second plurality of slot-level density ratios for the plurality of digital slots to generate a second importance weight; and generate a predicted reward distribution for the second digital policy by applying the plurality of importance weights to the historical slate data.

10. The system of claim 9, wherein the one or more processors are configured to cause the system to generate the first plurality of slot-level density ratios by: determining, for the first slate action, a first slot-level probability of selecting a first slot-level action for a first slot utilizing the first digital policy; determining, for the first slate action, a second slot-level probability of selecting the first slot-level action for the first slot utilizing the second digital policy; and generating a first slot-level density ratio from the first slot-level probability and the second slot-level probability.

11. The system of claim 10, wherein the one or more processors are configured to cause the system to generate the first plurality of slot-level density ratios by: determining, for the first slate action, a third slot-level probability of selecting a second slot-level action for a second slot utilizing the first digital policy; and determining, for the first slate action, a fourth slot-level probability of selecting the second slot-level action for the second slot utilizing the second digital policy.

12. The system of claim 11, wherein the one or more processors are configured to cause the system to generate the first plurality of slot-level density ratios by generating a second slot-level density ratio from the third slot-level probability and the fourth slot-level probability.

13. The system of claim 12, wherein the one or more processors are configured to cause the system to generate the first importance weight for the first slate action by summing the first slot-level density ratio and the second slot-level density ratio.

14. The system of claim 10, wherein the historical slate data comprises client device context data analyzed by the first digital policy in selecting the slate actions and wherein the one or more processors are further configured to cause the system to determine the first slot-level probability of selecting the first slot-level action utilizing the first digital policy in light of a first client device context from the client device context data.

15. The system of claim 9, wherein the one or more processors are configured to cause the system to generate the predicted reward distribution for the second digital policy by generating a cumulative distribution function from the plurality of importance weights and the observed rewards from the historical slate data.

16. A non-transitory computer readable medium storing executable instructions which, when executed by a processing device, cause the processing device to perform operations comprising: receiving historical slate data comprising observed rewards from selecting slate actions for a plurality of slots of a digital slate utilizing a first digital policy; generating, for a second digital policy, a plurality of importance weights from the historical slate data corresponding to the first digital policy by: determining, for a slate action, a first slot-level density ratio between the first digital policy and the second digital policy for a first slot of the digital slate; determining, for the slate action, a second slot-level density ratio between the first digital policy and the second digital policy for a second slot of the digital slate; and sum
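Claims 8 and 15 describe generating a cumulative distribution function from the importance weights and the observed rewards. One way this could look is a weighted empirical CDF, sketched below. The self-normalization by the sum of the weights is an assumption made here so the curve ends at 1; the claims do not specify a normalization. The function name `weighted_reward_cdf` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_reward_cdf(
    rewards: Sequence[float],
    weights: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (reward, estimated P(R <= reward)) pairs, sorted by reward.

    Assumption: weights are self-normalized by their sum.
    """
    if not rewards or len(rewards) != len(weights):
        raise ValueError("rewards and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")

    cdf: List[Tuple[float, float]] = []
    running = 0.0
    for reward, weight in sorted(zip(rewards, weights)):
        running += weight
        if cdf and cdf[-1][0] == reward:
            # Same reward value seen again: merge into a single step.
            cdf[-1] = (reward, running / total)
        else:
            cdf.append((reward, running / total))
    return cdf


# Toy logged data: one observed reward and one importance weight per slate.
observed_rewards = [0.0, 1.0, 1.0, 2.0]
importance_weights = [0.5, 1.0, 1.5, 1.0]
cdf = weighted_reward_cdf(observed_rewards, importance_weights)
print(cdf)  # [(0.0, 0.125), (1.0, 0.75), (2.0, 1.0)]
```

Estimating the whole reward distribution, rather than only its mean, lets quantities such as the median, quantiles, or tail risk of the second policy be read off the same curve.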

Assignees

Adobe Inc

Inventors

Classifications

  • Recommending goods or services · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12462290B2 cover?
The present disclosure relates to systems, non-transitory computer-readable media, and methods for performing off-policy evaluations of slate recommendation policies through additive decomposition. In particular, in one or more embodiments, the disclosed systems receive historical data corresponding to digital slate recommendations performed by a first slate recommendation policy, with each sla…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0631. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 4, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).