Dynamical switching between long-term and short-term rewards

US11314529B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11314529-B2
Application number: US-202016748452-A
Country: US
Kind code: B2
Filing date: Jan 21, 2020
Priority date: Jan 21, 2020
Publication date: Apr 26, 2022
Grant date: Apr 26, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for content selection and presentation is disclosed. A plurality of content elements configured for presentation in at least one content container is received and one of the plurality of content elements is selected for presentation in the at least one content container. The one of the plurality of content elements is selected by a trained selection model based on an optimal impression allocation. An interface is generated that includes the selected one of the plurality of content elements.

First claim

Opening claim text (preview).

What is claimed is:

1. A system for content selection and presentation, comprising: a memory having instructions stored thereon, and a processor configured to read the instructions to: receive a plurality of content elements configured for presentation in at least one content container; select one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model based on an optimal impression allocation, wherein the optimal impression allocation is selected using testing data used to compare calculated reward values, wherein the optimal impression allocation is configured to balance a short-term reward value and a long-term reward value of each of the plurality of content elements, wherein the short-term reward value indicates immediate rewards, and wherein the long-term reward value indicates a user return rate and is calculated as a sum of discounted short term rewards; and generate an interface including the one of the plurality of content elements selected for presentation.

2. The system of claim 1, wherein the long-term reward value is determined by a Markov Decision Process (S,C,P,R,γ), where S represents a state space, C represents a content space, P represents a transition function, and R represents the immediate reward function.

3. The system of claim 1, wherein the short-term reward value is determined based on Thompson sampling of a posterior distribution reward function.

4. The system of claim 1, wherein the optimal impression allocation includes an estimated impression allocation generated according to an equation:

$$\sum_{C_i} \hat{I}_{S_i, C_i} \times \hat{R}_{(i,\mathrm{Test})}(S_i, C_i)$$

where $C_i$ is the content element, $I$ is an impression value, $S_i$ is a state, and $R$ is a reward function.

5. The system of claim 4, where the impression value $I$ is calculated as:

$$\hat{I}_{S_i, C_i} = \frac{w\left(\hat{R}_{(i,\mathrm{Train})}(S_i, C_i)\right)}{\sum_{C_j} w\left(\hat{R}_{(i,\mathrm{Train})}(S_i, C_j)\right)}$$

6. The system of claim 1, wherein the trained selection model includes a plurality of impression allocations, and wherein the optimal impression allocation is selected from the plurality of impression allocations based on one or more predetermined selection criteria.

7. The system of claim 1, wherein the long-term reward value is determined by a Markov Decision Process (S,C,P,R,γ), where S represents a state space, C represents a content space, P represents a transition function, and R represents the immediate reward function and the short-term reward value is determined based on Thompson sampling of a posterior distribution reward function.

8. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor cause a device to perform operations comprising: receiving a plurality of content elements configured for presentation in at least one content container; selecting one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model based on an optimal impression allocation, wherein the optimal impression allocation is configured to balance a short-term reward value and a long-term reward value of each of the plurality of content elements, wher
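To make the arithmetic in claims 1 and 4 to 6 concrete, here is a minimal Python sketch for a single state. It is an illustration under stated assumptions, not the patented implementation: the function names, the two example weighting functions `w`, the sample reward numbers, and the "highest estimated value on test data" selection rule are all hypothetical, since the claim preview does not specify the weighting function or the selection criteria.

```python
from typing import Callable, Dict, Sequence, Tuple


def discounted_sum(short_term_rewards: Sequence[float], gamma: float) -> float:
    """Long-term reward as a sum of discounted short-term rewards (claim 1)."""
    return sum((gamma ** t) * r for t, r in enumerate(short_term_rewards))


def impression_allocation(
    train_rewards: Dict[str, float], w: Callable[[float], float]
) -> Dict[str, float]:
    """Share of impressions per content element in one state (claim 5).

    Each element's training-set reward estimate is passed through the
    weighting function w, then normalised so the shares sum to 1.
    """
    weights = {content: w(reward) for content, reward in train_rewards.items()}
    total = sum(weights.values())
    return {content: weight / total for content, weight in weights.items()}


def estimated_value(
    allocation: Dict[str, float], test_rewards: Dict[str, float]
) -> float:
    """Allocation-weighted sum of test-set reward estimates (claim 4)."""
    return sum(allocation[content] * test_rewards[content] for content in allocation)


def pick_allocation(
    train_rewards: Dict[str, float],
    test_rewards: Dict[str, float],
    candidates: Dict[str, Callable[[float], float]],
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate allocation on test data and return the best name.

    Assumption: "best" means the highest estimated value; the claims only say
    the choice follows predetermined selection criteria.
    """
    scores = {
        name: estimated_value(impression_allocation(train_rewards, w), test_rewards)
        for name, w in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical reward estimates for three content elements in one state.
    train = {"banner_a": 0.2, "banner_b": 0.6, "banner_c": 0.2}
    test = {"banner_a": 0.1, "banner_b": 0.5, "banner_c": 0.3}
    # Hypothetical weighting functions: squaring concentrates impressions
    # on the element with the highest training reward.
    candidates = {"proportional": lambda r: r, "squared": lambda r: r ** 2}

    print(pick_allocation(train, test, candidates))
    print(discounted_sum([1.0, 1.0, 1.0], gamma=0.5))
```

Estimating the allocation from training rewards and scoring it against separate test rewards mirrors the Train and Test subscripts in the two equations. A fuller version would repeat this per state and could draw the short-term reward estimates by Thompson sampling, as claim 3 describes.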

Assignees

Walmart Apollo, LLC

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • G06F9/451 · Primary

    Execution arrangements for user interfaces · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11314529B2 cover?
A system and method for content selection and presentation is disclosed. A plurality of content elements configured for presentation in at least one content container is received and one of the plurality of content elements is selected for presentation in the at least one content container. The one of the plurality of content elements is selected by a trained selection model based on an optimal…
Who is the assignee on this patent?
Walmart Apollo, LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 26, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).