Adaptive rewarding for content personalization

US11367120B2 · US · B2

Patent metadata
Publication number: US-11367120-B2
Application number: US-202016834815-A
Country: US
Kind code: B2
Filing date: Mar 30, 2020
Priority date: Nov 8, 2019
Publication date: Jun 21, 2022
Grant date: Jun 21, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Business goals may be achieved using adaptive rewarding for the personalization of contents. In response to receiving user information, personalized contents for the user can be recommended using a reinforcement learning algorithm. In response to presenting the personalized content to the user, an action by the user selecting a particular content may be received. A reward value can be calculated for the action based on a reward function. The reward function can be based, at least in part, upon the action, the selected content, and/or the user. The reward function can be based upon one or more business goals, such as user engagement, monetization, and/or security. The calculated reward value can be provided to the reinforcement learning algorithm, which can be adapted based upon the reward value for future selection of personalized contents.
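The loop the abstract describes (recommend, observe an action, score it with a reward function, adapt the model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the class name `EpsilonGreedyRecommender`, the linear value estimate per content item, the epsilon-greedy exploration rule, and the reward table in `reward_for` are all hypothetical.

```python
import random


class EpsilonGreedyRecommender:
    """Toy recommender (hypothetical): one linear value estimate per content item."""

    def __init__(self, n_contents, dim, epsilon=0.1, lr=0.1, seed=0):
        self.weights = [[0.0] * dim for _ in range(n_contents)]
        self.epsilon = epsilon  # probability of exploring with a random ordering
        self.lr = lr            # learning rate for the update step
        self.rng = random.Random(seed)

    def score(self, content_id, user_vec):
        # Predicted reward for showing this content to this user.
        return sum(w * x for w, x in zip(self.weights[content_id], user_vec))

    def recommend(self, user_vec, k=2):
        ids = list(range(len(self.weights)))
        if self.rng.random() < self.epsilon:
            self.rng.shuffle(ids)  # explore: random order
        else:
            # exploit: highest predicted reward first
            ids.sort(key=lambda c: self.score(c, user_vec), reverse=True)
        return ids[:k]

    def update(self, content_id, user_vec, reward):
        # One gradient step on the squared error between predicted and observed reward.
        error = reward - self.score(content_id, user_vec)
        self.weights[content_id] = [
            w + self.lr * error * x
            for w, x in zip(self.weights[content_id], user_vec)
        ]


def reward_for(action):
    # Hypothetical reward table; the patent does not give these values.
    return {"click": 0.1, "play": 0.5, "purchase": 1.0}.get(action, 0.0)


model = EpsilonGreedyRecommender(n_contents=3, dim=2, epsilon=0.0)
user = [1.0, 0.0]
model.update(2, user, reward_for("purchase"))  # user bought content 2
print(model.recommend(user, k=1))
```

With `epsilon=0.0` the sketch always exploits, so after a single rewarded purchase the purchased item moves to the top of the ranking for that user vector. A nonzero `epsilon` would occasionally shuffle the ranking, which is one simple way to introduce the randomness that some of the claims mention.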

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-readable storage medium storing instructions which, when executed by a hardware processor, cause the hardware processor to: receive content information regarding available contents; receive user information regarding a user; input content vectors based on the content information and a user vector based on the user information to a reinforcement learning model; recommend a set of personalized contents for the user from the available contents, the set of personalized contents being output by the reinforcement learning model; receive a user action in response to a presentation of the set of personalized contents to the user; train the reinforcement learning model by: calculating a reward value for the user action based on a reward function that includes a monetization term and an engagement term, the monetization term including a monetization tuning parameter that is manually set as a weight for targeting a monetization business goal, the engagement term including an engagement tuning parameter that is manually set as a weight for targeting an engagement business goal; and adapting the reinforcement learning model using the reward value to increase a probability of future occurrences of the user action that help achieve the monetization business goal and the engagement business goal.

2. The computer-readable storage medium of claim 1, wherein the instructions further cause the hardware processor to: generate the content vectors associated with the available contents based on the content information; and generate the user vector associated with the user based on the user information, wherein the set of personalized contents is selected by the reinforcement learning model based on the content vectors and the user vector.

3. The computer-readable storage medium of claim 1, wherein the reinforcement learning model selects the set of personalized contents that maximize the reward value.

4. The computer-readable storage medium of claim 1, wherein the reinforcement learning model selects the set of personalized contents based at least on randomness.

5. The computer-readable storage medium of claim 1, wherein the reinforcement learning model uses a contextual bandit algorithm to select the set of personalized contents.

6. A system, comprising: a hardware processor; and storage having instructions which, when executed by the hardware processor, cause the hardware processor to: receive game information regarding available games; generate game vectors associated with the available games based on the game information; receive user information regarding a user; generate a user vector associated with the user based on the user information; input the game vectors and the user vector to a machine learning model; recommend a personalized set of games for the user from the available games, the personalized set of games being output by the machine learning model; receive a user action associated with a selected game from the personalized set of games; calculate a reward value for the user action using a reward function that includes terms associated with business goals, the terms having tuning parameters that are manually set as weights for targeting the associated business goals; and train the machine learning model using the reward value as feedback to improve future recommendations of the available games that promote the business goals.

7. The system of claim 6, wherein the machine learning model uses a reinforcement learning algorithm to select the personalized set of games.

8. The system of claim 6, wherein the personalized set of games includes ranking.

9. The system of claim 8, wherein the personalized set of games is displayed using heterogenous sizes that depend on the ranking of the personalized set of games.

10. The system of claim 8, wherein the personalized set of games is displaying using heterogenous levels of interaction that depends on the ranking of the personalized set of games.

11. A method, comprising: receiving user information about a user; inputting a user vector based on the user information to a reinforcement learning model; recommending personalized contents for the user from available contents, the personalized contents being output by the reinforcement learning model; receiving an action relating to a selected content from the personalized contents; calculating a reward value for the action by using a reward function that includes terms associated with goals and tuning parameters associated with the terms, the tuning parameters being manually set as weights for targeting the goals; and training the reinforcement learning model using the reward value as feedback to select future personalized contents that further the goals.

12. The method of claim 11, further comprising: monitoring content features associated with the available contents including the selected content; and automatically adjusting the reward function based on a particular monitored content feature associated with the selected content.

13. The method of claim 12, wherein the reward function is based on an average of a particular monitored feature associated with the available contents.

14. The method of claim 11, further comprising: monitoring user features associated with the user; and automatically adjusting the reward function based on a particular monitored user feature associated with the user.

15. The method of claim 11, wherein the goals include one or more of monetization, engagement, inclusiveness, safety, or toxicity.

16. The method of claim 11, wherein the tuning parameters are automatically adjusted based on time using a machine learning model.

17. The method of claim 11, wherein the reward function is based on a probability of the user who performed the action will perform a subsequent action.

18. The method of claim 17, wherein the subsequent action includes one or more of purchasing the selected content, purchasing another content, or playing the selected content.

19. The method of claim 11, wherein training the reinforcement learning model includes adapting the reinforcement learning model to increase an occurrence of the action.

20. The method of claim 11, wherein the reward function includes one or more of: an estimated value of the action for a particular content, a probability of the action converting to a particular goal, a utility of the particular goal for the particular content, or an average utility of the particular goal for the available contents.
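The claims describe a reward function built from goal-specific terms, each with a manually set tuning parameter, and claim 20 lists possible ingredients such as a conversion probability, a utility for the selected content, and an average utility across available contents. The claims shown on this page do not give a formula, so the sketch below is only one possible way to combine those ingredients. The class `GoalTerm`, the product form in `value`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalTerm:
    """One term of the reward function (hypothetical structure)."""
    weight: float       # manually set tuning parameter for this goal
    p_convert: float    # estimated probability that the action converts to the goal
    utility: float      # utility of the goal for the selected content
    avg_utility: float  # average utility of the goal across available contents

    def value(self) -> float:
        # Assumed form: weighted expected utility, relative to the catalog average.
        if self.avg_utility == 0:
            return 0.0
        return self.weight * self.p_convert * (self.utility / self.avg_utility)


def reward(terms) -> float:
    # The reward for an action is the sum of its goal terms.
    return sum(t.value() for t in terms)


# Illustrative numbers only.
monetization = GoalTerm(weight=2.0, p_convert=0.1, utility=20.0, avg_utility=10.0)
engagement = GoalTerm(weight=1.0, p_convert=0.5, utility=30.0, avg_utility=60.0)
total = reward([monetization, engagement])
print(round(total, 2))
```

Raising a term's `weight` shifts the reward, and therefore what the model learns to recommend, toward that goal. Dividing by the catalog average is a design choice in this sketch that keeps goals measured in different units (for example revenue versus minutes played) on a comparable scale.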

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Recommending goods or services · CPC title

  • Filtering based on additional data, e.g. user or group profiles · CPC title

  • using advertising information · CPC title

  • using advertisements · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11367120B2 cover?
Business goals may be achieved using adaptive rewarding for the personalization of contents. In response to receiving user information, personalized contents for the user can be recommended using a reinforcement learning algorithm. In response to presenting the personalized content to the user, an action by the user selecting a particular content may be received. A reward value can be calculate…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06Q30/0631. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 21, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).