Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06Q30/0631. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 21, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Adaptive rewarding for content personalization

US11367120B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11367120-B2
Application number	US-202016834815-A
Country	US
Kind code	B2
Filing date	Mar 30, 2020
Priority date	Nov 8, 2019
Publication date	Jun 21, 2022
Grant date	Jun 21, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Business goals may be achieved using adaptive rewarding for the personalization of contents. In response to receiving user information, personalized contents for the user can be recommended using a reinforcement learning algorithm. In response to presenting the personalized content to the user, an action by the user selecting a particular content may be received. A reward value can be calculated for the action based on a reward function. The reward function can be based, at least in part, upon the action, the selected content, and/or the user. The reward function can be based upon one or more business goals, such as user engagement, monetization, and/or security. The calculated reward value can be provided to the reinforcement learning algorithm, which can be adapted based upon the reward value for future selection of personalized contents.
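The reward calculation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the reward function, so the linear weighted sum below is an assumption, and the names `RewardWeights`, `reward`, `monetization_signal`, and `engagement_signal` are hypothetical. The weights stand in for tuning parameters that an operator sets by hand to favor one business goal over another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical, manually chosen weights, one per business goal.
    monetization: float = 1.0
    engagement: float = 1.0


def reward(monetization_signal: float, engagement_signal: float,
           weights: RewardWeights) -> float:
    """Assumed linear form: a weighted monetization term plus a weighted engagement term."""
    return (weights.monetization * monetization_signal
            + weights.engagement * engagement_signal)


# Illustrative numbers only: a 4.99 purchase and 0.5 hours of play,
# with engagement weighted twice as heavily as monetization.
w = RewardWeights(monetization=1.0, engagement=2.0)
r = reward(monetization_signal=4.99, engagement_signal=0.5, weights=w)
```

Raising one weight relative to the other shifts which user actions earn the larger reward, which in turn steers what the learning algorithm recommends later.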

First claim

Opening claim text (preview).

The invention claimed is:
1. A computer-readable storage medium storing instructions which, when executed by a hardware processor, cause the hardware processor to: receive content information regarding available contents; receive user information regarding a user; input content vectors based on the content information and a user vector based on the user information to a reinforcement learning model; recommend a set of personalized contents for the user from the available contents, the set of personalized contents being output by the reinforcement learning model; receive a user action in response to a presentation of the set of personalized contents to the user; train the reinforcement learning model by: calculating a reward value for the user action based on a reward function that includes a monetization term and an engagement term, the monetization term including a monetization tuning parameter that is manually set as a weight for targeting a monetization business goal, the engagement term including an engagement tuning parameter that is manually set as a weight for targeting an engagement business goal; and adapting the reinforcement learning model using the reward value to increase a probability of future occurrences of the user action that help achieve the monetization business goal and the engagement business goal.
2. The computer-readable storage medium of claim 1, wherein the instructions further cause the hardware processor to: generate the content vectors associated with the available contents based on the content information; and generate the user vector associated with the user based on the user information, wherein the set of personalized contents is selected by the reinforcement learning model based on the content vectors and the user vector.
3. The computer-readable storage medium of claim 1, wherein the reinforcement learning model selects the set of personalized contents that maximize the reward value.
4. The computer-readable storage medium of claim 1, wherein the reinforcement learning model selects the set of personalized contents based at least on randomness.
5. The computer-readable storage medium of claim 1, wherein the reinforcement learning model uses a contextual bandit algorithm to select the set of personalized contents.
6. A system, comprising: a hardware processor; and storage having instructions which, when executed by the hardware processor, cause the hardware processor to: receive game information regarding available games; generate game vectors associated with the available games based on the game information; receive user information regarding a user; generate a user vector associated with the user based on the user information; input the game vectors and the user vector to a machine learning model; recommend a personalized set of games for the user from the available games, the personalized set of games being output by the machine learning model; receive a user action associated with a selected game from the personalized set of games; calculate a reward value for the user action using a reward function that includes terms associated with business goals, the terms having tuning parameters that are manually set as weights for targeting the associated business goals; and train the machine learning model using the reward value as feedback to improve future recommendations of the available games that promote the business goals.
7. The system of claim 6, wherein the machine learning model uses a reinforcement learning algorithm to select the personalized set of games.
8. The system of claim 6, wherein the personalized set of games includes ranking.
9. The system of claim 8, wherein the personalized set of games is displayed using heterogenous sizes that depend on the ranking of the personalized set of games.
10. The system of claim 8, wherein the personalized set of games is displaying using heterogenous levels of interaction that depends on the ranking of the personalized set of games.
11. A method, comprising: receiving user information about a user; inputting a user vector based on the user information to a reinforcement learning model; recommending personalized contents for the user from available contents, the personalized contents being output by the reinforcement learning model; receiving an action relating to a selected content from the personalized contents; calculating a reward value for the action by using a reward function that includes terms associated with goals and tuning parameters associated with the terms, the tuning parameters being manually set as weights for targeting the goals; and training the reinforcement learning model using the reward value as feedback to select future personalized contents that further the goals.
12. The method of claim 11, further comprising: monitoring content features associated with the available contents including the selected content; and automatically adjusting the reward function based on a particular monitored content feature associated with the selected content.
13. The method of claim 12, wherein the reward function is based on an average of a particular monitored feature associated with the available contents.
14. The method of claim 11, further comprising: monitoring user features associated with the user; and automatically adjusting the reward function based on a particular monitored user feature associated with the user.
15. The method of claim 11, wherein the goals include one or more of monetization, engagement, inclusiveness, safety, or toxicity.
16. The method of claim 11, wherein the tuning parameters are automatically adjusted based on time using a machine learning model.
17. The method of claim 11, wherein the reward function is based on a probability of the user who performed the action will perform a subsequent action.
18. The method of claim 17, wherein the subsequent action includes one or more of purchasing the selected content, purchasing another content, or playing the selected content.
19. The method of claim 11, wherein training the reinforcement learning model includes adapting the reinforcement learning model to increase an occurrence of the action.
20. The method of claim 11, wherein the reward function includes one or more of: an estimated value of the action for a particular content, a probability of the action converting to a particular goal, a utility of the particular goal for the particular content, or an average utility of the particular goal for the available contents.
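The recommend, observe, and adapt loop recited in the claims can be sketched as a small epsilon-greedy contextual bandit. This is an illustration under stated assumptions, not the patented implementation: the claims name a contextual bandit and selection "based at least on randomness" but do not specify a model, so the linear scorer, the element-wise product features, the gradient-style update, and every name below (`EpsilonGreedyRecommender`, `epsilon`, `lr`, the example games) are hypothetical.

```python
import random


class EpsilonGreedyRecommender:
    """Linear scorer over (user, content) features with epsilon-greedy exploration."""

    def __init__(self, dim, epsilon=0.1, lr=0.05, seed=0):
        self.theta = [0.0] * dim          # one learned weight per feature
        self.epsilon = epsilon            # probability of a random recommendation
        self.lr = lr                      # step size for the update
        self.rng = random.Random(seed)

    @staticmethod
    def features(user_vec, content_vec):
        # Assumed interaction feature: element-wise product of the two vectors.
        return [u * c for u, c in zip(user_vec, content_vec)]

    def score(self, user_vec, content_vec):
        x = self.features(user_vec, content_vec)
        return sum(t * xi for t, xi in zip(self.theta, x))

    def recommend(self, user_vec, contents, k=2):
        ids = list(contents)
        if self.rng.random() < self.epsilon:
            self.rng.shuffle(ids)         # explore: random order
        else:                             # exploit: highest predicted reward first
            ids.sort(key=lambda cid: self.score(user_vec, contents[cid]), reverse=True)
        return ids[:k]

    def update(self, user_vec, content_vec, reward_value):
        # Move the predicted reward for this (user, content) pair toward the observed reward.
        x = self.features(user_vec, content_vec)
        error = reward_value - self.score(user_vec, content_vec)
        self.theta = [t + self.lr * error * xi for t, xi in zip(self.theta, x)]


# Illustrative data only: two games described by two-dimensional vectors.
contents = {"puzzle": [1.0, 0.0], "racing": [0.0, 1.0]}
user = [1.0, 0.2]
model = EpsilonGreedyRecommender(dim=2, epsilon=0.0)
for _ in range(50):
    # Pretend the user repeatedly chose "racing" and the reward function returned 1.0.
    model.update(user, contents["racing"], reward_value=1.0)
top = model.recommend(user, contents, k=1)
```

With exploration switched off (`epsilon=0.0`), the model recommends whichever game has earned the higher predicted reward. A nonzero `epsilon` occasionally shuffles the candidates so that rarely shown content can still collect feedback.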

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06Q30/0631 (Primary)
Recommending goods or services · CPC title
G06F16/435
Filtering based on additional data, e.g. user or group profiles · CPC title
A63F13/61
using advertising information · CPC title
A63F2300/5506
using advertisements · CPC title

Patent family

Related publications grouped by family.

View patent family 75847566

Related patents

Generating and providing personalized digital content in real time based on live user context

Cumulative success-based recommendations for repeat users

Systems and methods for selecting third party content based on feedback

Method and System for Enhanced Content Recommendation

Client terminal, display control method, program, and system