Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G06Q30/0631. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 29, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Cumulative success-based recommendations for repeat users

US2018342004A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2018342004-A1
Application number	US-201715605525-A
Country	US
Kind code	A1
Filing date	May 25, 2017
Priority date	May 25, 2017
Publication date	Nov 29, 2018
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Computerized systems and methods are provided for determining cumulative success-based recommendations for repeat users. One such method includes determining user and item latent-features based on matrix factorization applied to matrices that include recommendation and feedback events. The feedback events indicate previously provided user preferences for at least a portion of the items. An item-recommendation policy is determined based on a cumulative metric that includes an expected value for the accumulation of stochastic user-item rewards associated with future (or subsequent) recommendations. The accumulation of the rewards is based on the user latent-features, the item latent-features, and the previous rewards included in the feedback events. Machine learning, such as reinforcement learning (RL), is employed to determine the item-recommendation policy based on the feedback events. A recommendation is provided to a user based on the determined recommendation policy, the user latent-features for the user, and the item latent-features of the recommended items.
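
The abstract's first step, deriving user and item latent-features by factorizing a matrix of feedback events, can be illustrated with a minimal sketch. The patent text shown here does not specify a factorization algorithm, so the stochastic-gradient approach below, along with the function names `factorize` and `predict` and every hyperparameter, is an assumption chosen for illustration, not the patented method.

```python
import random

def factorize(rewards, n_users, n_items, k=2, steps=2000, lr=0.02, reg=0.02, seed=0):
    """Factor a sparse user-item reward matrix into latent features.

    rewards: dict mapping (user_index, item_index) -> observed reward.
    Returns (U, V); U[u] and V[i] are length-k latent-feature lists.
    All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for (u, i), r in rewards.items():
            # Error between the observed reward and the current dot-product estimate.
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step with L2 regularization on both factors.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def predict(U, V, u, i):
    """Estimated reward for user u and item i: dot product of latent features."""
    return sum(a * b for a, b in zip(U[u], V[i]))

if __name__ == "__main__":
    # Hypothetical feedback events: (user, item) -> reward in [0, 1].
    feedback = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (2, 0): 0.0, (2, 1): 1.0}
    U, V = factorize(feedback, n_users=3, n_items=2)
    for (u, i), r in feedback.items():
        print(u, i, r, round(predict(U, V, u, i), 2))
```

Each row of `U` plays the role of a user's latent-features and each row of `V` an item's latent-features; their dot product estimates the reward for a user-item pair that may not have been observed.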

First claim

Opening claim text (preview).

What is claimed is:
1. A computerized system comprising: one or more processors; and computer storage memory having computer-executable instructions stored thereon which, when executed by the one or more processors, implement a method comprising: automatically determining user features for each of a plurality of users based on a plurality of feedback events, wherein each of the plurality of feedback events includes a previous user-item reward that indicates a preference previously provided by one of the plurality of users for one of a plurality of items; automatically determining item features for each of the plurality of items based on the plurality of feedback events; determining one or more recommendation policies based on a cumulative metric that includes an expected value for an accumulation of a plurality of stochastic user-item rewards associated with a plurality of subsequent recommendations, wherein the accumulation of the plurality of stochastic user-item rewards is based on the user features, the item features, and the previous user-item rewards included in the plurality of feedback events; and providing a first user of the plurality of users a first recommendation that includes an indication of at least a first item of the plurality of items, wherein the first recommendation is based on the one or more recommendation policies, the user features for the first user, and the item features for the first item.
2. The system of claim 1, wherein the method further comprising: generating a user-item matrix that includes at least a portion of the previous user-item rewards; determining a first matrix and a second matrix, wherein the first matrix is a first factor if the user-item matrix and the second matrix is a second factor of the user-item matrix; determining the user features for each of the plurality of users based on the first matrix; and determining the item features for each of the plurality of items based on the second matrix.
3. The system of claim 1, wherein the method further comprises: generating an ordered set of action vectors based on recommendations that were previously provided to the first user, wherein each of the action vectors are based on item features for a portion of the plurality of items that is indicated in the recommendations; generating an ordered set of the previous user-item rewards that includes previous user-item rewards that were previously provided by the first user and in response to the recommendations that were previously provided to the first user; generating an ordered set of state vectors based on the user features of the first user, the ordered set of action vectors, and the ordered set of the previous user-item rewards; generating a reinforcement-learning model based on the ordered set of action vectors, the ordered set of the previous user-item rewards, and the ordered set of state vectors; and determining the one or more recommendation policies based on the reinforcement-learning model.
4. The system of claim 3, wherein the method further comprises: generating a history for the first user based on a combination of the ordered set of action vectors and the ordered set of the previous user-item rewards; and generating the ordered set of state vectors based on the history for the first user.
5. The system of claim 1, wherein the cumulative metric includes a discount parameter that reduces the accumulation of the stochastic user-item rewards based on temporal distance for each of the stochastic user-item rewards.
6. The system of claim 1, wherein determining the one or more recommendation policies is based on a Markov Decision Process (MDP) that increases the cumulative metric.
7. The system of claim 6, wherein an action space of the MDP is based on the item features for each of the plurality of items and a state space of the MDP is based on the user features of each of the plurality of users.
8. The system of claim 1, wherein the method further comprises: generating a one-to-one association between the plurality of feedback events and a plurality of recommendation events, wherein a first feedback event of the plurality of feedback events is in response to a first recommendation event of the plurality of recommendation events that is associated with the first feedback event; and determining each of the plurality of stochastic user-item rewards associated with the plurality of subsequent recommendations based on the one-to-one association between the plurality of feedback events and the plurality of recommendation events.
9. The system of claim 1, wherein the item features of each of the plurality of items are item latent-features and the user features of each of the plurality of users are user latent-features that indicate a user's preference for the item latent-features of each of the plurality of items.
10. A computerized system comprising: one or more processors; and computer storage memory having computer-executable instructions stored thereon which, when executed by the one or more processors, implement a method comprising: determining a current cumulative value that is associated with a current recommendation policy based on an on-policy analysis of user-item data that includes a plurality of previous recommendation events and a plurality of associated feedback events for a plurality of users and a plurality of items; determining an action-value function based on state-action pairs based on the user-item data; generating an updated recommendation policy based on the action-value function and an off-policy analysis of the user-item data; generating a comparison of the current cumulative value and an updated cumulative value that is associated with the updated recommendation policy and the off-policy analysis of the user-item data; and in response to the comparison of the current cumulative value and the updated cumulative value, deploying the updated recommendation policy.
11. The system of claim 10, wherein the method further comprises: generating a state vector for each state of each of the state-action pairs; generating an action vector for each action of each of the state-action pairs; generating a state-action vector for each state-action pair based on a combination of a corresponding state vector and a corresponding action vector; and generating the action-value function based on the state-action vectors.
12. The system of claim 11, wherein the current recommendation policy is based on the state-action vectors and a first weighting vector and the updated recommendation policy is based on the state-action vectors and a second weighting vector.
13. A method for recommending items, comprising: aggregating user-item data that includes a plurality of recommendation data structures (DSs) and a plurality of feedback DSs, wherein each of the plurality of recommendation DSs encodes a previous recommendation, which was provided to one of a plurality of users, for at least one of a plurality of items, and wherein each of the plurality of feedback DSs encodes a corresponding preference of the one of the plurality of users for the at least the one of the plurality of items; generating a plurality of user DSs based on the plurality of feedback DSs, wherein each of the plurality of user DSs encodes user latent-features of one of the plurality of users; generating a plurality of item DSs based on the plurality of feedback DSs, wherein each of the plurality of item DSs encodes item latent-features of one of the plurality of items DSs; generating a decision-process DS based on the plurality of recommendation DSs, the plurality o
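
Several of the claimed elements can be made concrete with a short sketch: the discounted cumulative metric (claim 5), an action-value function over state-action vectors with a weighting vector (claims 11 and 12), and the comparison that gates deployment of an updated policy (claim 10). The claims do not fix any formulas, so everything below is an assumption for illustration: the geometric discount `gamma ** t`, the concatenation used to combine state and action vectors, the linear form of the action-value function, and the hypothetical `margin` parameter. How the two cumulative values are estimated (on-policy and off-policy analysis) is outside this sketch.

```python
def discounted_return(rewards, gamma=0.9):
    """Accumulate rewards, scaling each by gamma**t for its temporal distance t.

    The geometric form is an assumed instance of a discount parameter.
    """
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def q_value(weights, state, action):
    """Assumed linear action-value: weights dotted with the state-action vector."""
    state_action = list(state) + list(action)  # combine by concatenation (assumption)
    return sum(w * x for w, x in zip(weights, state_action))

def greedy_policy(weights, state, item_features):
    """Return the index of the item whose action vector has the highest Q-value."""
    return max(
        range(len(item_features)),
        key=lambda i: q_value(weights, state, item_features[i]),
    )

def should_deploy(current_value, updated_value, margin=0.0):
    """Deploy the updated policy only if its estimated cumulative value is higher.

    margin is a hypothetical safety threshold, not taken from the claims.
    """
    return updated_value > current_value + margin

if __name__ == "__main__":
    # Hypothetical user state vector and item (action) vectors.
    state = [1.0, 1.0]
    items = [[0.1, 9.0], [0.5, 0.0], [0.3, 2.0]]
    weights = [0.0, 0.0, 1.0, 0.0]  # one weight per state-action component
    print(greedy_policy(weights, state, items))
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print(should_deploy(current_value=1.0, updated_value=1.2))
```

In this framing, two recommendation policies that share the same state-action vectors differ only in their weighting vector, which matches the structure described in claim 12.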

Assignees

Microsoft Technology Licensing, LLC

Classifications

Code	CPC title
G06N7/01	Probabilistic graphical models, e.g. probabilistic networks
G06Q30/0217	involving input on products or services in exchange for incentives or rewards
G06Q30/0631 (primary)	Recommending goods or services
G06Q30/0224	based on user history
G06N7/08	using chaos models or non-linear system models

Patent family

Related publications grouped by family.

View patent family 64401645
