System and method for a recommender
US-2021081758-A1 · Mar 18, 2021 · US
US2022012565A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022012565-A1 |
| Application number | US-202117320439-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 14, 2021 |
| Priority date | May 15, 2020 |
| Publication date | Jan 13, 2022 |
| Grant date | — |
Official abstract text for this publication.
A reinforcement learning ranker can take into account previously-recommended media content items to produce a ranked list of media content items to recommend next. The ranker finds a policy that gives the probability of sampling a media content item given a state. The policy is learned such that it maximizes a reward. A reward function associated with the media content item can be defined with respect to whether the user finds the media content item relevant (likelihood that the user will like the media content item) and a diversity score of the media content item.
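To make the abstract's notion of "a policy that gives the probability of sampling a media content item given a state" concrete, the following is a minimal sketch, not the method disclosed in the publication. It assumes the state is simply the list of items already recommended in the session, that the policy is a softmax over per-item scores, and that the ranked list is built by sampling one item at a time. The catalog, the `toy_score` function, and every other name here are hypothetical stand-ins for a learned model.

```python
import math
import random

# Hypothetical toy catalog: item id -> (genre, base relevance score).
CATALOG = {
    "a": ("rock", 2.0),
    "b": ("rock", 1.5),
    "c": ("jazz", 1.0),
    "d": ("folk", 0.5),
}


def toy_score(state, item):
    """Hypothetical scorer: base relevance, penalised when the item
    repeats the genre of the most recently recommended item."""
    genre, relevance = CATALOG[item]
    if state and CATALOG[state[-1]][0] == genre:
        return relevance - 1.0
    return relevance


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy(state, candidates, score_fn):
    """Probability of sampling each not-yet-recommended candidate,
    given the state (items recommended so far in this session)."""
    remaining = [c for c in candidates if c not in state]
    probs = softmax([score_fn(state, c) for c in remaining])
    return dict(zip(remaining, probs))


def sample_ranked_list(candidates, score_fn, length, rng):
    """Build a ranked list by repeatedly sampling from the policy and
    appending the sampled item to the state."""
    state = []
    for _ in range(min(length, len(candidates))):
        dist = policy(state, candidates, score_fn)
        items = list(dist)
        weights = [dist[i] for i in items]
        state.append(rng.choices(items, weights=weights, k=1)[0])
    return state


ranked = sample_ranked_list(list(CATALOG), toy_score, 3, random.Random(0))
```

Because each sampled item is appended to the state before the next draw, the distribution for later positions depends on what was already recommended, which is the property the abstract attributes to the ranker.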
Opening claim text (preview).
What is claimed is:

1. A method for selecting a media content item, the method comprising: obtaining data describing feedback from previous content consumption sessions of a user account; obtaining data regarding media content items previously recommended during a current content consumption session of the user account; generating a score for a potential media content item with a reinforcement learning model based on: the data regarding media content items previously recommended during the current content consumption session of the user account; and the data describing feedback from the previous playback sessions of the user account; and selecting, for the user account, the potential media content item based on the score, wherein the reinforcement learning model applies a reward function that takes into account relevance and diversity.

2. The method of claim 1, wherein the potential media content item is a potential track.

3. The method of claim 1, wherein the data describing feedback from previous playback sessions of the user account comprises a feedback aware embedding.

4. The method of claim 3, further comprising calculating the feedback aware embedding with a feedback aware embedder based on a meta feature, a media content item, and a dynamic user embedding.

5. The method of claim 4, further comprising calculating the dynamic user embedding with a dynamic user embedder based on representations of prior sessions.

6. The method of claim 1, wherein generating the score for the potential media content item includes applying a stacked LSTM initialed based on a session meta feature.

7. The method of claim 1, wherein the reward function includes the calculation: R(t, s) = r(t, u) − c + αd(t, u) × r(t, u), where R(t, s) is a reward function for a given media content item t and session s; where r(t, u) is a reward function for the given media content item t and user u; where c is a value configured to ensure a negative reward for non-relevant media content items; where α is a weighting parameter; and where d(t, u) is a diversity function for a given media content item t and user u.

8. The method of claim 1, further comprising: calculating the diversity of the potential media content item based on a popularity of the potential media content item.

9. The method of claim 1, further comprising: calculating the diversity of the potential media content item based on a similarity of the potential media content item to other media content items played by the user account.

10. A non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to: obtain data describing feedback from previous content consumption sessions of a user account; obtain data regarding media content items previously recommended during a current content consumption session of the user account; generate a score for a potential media content item with a reinforcement learning model based on: the data regarding media content items previously recommended during the current content consumption session of the user account; and the data describing feedback from the previous playback sessions of the user account; and select, for the user account, the potential media content item based on the score, wherein the reinforcement learning model applies a reward function that takes into account relevance and diversity.

11. The non-transitory computer-readable medium of claim 10, wherein the data describing feedback from previous playback sessions of the user account comprises a feedback aware embedding.

12. The non-transitory computer-readable medium of claim 11, wherein the instructions further cause the one or more processors to calculate the feedback aware embedding with a feedback aware embedder based on a meta feature, a media content item, and a dynamic user embedding.

13. The non-transitory computer-readable medium of claim 12, wherein the instructions further cause the one or more processors to calculate the dynamic user embedding with a dynamic user embedder based on representations of prior sessions.

14. The non-transitory computer-readable medium of claim 10, wherein to generate the score for the potential media content item includes to apply a stacked LSTM initialed based on a session meta feature.

15. The non-transitory computer-readable medium of claim 10, wherein the reward function includes the calculation: R(t, s) = r(t, u) − c + αd(t, u) × r(t, u), where R(t, s) is a reward function for a given media content item t and session s; where r(t, u) is a reward function for the given media content item t and user u; where c is a value configured to ensure a negative reward for non-relevant media content items; where α is a weighting parameter; and where d(t, u) is a diversity function for a given media content item t and user u.

16. The non-transitory computer-readable medium of claim 10, wherein the instructions further cause the one or more processors to calculate the diversity of the potential media content item based on a popularity of the potential media content item or based on a similarity of the potential media content item to other media content items played by the user account.

17. A system comprising: a media-playback device; and a media-delivery system configured to: obtain data describing feedback from previous content consumption sessions of a user account; obtain data regarding media content items previously recommended during a current content consumption session of the user account; generate a score for a potential media content item with a reinforcement learning model based on: the data regarding media content items previously recommended during the current content consumption session of the user account; and the data describing feedback from the previous playback sessions of the user account; and select, for the user account, the potential media content item based on the score; and transmit the selected media content item to the media-playback device for playback, wherein the reinforcement learning model applies a reward function that takes into account relevance and diversity.

18. The system of claim 17, wherein to generate the score for the potential media content item includes to apply a stacked LSTM initialed based on a session meta feature.

19. The system of claim 17, wherein the reward function includes the calculation: R(t, s) = r(t, u) − c + αd(t, u) × r(t, u), where R(t, s) is a reward function for a given media content item t and session s; where r(t, u) is a reward function for the given media content item t and user u; where c is a value configured to ensure a negative reward for non-relevant media content items; where α is a weighting parameter; and where d(t, u) is a diversity function for a given media content item t and user u.

20. The system of claim 17, wherein the media-delivery system is further configured to: calculate the diversity of the potential media content item based on a popularity of the potential media content item or based on a similarity of the potential media content item to other media content items played by the user account.
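The reward calculation recited in claims 7, 15, and 19 can be illustrated with a short sketch. It reads the formula with ordinary operator precedence, that is R = r − c + (α · d · r), which is an assumption about the intended grouping. The two diversity functions are hypothetical examples of the popularity-based and similarity-based options the claims mention; the publication's excerpt here does not define their exact form, and the default values of `c` and `alpha` are arbitrary.

```python
import math


def popularity_diversity(popularity):
    """Hypothetical popularity-based diversity: less popular items count
    as more diverse. Assumes popularity is normalised to [0, 1]."""
    return 1.0 - popularity


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_diversity(item_vec, played_vecs):
    """Hypothetical similarity-based diversity: one minus the mean cosine
    similarity between the item and the items the account has played."""
    if not played_vecs:
        return 1.0
    mean_sim = sum(cosine_similarity(item_vec, p) for p in played_vecs) / len(played_vecs)
    return 1.0 - mean_sim


def reward(relevance, diversity, c=0.5, alpha=1.0):
    """R(t, s) = r(t, u) - c + alpha * d(t, u) * r(t, u).

    relevance : r(t, u), e.g. 1.0 if the user found the item relevant, else 0.0
    diversity : d(t, u)
    c         : offset that makes the reward negative for non-relevant items
    alpha     : weight of the diversity term
    """
    return relevance - c + alpha * diversity * relevance


# A non-relevant item earns -c regardless of its diversity, because the
# diversity term is multiplied by the relevance term.
non_relevant = reward(relevance=0.0, diversity=0.9)
relevant_diverse = reward(relevance=1.0, diversity=popularity_diversity(0.25))
```

Multiplying the diversity term by r(t, u) means diversity only adds reward for items the user actually finds relevant, while the constant c pushes the reward below zero whenever r(t, u) is zero.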
Recurrent networks, e.g. Hopfield networks · CPC title
Knowledge-based neural networks; Logical representations of neural networks · CPC title
Combinations of networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Reinforcement learning · CPC title