What technology area does this patent fall under?

Primary CPC classification G06N3/006. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 29, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Pushing items to users based on a reinforcement learning model

US2020342268A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2020342268-A1
Application number	US-202016813654-A
Country	US
Kind code	A1
Filing date	Mar 9, 2020
Priority date	Apr 29, 2019
Publication date	Oct 29, 2020
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure is related to determining an item push list for a user based on a reinforcement learning model. In one aspect, a method includes obtaining M first item lists that have been predetermined for a first user. Each first item list includes i-1 items. For each first item list, an ith state feature vector is obtained. The ith state feature vector includes a static feature and a dynamic feature. The ith state feature vector is provided as input to the reinforcement machine learning model. The reinforcement model outputs a weight vector including weights of sorting features. A sorting feature vector of each item in a candidate item set corresponding to the first item list is obtained. The sorting feature vector includes feature values of sorting features. M updated item lists are determined for the first item lists based on a score for each item in M candidate item sets.
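The scoring step the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `state_vector`, `toy_policy`, `score_candidates`, the item identifiers, and all numeric values are hypothetical. Concatenating the static and dynamic features is an assumption (the abstract only says the state vector includes both), and the dot-product scoring is taken from claim 1 rather than from the abstract.

```python
from typing import Dict, List, Sequence


def state_vector(user_features: Sequence[float],
                 item_features: Sequence[Sequence[float]]) -> List[float]:
    """Build a state feature vector from a static and a dynamic part.

    Assumption: the static (user attribute) feature and the dynamic
    (attributes of the i-1 items already in the list) feature are
    simply concatenated.
    """
    state = list(user_features)
    for feats in item_features:
        state.extend(feats)
    return state


def toy_policy(state: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the trained reinforcement learning model.

    A real model would map the state to a weight vector; this one
    returns fixed weights for three sorting features.
    """
    return [0.5, 0.25, 0.25]


def score_candidates(weights: Sequence[float],
                     candidates: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each candidate as the dot product of weights and sorting features."""
    scores = {}
    for item_id, feats in candidates.items():
        if len(feats) != len(weights):
            raise ValueError(f"feature length mismatch for {item_id}")
        scores[item_id] = sum(w * f for w, f in zip(weights, feats))
    return scores


# One user with two attributes, and a list that already holds one item.
state = state_vector([1.0, 0.0], [[0.25, 1.0]])
weights = toy_policy(state)

# Illustrative sorting features per candidate, in the order
# (estimated click-through rate, popularity, diversity).
candidates = {
    "item_a": [0.5, 1.0, 0.0],
    "item_b": [1.0, 0.0, 1.0],
    "item_c": [0.0, 0.0, 1.0],
}
scores = score_candidates(weights, candidates)
best = max(scores, key=scores.get)
```

Because the model outputs weights rather than picking an item directly, the same weight vector can rank a candidate set of any size.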

First claim

Opening claim text (preview).

1. A computer-implemented method for determining updated item lists based on a reinforcement machine learning model, the method comprising: obtaining M first item lists that have been predetermined for a first user, wherein each first item list comprises i−1 items, M is an integer greater than or equal to two, and i is a predetermined integer N that is greater than one; for each first item list obtaining an ith state feature vector for an ith state of each first item list, wherein the ith state feature vector comprises a static feature and a dynamic feature, wherein the static feature comprises a user attribute feature of the first user and the dynamic feature comprises item attribute features of the i−1 items, respectively in the first item list, providing the ith state feature vector as input to the reinforcement machine learning model, wherein the reinforcement machine learning model outputs a weight vector corresponding to the ith state feature vector, and wherein the weight vector comprises weights of a predetermined quantity of sorting features, obtaining a sorting feature vector of each item in a candidate item set corresponding to the first item list, wherein the sorting feature vector comprises feature values of the predetermined quantity of sorting features, and calculating a score for each item in the candidate item set based on a dot product of the sorting feature vector of each item in the candidate item set and the weight vector; determining, using a beam search algorithm, M updated item lists for the first item lists based on the score for each item in M candidate item sets respectively corresponding to the first item lists, wherein each updated item list comprises i items; determining an item push list for the first user from the M updated item lists using the beam search algorithm; pushing items in the item push list to the first user in an arrangement order to obtain feedback from the first user; obtaining N return values based on the arrangement order and the feedback, wherein the N return values respectively correspond to N iterations of pushing items in the item push list to the first user; obtaining an (N+1)th state feature vector, wherein the (N+1)th state feature vector comprises the static feature and an additional dynamic feature, wherein the additional dynamic feature comprises additional item attribute features of the items in the item push list and training the reinforcement machine learning model based on N groups of data respectively corresponding to the N iterations, wherein the N groups of data comprise a first group of data to an Nth group of data, and each ith group of data comprises the ith state feature vector corresponding to the item push list, a weight vector corresponding to the ith state feature vector, an (i+1)th state feature vector corresponding to the item push list, and a return value corresponding to an ith iteration of pushing items in the item push list to the first user.

2. The computer-implemented method of claim 1, wherein the item attribute features comprise, for each item in the first item list, (i) a current popularity of the item, (ii) an item identifier for the item, or (iii) an item type for the item.

3. The computer-implemented method of claim 1, wherein, for a particular first item list of the first item lists, the feature values of the predetermined quantity of sorting features comprise (i) an estimated click-through rate of the first user for a first item in a first candidate item set corresponding to the particular first item list, (ii) a current popularity of the first item, or (iii) a diversity of the first item relative to the items in the first item list.

4. The computer-implemented method of claim 1, wherein the first item lists comprise one item list that is predetermined, and wherein determining the updated item lists comprises: identifying, in the candidate item set corresponding to the one item list, a highest scoring item having a highest score among the items in the candidate set corresponding to the one item list; and including the highest scoring item as an ith item in the updated item list corresponding to the one item list.

5. (canceled)

6. (canceled)

7. (canceled)

8. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising: obtaining M first item lists that have been predetermined for a first user, wherein each first item list comprises i−1 items, M is an integer greater than or equal to two, and i is a predetermined integer N that is greater than one; for each first item list obtaining an ith state feature vector for an ith state of each first item list, wherein the ith state feature vector comprises a static feature and a dynamic feature, wherein the static feature comprises a user attribute feature of the first user and the dynamic feature comprises item attribute features of the i−1 items, respectively in the first item list, providing the ith state feature vector as input to a reinforcement machine learning model, wherein the reinforcement machine learning model outputs a weight vector corresponding to the ith state feature vector, and wherein the weight vector comprises weights of a predetermined quantity of sorting features, obtaining a sorting feature vector of each item in a candidate item set corresponding to the first item list, wherein the sorting feature vector comprises feature values of the predetermined quantity of sorting features, and calculating a score for each item in the candidate item set based on a dot product of the sorting feature vector of each item in the candidate item set and the weight vector; determining, using a beam search algorithm, M updated item lists for the first item lists based on the score for each item in M candidate item sets respectively corresponding to the first item lists, wherein each updated item list comprises i items; determining an item push list for the first user from the M updated item lists using the beam search algorithm; pushing items in the item push list to the first user in an arrangement order to obtain feedback from the first user; obtaining N return values based on the arrangement order and the feedback, wherein the N return values respectively correspond to N iterations of pushing items in the item push list to the first user; obtaining an (N+1)th state feature vector, wherein the (N+1)th state feature vector comprises the static feature and an additional dynamic feature, wherein the additional dynamic feature comprises additional item attribute features of the items in the item push list and training the reinforcement machine learning model based on N groups of data respectively corresponding to the N iterations, wherein the N groups of data comprise a first group of data to an Nth group of data, and each ith group of data comprises the ith state feature vector corresponding to the item push list, a weight vector corresponding to the ith state feature vector, an (i+1)th state feature vector corresponding to the item push list, and a return value corresponding to an ith iteration of pushing items in the item push list to the first user.

9. The non-transitory, computer-readable medium of claim 8, wherein the item attribute features comprise, for each item in the first item list, (i) a current popularity of the item, (ii) an item identifier for the item, or (iii) an item type for the item.

10. The non-transitory, computer-readable medium of claim 8, wherein, for a particular first item list of the first item lists, the feature values of the predetermined quantity of sorting features comprise (i) an estimated click-through rate of the first user for a fi
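To make the beam search and training steps recited in claim 1 more concrete, here is a rough sketch of one beam expansion (lists of i−1 items grow to lists of i items) and of assembling the N groups of training data. Everything named here (`beam_step`, `training_groups`, the item identifiers, the numbers) is hypothetical. Ranking beams by cumulative score and skipping items already in a list are assumptions, since the claim text does not spell out how the beam is pruned.

```python
from typing import Any, Dict, List, Sequence, Tuple

Beam = Tuple[List[str], float]  # (item list, cumulative score)


def dot(weights: Sequence[float], feats: Sequence[float]) -> float:
    """Dot product of a weight vector and a sorting feature vector."""
    return sum(w * f for w, f in zip(weights, feats))


def beam_step(beams: List[Beam],
              candidate_sets: List[Dict[str, Sequence[float]]],
              weight_vectors: List[Sequence[float]],
              width: int) -> List[Beam]:
    """Extend every list by one candidate and keep the `width` best lists.

    Each beam has its own candidate set and its own weight vector
    (the model output for that beam's state). Assumption: beams are
    ranked by the running sum of item scores.
    """
    expanded: List[Beam] = []
    for (items, total), candidates, weights in zip(
            beams, candidate_sets, weight_vectors):
        for item_id, feats in candidates.items():
            if item_id in items:  # assumption: no repeated items in a list
                continue
            expanded.append((items + [item_id], total + dot(weights, feats)))
    expanded.sort(key=lambda beam: beam[1], reverse=True)
    return expanded[:width]


def training_groups(states: Sequence[Any],
                    weight_vectors: Sequence[Any],
                    returns: Sequence[float]) -> List[Tuple[Any, Any, Any, float]]:
    """Pair up N+1 states, N weight vectors and N return values.

    Group i is (state_i, weights_i, state_{i+1}, return_i), mirroring
    the four elements listed for each group of data in claim 1.
    """
    n = len(returns)
    if len(states) != n + 1 or len(weight_vectors) != n:
        raise ValueError("need N+1 states and N weight vectors for N returns")
    return [(states[i], weight_vectors[i], states[i + 1], returns[i])
            for i in range(n)]


# M = 2 lists, each holding one item, expanded to two items each.
beams = [(["a"], 1.0), (["b"], 0.5)]
candidate_sets = [
    {"c": [1.0, 0.0], "d": [0.0, 1.0]},
    {"c": [1.0, 0.0], "a": [0.0, 1.0]},
]
weight_vectors = [[0.5, 0.25], [0.25, 0.5]]
updated = beam_step(beams, candidate_sets, weight_vectors, width=2)

# Placeholder states and weights for N = 2 push iterations.
groups = training_groups(["s1", "s2", "s3"], ["w1", "w2"], [1.0, 0.0])
```

After the final expansion, one plausible way to pick the item push list is to take the highest-scoring beam; the resulting tuples have the usual (state, action, next state, reward) shape that many reinforcement learning training loops consume.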

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

G06N3/006 · Primary
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
G06F18/2178
based on feedback of a supervisor · CPC title
G06F18/2321
using statistics or function optimisation, e.g. modelling of probability density functions · CPC title
G06N3/092
Reinforcement learning · CPC title
G06F9/30036
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 72922051

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020342268A1 cover?: This disclosure is related to determining an item push list for a user based on a reinforcement learning model. In one aspect, a method includes obtaining M first item lists that have been predetermined for a first user. Each first item list includes i-1 items. For each first item list, an ith state feature vector is obtained. The ith state feature vector includes a static feature and a dynamic…
Who is the assignee on this patent?: Alibaba Group Holding Ltd