Methods for enhancing complete data extraction of dia data
US-2024428893-A1 · Dec 26, 2024 · US
US2020342268A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020342268-A1 |
| Application number | US-202016813654-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 9, 2020 |
| Priority date | Apr 29, 2019 |
| Publication date | Oct 29, 2020 |
| Grant date | — |
Official abstract text for this publication.
This disclosure is related to determining an item push list for a user based on a reinforcement learning model. In one aspect, a method includes obtaining M first item lists that have been predetermined for a first user. Each first item list includes i-1 items. For each first item list, an ith state feature vector is obtained. The ith state feature vector includes a static feature and a dynamic feature. The ith state feature vector is provided as input to the reinforcement machine learning model. The reinforcement model outputs a weight vector including weights of sorting features. A sorting feature vector of each item in a candidate item set corresponding to the first item list is obtained. The sorting feature vector includes feature values of sorting features. M updated item lists are determined for the first item lists based on a score for each item in M candidate item sets.
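As a rough illustration of the scoring step summarized in the abstract, the sketch below scores each candidate item as the dot product of a weight vector and that item's sorting feature vector. The function name `score_candidates`, the choice of three sorting features (estimated click-through rate, popularity, diversity), and all numeric values are assumptions made for this example. In the disclosure the weight vector is output by the reinforcement learning model for a given state; here a fixed list stands in for that output.

```python
from typing import Dict, List


def score_candidates(weights: List[float],
                     sorting_features: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each candidate item as dot(weights, sorting feature vector)."""
    scores = {}
    for item_id, features in sorting_features.items():
        if len(features) != len(weights):
            raise ValueError(f"feature length mismatch for {item_id}")
        scores[item_id] = sum(w * f for w, f in zip(weights, features))
    return scores


# Hypothetical weight vector, standing in for the model output for one state.
weights = [0.5, 0.25, 0.25]

# Hypothetical sorting feature vectors: [estimated CTR, popularity, diversity].
candidates = {
    "item_a": [0.8, 0.4, 0.0],
    "item_b": [0.2, 0.8, 1.0],
}

scores = score_candidates(weights, candidates)
best_item = max(scores, key=scores.get)
```

Because the weights are produced per state, the same candidate can score differently as the list grows, for example when the model shifts weight toward diversity after several similar items have been selected.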
Opening claim text (preview).
1. A computer-implemented method for determining updated item lists based on a reinforcement machine learning model, the method comprising:
obtaining M first item lists that have been predetermined for a first user, wherein each first item list comprises i−1 items, M is an integer greater than or equal to two, and i is a predetermined integer N that is greater than one;
for each first item list:
obtaining an ith state feature vector for an ith state of each first item list, wherein the ith state feature vector comprises a static feature and a dynamic feature, wherein the static feature comprises a user attribute feature of the first user and the dynamic feature comprises item attribute features of the i−1 items, respectively in the first item list,
providing the ith state feature vector as input to the reinforcement machine learning model, wherein the reinforcement machine learning model outputs a weight vector corresponding to the ith state feature vector, and wherein the weight vector comprises weights of a predetermined quantity of sorting features,
obtaining a sorting feature vector of each item in a candidate item set corresponding to the first item list, wherein the sorting feature vector comprises feature values of the predetermined quantity of sorting features, and
calculating a score for each item in the candidate item set based on a dot product of the sorting feature vector of each item in the candidate item set and the weight vector;
determining, using a beam search algorithm, M updated item lists for the first item lists based on the score for each item in M candidate item sets respectively corresponding to the first item lists, wherein each updated item list comprises i items;
determining an item push list for the first user from the M updated item lists using the beam search algorithm;
pushing items in the item push list to the first user in an arrangement order to obtain feedback from the first user;
obtaining N return values based on the arrangement order and the feedback, wherein the N return values respectively correspond to N iterations of pushing items in the item push list to the first user;
obtaining an (N+1)th state feature vector, wherein the (N+1)th state feature vector comprises the static feature and an additional dynamic feature, wherein the additional dynamic feature comprises additional item attribute features of the items in the item push list; and
training the reinforcement machine learning model based on N groups of data respectively corresponding to the N iterations, wherein the N groups of data comprise a first group of data to an Nth group of data, and each ith group of data comprises the ith state feature vector corresponding to the item push list, a weight vector corresponding to the ith state feature vector, an (i+1)th state feature vector corresponding to the item push list, and a return value corresponding to an ith iteration of pushing items in the item push list to the first user.

2. The computer-implemented method of claim 1, wherein the item attribute features comprise, for each item in the first item list, (i) a current popularity of the item, (ii) an item identifier for the item, or (iii) an item type for the item.

3. The computer-implemented method of claim 1, wherein, for a particular first item list of the first item lists, the feature values of the predetermined quantity of sorting features comprise (i) an estimated click-through rate of the first user for a first item in a first candidate item set corresponding to the particular first item list, (ii) a current popularity of the first item, or (iii) a diversity of the first item relative to the items in the first item list.

4. The computer-implemented method of claim 1, wherein the first item lists comprise one item list that is predetermined, and wherein determining the updated item lists comprises:
identifying, in the candidate item set corresponding to the one item list, a highest scoring item having a highest score among the items in the candidate set corresponding to the one item list; and
including the highest scoring item as an ith item in the updated item list corresponding to the one item list.

5. (canceled)

6. (canceled)

7. (canceled)

8. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising:
obtaining M first item lists that have been predetermined for a first user, wherein each first item list comprises i−1 items, M is an integer greater than or equal to two, and i is a predetermined integer N that is greater than one;
for each first item list:
obtaining an ith state feature vector for an ith state of each first item list, wherein the ith state feature vector comprises a static feature and a dynamic feature, wherein the static feature comprises a user attribute feature of the first user and the dynamic feature comprises item attribute features of the i−1 items, respectively in the first item list,
providing the ith state feature vector as input to a reinforcement machine learning model, wherein the reinforcement machine learning model outputs a weight vector corresponding to the ith state feature vector, and wherein the weight vector comprises weights of a predetermined quantity of sorting features,
obtaining a sorting feature vector of each item in a candidate item set corresponding to the first item list, wherein the sorting feature vector comprises feature values of the predetermined quantity of sorting features, and
calculating a score for each item in the candidate item set based on a dot product of the sorting feature vector of each item in the candidate item set and the weight vector;
determining, using a beam search algorithm, M updated item lists for the first item lists based on the score for each item in M candidate item sets respectively corresponding to the first item lists, wherein each updated item list comprises i items;
determining an item push list for the first user from the M updated item lists using the beam search algorithm;
pushing items in the item push list to the first user in an arrangement order to obtain feedback from the first user;
obtaining N return values based on the arrangement order and the feedback, wherein the N return values respectively correspond to N iterations of pushing items in the item push list to the first user;
obtaining an (N+1)th state feature vector, wherein the (N+1)th state feature vector comprises the static feature and an additional dynamic feature, wherein the additional dynamic feature comprises additional item attribute features of the items in the item push list; and
training the reinforcement machine learning model based on N groups of data respectively corresponding to the N iterations, wherein the N groups of data comprise a first group of data to an Nth group of data, and each ith group of data comprises the ith state feature vector corresponding to the item push list, a weight vector corresponding to the ith state feature vector, an (i+1)th state feature vector corresponding to the item push list, and a return value corresponding to an ith iteration of pushing items in the item push list to the first user.

9. The non-transitory, computer-readable medium of claim 8, wherein the item attribute features comprise, for each item in the first item list, (i) a current popularity of the item, (ii) an item identifier for the item, or (iii) an item type for the item.

10. The non-transitory, computer-readable medium of claim 8, wherein, for a particular first item list of the first item lists, the feature values of the predetermined quantity of sorting features comprise (i) an estimated click-through rate of the first user for a fi
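The beam search expansion and the training data described in the independent claims can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the names `beam_step` and `build_transitions`, the use of a cumulative score to rank partial lists, and the rule that an item is not added twice to the same list are all assumptions. The claims state only that the updated lists are determined from the per-item scores using a beam search algorithm, and that each ith group of training data holds the ith state feature vector, its weight vector, the (i+1)th state feature vector, and the ith return value.

```python
from typing import Dict, List, Sequence, Tuple

ItemList = Tuple[str, ...]


def beam_step(beams: List[Tuple[ItemList, float]],
              candidate_scores: List[Dict[str, float]],
              width: int) -> List[Tuple[ItemList, float]]:
    """Extend each partial list by one item and keep the `width` best lists.

    beams[k] is (partial list, cumulative score); candidate_scores[k] maps
    each candidate item for beams[k] to its dot-product score.
    """
    expanded = []
    for (items, total), scores in zip(beams, candidate_scores):
        for item_id, score in scores.items():
            if item_id in items:  # assumption: no item appears twice in a list
                continue
            expanded.append((items + (item_id,), total + score))
    # Assumption: partial lists are ranked by cumulative score.
    expanded.sort(key=lambda pair: pair[1], reverse=True)
    return expanded[:width]


def build_transitions(states: Sequence, weight_vectors: Sequence,
                      returns: Sequence[float]) -> List[tuple]:
    """Build N groups (state_i, weight_vector_i, state_{i+1}, return_i)."""
    n = len(returns)
    if len(states) != n + 1 or len(weight_vectors) != n:
        raise ValueError("expected N+1 states and N weight vectors for N returns")
    return [(states[i], weight_vectors[i], states[i + 1], returns[i])
            for i in range(n)]


# Hypothetical data: M = 2 first item lists, each holding i-1 = 1 item.
beams = [(("a",), 0.9), (("b",), 0.7)]
candidate_scores = [{"c": 0.5, "d": 0.1}, {"c": 0.6, "a": 0.4}]
updated = beam_step(beams, candidate_scores, width=2)
updated_lists = [items for items, _ in updated]

# Hypothetical placeholders for state feature vectors and weight vectors.
transitions = build_transitions(["s1", "s2", "s3"], ["w1", "w2"], [1.0, 0.0])
```

With `width=1` and a single starting list, `beam_step` reduces to picking the highest scoring candidate as the ith item, which corresponds to the single-list case recited in claim 4.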
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
based on feedback of a supervisor · CPC title
using statistics or function optimisation, e.g. modelling of probability density functions · CPC title
Reinforcement learning · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title