Assessing accuracy of a machine learning model
US-2019311287-A1 · Oct 10, 2019 · US
US11645580B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11645580-B2 |
| Application number | US-202016748313-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 21, 2020 |
| Priority date | Jan 21, 2020 |
| Publication date | May 9, 2023 |
| Grant date | May 9, 2023 |
Official abstract text for this publication.
A system and method for content selection and presentation is disclosed. A system receives a plurality of content elements configured for presentation in at least one content container and selects one of the plurality of content elements for presentation in the at least one content container. The one of the plurality of content elements is selected by a trained selection model configured to use Thompson sampling. An interface including the selected one of the plurality of content elements is generated.
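As a rough illustration of what selecting a content element by Thompson sampling can look like, the sketch below keeps click and impression counts per element, draws one sample from each element's posterior, and picks the highest draw. The Beta-Bernoulli posterior, the click-based reward, and the names `select_content` and `record_impression` are assumptions made for this example; the abstract does not specify them.

```python
import random


def select_content(stats, rng=random):
    """Pick one content element for a container by Thompson sampling.

    stats maps element id -> (clicks, impressions).
    Assumption for this sketch: each element's click rate has a
    Beta(1 + clicks, 1 + non_clicks) posterior (uniform prior).
    """
    best_id, best_draw = None, -1.0
    for element_id, (clicks, impressions) in stats.items():
        # One posterior sample per element; the largest sample wins.
        draw = rng.betavariate(1 + clicks, 1 + impressions - clicks)
        if draw > best_draw:
            best_id, best_draw = element_id, draw
    return best_id


def record_impression(stats, element_id, clicked):
    """Update the counts after the selected element has been shown."""
    clicks, impressions = stats.get(element_id, (0, 0))
    stats[element_id] = (clicks + int(clicked), impressions + 1)
```

Because the choice is a random draw from the posterior rather than a pick of the best observed average, elements with few impressions still get shown occasionally, which is the exploration behavior Thompson sampling is generally used for.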
Opening claim text (preview).
What is claimed is:
1. A system for generating item recommendations, comprising: a memory having instructions stored thereon, and a processor that reads the instructions to: receive a plurality of content elements for presentation in at least one content container; select one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model using Thompson sampling, wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long-term reward value, R, wherein the short-term reward value, r, is an expected reward from a user in a current session and the long-term reward value, R, is an expected discounted rewards from the user in future sessions; generate an interface including the selected one of the plurality of content elements; provide the interface for display; monitor a state and an action taken through the interface within a time period T; determine a posterior distribution parameter of the state and action within the time period T; update long-term reward information based on the posterior distribution parameter of the state and action; and generate an updated selection model based on the updated long-term reward information.
2. The system of claim 1, wherein the plurality of content elements are selected based on a received persona.
3. The system of claim 1, wherein the trained selection model is trained using a plurality of prior impressions.
4. The system of claim 1, wherein the trained selection model implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.
5. The system of claim 1, wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning.
6. The system of claim 5, wherein the machine learning model is a neural network.
7. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor cause a device to perform operations comprising: receiving a request for an interface, wherein the request includes a user persona; selecting at least one of a plurality of content elements for inclusion in the interface, wherein the at least one of the plurality of content elements is selected using Thompson sampling, wherein a trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long-term reward value, R, wherein the short-term reward value, r, is an expected reward from a user in a current session and the long-term reward value, R, is an expected discounted rewards from the user in future sessions; generating an interface including the selected at least one of the plurality of content elements; providing the interface for display; monitoring a state and an action taken through the interface within a time period T; determining a posterior distribution parameter of the state and action within the time period T; updating long-term reward information based on the posterior distribution parameter; and generating an updated selection model based on the updated long-term reward information.
8. The non-transitory computer readable medium of claim 7, wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning.
9. The non-transitory computer readable medium of claim 8, wherein the machine learning model is trained using a plurality of prior impressions.
10. The non-transitory computer readable medium of claim 8, wherein the machine learning model implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.
11. The non-transitory computer readable medium of claim 8, wherein the machine learning model is a neural network.
12. A computer-implemented method, comprising: receiving a plurality of content elements configured for presentation in at least one content container; selecting one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model configured to use Thompson sampling, wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long-term reward value, R, wherein the short-term reward value, r, is an expected reward from a user in a current session and the long-term reward value, R, is an expected discounted rewards from the user in future sessions; generating an interface including the selected one of the plurality of content elements; providing the interface for display; monitoring a state and an action taken through the interface within a time period T; determining a posterior distribution parameter of the state and action within the time period T; updating long-term reward information based on the posterior distribution parameter; and generating an updated selection model based on the updated long-term reward information.
13. The computer-implemented method of claim 12, wherein the plurality of content elements are selected based on a received persona.
14. The computer-implemented method of claim 12, wherein the trained selection model is trained using a plurality of prior impressions.
15. The computer-implemented method of claim 12, wherein the trained selection model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.
16. The computer-implemented method of claim 12, wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning.
17. The computer-implemented method of claim 16, wherein the machine learning model is a neural network.
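The claims combine a SARSA-style update with Thompson sampling over posterior parameters of a total reward value Q built from a short-term reward r and a long-term reward R. The following is one possible sketch of that combination, not the patented implementation. It assumes a Gaussian posterior per (state, action) pair whose variance shrinks with the number of observations, and it approximates R by the discounted posterior mean of the next (state, action) pair. The class name `ThompsonSarsa` and the parameters `gamma` and `prior_var` are hypothetical.

```python
import random
from collections import defaultdict


class ThompsonSarsa:
    """Tabular SARSA where actions are chosen by Thompson sampling.

    Assumption: Q(s, a) has a Gaussian posterior with mean `mean[(s, a)]`
    and variance prior_var / (n + 1), where n is the observation count.
    """

    def __init__(self, actions, gamma=0.9, prior_var=1.0, seed=None):
        self.actions = list(actions)
        self.gamma = gamma            # discount applied to future reward
        self.prior_var = prior_var    # variance before any observation
        self.rng = random.Random(seed)
        self.mean = defaultdict(float)
        self.count = defaultdict(int)

    def posterior(self, state, action):
        """Return the (mean, variance) posterior parameters of Q(s, a)."""
        n = self.count[(state, action)]
        return self.mean[(state, action)], self.prior_var / (n + 1)

    def select(self, state):
        """Draw one sample of Q per action and return the best action."""
        def draw(action):
            mu, var = self.posterior(state, action)
            return self.rng.gauss(mu, var ** 0.5)
        return max(self.actions, key=draw)

    def update(self, state, action, r, next_state, next_action):
        """SARSA update with target Q = r + R.

        r is the short-term reward observed in the current session.
        R is approximated here by the discounted posterior mean of the
        next (state, action) pair; this mapping is an assumption.
        """
        long_term = self.gamma * self.mean[(next_state, next_action)]
        target = r + long_term
        key = (state, action)
        self.count[key] += 1
        # Incremental running average of the observed targets.
        self.mean[key] += (target - self.mean[key]) / self.count[key]
```

In a serving loop, `select` would choose the content element for the current state, the state and action observed within the time period T would be passed to `update`, and the refreshed posterior parameters would play the role of the updated selection model.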
Reinforcement learning · CPC title
Learning methods · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Machine learning · CPC title
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title