Recommendation system using linear stochastic bandits and confidence interval generation
US-11100559-B2 · Aug 24, 2021 · US
US12443983B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12443983-B2 |
| Application number | US-202217581635-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 21, 2022 |
| Priority date | Jan 21, 2022 |
| Publication date | Oct 14, 2025 |
| Grant date | Oct 14, 2025 |
Official abstract text for this publication.
Systems and methods including one or more processors and one or more non-transitory computer readable media storing computing instructions that, when executed on the one or more processors, cause the one or more processors to perform: receiving a user request via a graphical user interface, the user request corresponding to a user search query for a product; determining whether a first processing machine of the system is operating in a first processing mode or a second processing mode; when the first processing machine is determined to be operating in the first processing mode, analyzing the user request via the first processing machine and using a process, to identify a candidate recommendation system to utilize by: determining a randomized strategy for one or more candidate recommendation systems based on a ratio of a number of the one or more candidate recommender systems, the randomized strategy to be stored in a collected history data; determining model parameters based on the collected history data; and determining the candidate recommendation system from the one or more candidate recommendation systems as a candidate recommendation system with a maximum value for a reward model based on the user request; processing the user request with the candidate recommendation system to identify recommended products to display to the user; and transmitting instructions to modify the graphical user interface to display the recommended products to the user.
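To make the selection step in the abstract concrete, the following is a minimal sketch, not the patented implementation. It assumes a linear reward model per candidate recommendation system, fitted to the collected history data by ridge-regularized least squares (one simple form of empirical-risk minimization), and then picks the candidate with the maximum predicted reward for the current request. The function names, the feature-vector encoding of a request, and the `ridge` parameter are all hypothetical and do not appear in the patent.

```python
import numpy as np

def fit_reward_params(history, n_systems, dim, ridge=1.0):
    """Fit one linear reward model per candidate system (assumed model form).

    history: iterable of (system_index, context_vector, observed_reward).
    Returns an array of shape (n_systems, dim).
    """
    params = np.zeros((n_systems, dim))
    for k in range(n_systems):
        rows = [(x, r) for (j, x, r) in history if j == k]
        if not rows:
            continue  # no data yet for this system; keep zero parameters
        X = np.array([x for x, _ in rows], dtype=float)
        y = np.array([r for _, r in rows], dtype=float)
        # Ridge least squares: solve (X^T X + ridge * I) theta = X^T y
        params[k] = np.linalg.solve(X.T @ X + ridge * np.eye(dim), X.T @ y)
    return params

def select_system(params, context):
    """Return the index of the candidate with the highest predicted reward."""
    return int(np.argmax(params @ np.asarray(context, dtype=float)))

# Hypothetical collected history: (system, request features, reward).
history = [
    (0, [1.0, 0.0], 1.0),
    (0, [1.0, 0.0], 1.0),
    (1, [0.0, 1.0], 1.0),
    (1, [1.0, 0.0], 0.0),
]
params = fit_reward_params(history, n_systems=2, dim=2)
chosen = select_system(params, [1.0, 0.0])
```

In this toy history, system 0 was rewarded on requests with the first feature set, so a new request with that feature is routed to system 0, while a request with only the second feature goes to system 1.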
Opening claim text (preview).
What is claimed is:
1. A system comprising: one or more processors; and one or more non-transitory computer-readable media storing computing instructions that, when executed on the one or more processors, cause the one or more processors to perform:
receiving a user request that is input to a web application included in an online component via a graphical user interface, the user request corresponding to a user search query from a user for a product;
determining, by a request distributor, included in the online component whether a system bandit service is operating in a first processing mode or a second processing mode, wherein the request distributor maintains computing resource availability and response time agreements, wherein the system bandit service operates as middleware between the online component and an offline component, wherein the system bandit service comprises a first distributed system with a first queue, a second distributed system with a second queue, an explore module, and an exploit module, wherein the request distributor determines which of the first queue or the second queue is a shorter queue with more memory, and wherein when the request distributor determines that the first queue is the shorter queue, the request distributor transmits the user request to the first queue of the first distributed system;
transmitting the user request to: (1) the exploit module, via the second distributed system, when the first processing mode is a high processing mode and the second processing mode is a low processing mode, or (2) the explore module, via the first distributed system, when the first processing mode is the low processing mode and the second processing mode is the high processing mode;
using a process of a reward model that comprises stage-wise exploration and exploitation with an optimal design, wherein the optimal design leverages a D-optimal design to increase information gain during exploration compared to not using the D-optimal design, by determining a randomized strategy for a plurality of candidate recommendation systems based on a ratio of a number of the plurality of candidate recommendation systems, the randomized strategy to be stored in a collected history data;
when the system bandit service is determined to be operating in the low processing mode, analyzing the user request, via the offline component, using the plurality of candidate recommendation systems and generating respective predicted reward values and corresponding pre-computations associated with the reward model, to be stored in the collected history data;
when the system bandit service is determined to be operating in the high processing mode, determining, via the system bandit service, the candidate recommendation system from the plurality of candidate recommendation systems as a recommendation system with a maximum reward value for the reward model based on the user request and the collected history data;
processing, via the system bandit service, the user request with one or more of the recommendation system or the one or more candidate recommendation systems to identify recommended products to display to the user; and
transmitting instructions, from the system bandit service to an algorithm service included in the online component, to modify the graphical user interface to display the recommended products to the user and determining a reward value for the one or more of the recommendation system or the one or more candidate recommendation systems, to be stored in the collected history data.
2. The system of claim 1, wherein: the first distributed system has the shorter queue than the second distributed system and the user request is transmitted to the first distributed system.
3. The system of claim 2, wherein: the first processing mode is the low processing mode; the second processing mode is the high processing mode consuming more computing resources than the low processing mode; and the process comprises an explore process.
4. The system of claim 2, wherein: the first processing mode is the high processing mode; the second processing mode is the low processing mode consuming less computing resources than the high processing mode; and the process comprises an exploit process.
5. The system of claim 1, wherein: the first processing mode is the low processing mode; the second processing mode is the high processing mode consuming more computing resources than the low processing mode; and the process comprises an explore process.
6. The system of claim 1, wherein: the first processing mode is the high processing mode; the second processing mode is the low processing mode consuming less computing resources than the high processing mode; and the process comprises an exploit process.
7. The system of claim 6, wherein the exploit process further comprises resetting the collected history data collected prior to determining the randomized strategy.
8. The system of claim 1, wherein determining the randomized strategy further comprises: determining (a) a respective maximum value for a number of exploit rounds for each of the one or more candidate recommendation systems and (b) a respective maximum value for a number of explore rounds for each of the one or more candidate recommendation systems; determining a respective potential randomized strategy based on a respective ratio for the number of exploit rounds and the number of explore rounds for each of the one or more candidate recommendation systems; and determining a respective randomized strategy for each of the one or more candidate recommendation systems based on a relationship between the respective potential randomized strategy and a previous respective randomized strategy.
9. The system of claim 1, further comprising: determining model parameters included in the corresponding pre-computations associated with the reward model based on the collected history data comprising performing empirical-risk minimization.
10. A method implemented via execution of computing instructions configured to run at one or more processors and configured to be stored at non-transitory computer-readable media, the method comprising:
receiving a user request that is input to a web application included in an online component via a graphical user interface, the user request corresponding to a user search query from a user for a product;
determining, by a request distributor, included in the online component whether a system bandit service is operating in a first processing mode or a second processing mode, wherein the request distributor maintains computing resource availability and response time agreements, wherein the system bandit service operates as middleware between the online component and an offline component, wherein the system bandit service comprises a first distributed system with a first queue, a second distributed system with a second queue, an explore module, and an exploit module, wherein the request distributor determines which of the first queue or the second queue is a shorter queue with more memory, and wherein when the request distributor determines that the first queue is the shorter queue, the request distributor transmits the user request to the first queue of the first distributed system;
transmitting the user request to: (1) the exploit module, via the second distributed system, when the first processing mode is a high processing mode and the second processing mode is a low processing mode, or (2) the explore module, via the first distributed system, when the first processing mode is the low processing mode and the second processing mode is the high processing mode;
using a process of a reward model that comprises stage-wise e
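The queue-based routing that claims 1 and 10 attribute to the request distributor can be illustrated with a short sketch. This is a hedged illustration under stated assumptions, not the claimed implementation: the `DistributedSystem` class, the `free_memory_mb` field, and the tie-breaking rule (prefer more free memory when queue lengths are equal, then prefer the first system) are hypothetical choices made only to produce a runnable example.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class DistributedSystem:
    """Hypothetical stand-in for one distributed system and its queue."""
    name: str
    free_memory_mb: int
    queue: deque = field(default_factory=deque)

def route_request(request, first, second):
    """Enqueue the request on the system with the shorter queue.

    Assumption: equal queue lengths are broken by larger free memory;
    if both still tie, the first system is chosen.
    """
    def load(system):
        return (len(system.queue), -system.free_memory_mb)

    target = min((first, second), key=load)
    target.queue.append(request)
    return target.name

first = DistributedSystem("first", free_memory_mb=512, queue=deque(["q1"]))
second = DistributedSystem("second", free_memory_mb=2048, queue=deque(["q2", "q3"]))

routed_a = route_request("search: running shoes", first, second)  # first queue is shorter
routed_b = route_request("search: rain jacket", first, second)    # lengths tie; second has more memory
```

Ranking each system by a `(queue length, negative free memory)` tuple keeps the decision in one comparison; a production distributor would also have to account for the response-time agreements mentioned in the claim, which this sketch omits.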
graphically representing goods, e.g. 3D product representation · CPC title
considering the load · CPC title
Recommending goods or services · CPC title