Online hyperparameter tuning in distributed machine learning
US-2022027359-A1 · Jan 27, 2022 · US
US2022019878A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022019878-A1 |
| Application number | US-202017040039-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 15, 2020 |
| Priority date | Jul 15, 2020 |
| Publication date | Jan 20, 2022 |
| Grant date | — |
Official abstract text for this publication.
Described herein are embodiments of a reinforcement-learning-based, large-scale multi-objective ranking system. Embodiments of the system may be used to optimize short-video recommendation on a video-sharing platform. Multiple competing ranking objectives and implicit selection bias in user feedback are the main challenges on real-world platforms. To address those challenges, multi-gate mixture of experts (MMoE) and soft actor critic (SAC) are integrated into an MMoE_SAC system. Experimental results demonstrate that embodiments of the MMoE_SAC system may greatly reduce a loss function compared to systems based on only a single strategy.
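The gated combination of expert predictions that the abstract names can be sketched roughly as follows. This is a minimal illustration only, not the patent's implementation: the experts here are stand-in linear maps rather than SAC-trained networks, the gate is a single softmax over a linear projection, and the names `softmax`, `linear`, and `mmoe_layer` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mmoe_layer(hidden, expert_weights, gate_weights):
    """Combine per-expert predictions with gate-generated weights.

    hidden:         hidden embedding vector (assumed already computed)
    expert_weights: one matrix per expert, mapping hidden -> per-action predictions
                    (assumption: a linear stand-in for each trained expert)
    gate_weights:   matrix mapping hidden -> one score per expert
    """
    predictions = [linear(w, hidden) for w in expert_weights]
    gate = softmax(linear(gate_weights, hidden))  # one weight per expert
    n_actions = len(predictions[0])
    output = [
        sum(g * pred[a] for g, pred in zip(gate, predictions))
        for a in range(n_actions)
    ]
    return output, gate


# Usage: two experts, two actions, a gate with all-zero weights (uniform mix).
hidden = [1.0, 2.0]
expert_weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 1 predicts [1.0, 2.0]
    [[2.0, 0.0], [0.0, 2.0]],  # expert 2 predicts [2.0, 4.0]
]
gate_weights = [[0.0, 0.0], [0.0, 0.0]]
output, gate = mmoe_layer(hidden, expert_weights, gate_weights)
```

With a uniform gate, the output is simply the average of the two experts' predictions for each action; a trained gate would shift that mix depending on the input embedding.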
Opening claim text (preview).
1. A computer-implemented method for multi-objective ranking comprising: receiving, at a multi-gate mixture of experts (MMoE) layer comprising multiple experts and a gating network, hidden embeddings corresponding to one or more states and one or more actions; generating, by each of multiple experts using soft actor critic (SAC), a prediction based on the hidden embeddings, each prediction comprises one or more prediction parameters corresponding to one or more actions respectively; obtaining a weighted sum for predictions by the multiple experts, in accordance of a weight generated by the gating network for each expert; and generating an MMoE layout output, from the MMoE layer, based on the weighted sum.
2. The computer-implemented method of claim 1 wherein the hidden embeddings are generated by steps of: dividing a plurality of features for the one or more states and the one or more actions into categorical features and numerical features; and defining a universal dynamic feature embedding dictionary to map or project the plurality of features into a unified embedding space for the hidden embedding.
3. The computer-implemented method of claim 2 wherein defining a universal dynamic feature embedding dictionary to map or project the plurality of features into a unified embedding space comprising: using a one-hot or multi-hot vector for each embedding lookup for categorical features; and transforming, using a transformation weight matrix, the categorical features from sparse features to dense features.
4. The computer-implemented method of claim 1 wherein each expert is a trained deep neural network (DNN) using embeddings corresponding to one or more states as input for each expert and embeddings corresponding to one or more actions as labels for training.
5. The computer-implemented method of claim 4 wherein loss calculation for each of the one of more actions is independent from each other during a training process.
6. The computer-implemented method of claim 4 wherein the training process comprises steps of: regarding each action as a task; adding an entropy-regularized term to a policy function for each action; deriving a soft policy iteration to repeat soft policy evaluation and soft policy improvement alternately; learning soft-function parameters by minimizing a soft Bellman residual; and learning policy parameters by minimizing a KL divergence between the policy function and a quotient obtained by dividing an exponential of the soft-Q function and a partition function.
7. The computer-implemented method of claim 6 wherein during the soft policy iteration, a Q-function with a minimum Q-value among multiple Q-functions is taken for each policy improvement step.
8. A system for multi-objective ranking comprising: one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising: converting features from one or more data sources into hidden embeddings; receiving, at a multi-gate mixture of experts (MMoE) layer comprising multiple experts and a gating network, the hidden embeddings; generating, by each of multiple experts using soft actor critic (SAC), a prediction based on the input, each prediction comprises one or more prediction parameters corresponding to one or more actions respectively; obtaining a weighted sum for predictions by the multiple experts, in accordance of a weight generated by the gating network for each expert; and generating an MMoE layout output, from the MMoE layer, based on the weighted sum.
9. The system of claim 8 wherein converting features from one or more data sources into hidden embeddings comprises the steps of: dividing the features into categorical features and numerical features; and defining a universal dynamic feature embedding dictionary to map or project the features into a unified embedding space for the hidden embedding.
10. The system of claim 9 wherein defining a universal dynamic feature embedding dictionary to map or project input features into a unified embedding space comprises the steps of: using a one-hot or multi-hot vector for each embedding lookup for categorical features; and transforming, using a transformation weight matrix, the categorical features from sparse features to dense features.
11. The system of claim 9 wherein each expert is a trained neural network using feature embeddings from one or more states as input and feature embeddings from one or more action inputs as labels for training during a training process.
12. The system of claim 11 wherein the non-transitory computer-readable medium or media further comprises one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps of training the each expert to be performed comprising: regarding each action as a task; adding an entropy-regularized term to a policy function for each action; deriving a soft policy iteration to repeat soft policy evaluation and soft policy improvement alternately; learning soft-function parameters by minimizing a soft Bellman residual; and learning policy parameters by minimizing a KL divergence between the policy function and a quotient obtained by dividing an exponential of the soft-Q function and a partition function.
13. The system of claim 12 wherein during the soft policy iteration, a Q-function with a minimum Q-value among multiple Q-functions is taken for each policy improvement step.
14. A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by at least one processor, causes steps for multi-objective ranking comprising: converting features from one or more data sources into hidden embeddings; receiving, at a multi-gate mixture of experts (MMoE) layer comprising multiple experts and a gating network, the hidden embeddings; generating, by each of multiple experts using soft actor critic (SAC), a prediction based on the input, each prediction comprises one or more prediction parameters corresponding to one or more actions respectively; obtaining a weighted sum for predictions by the multiple experts, in accordance of a weight generated by the gating network for each expert; and generating an MMoE layout output, from the MMoE layer, based on the weighted sum.
15. The non-transitory computer-readable medium or media of claim 14 wherein converting features from one or more data sources into hidden embeddings comprising steps of: dividing a plurality of features for the one or more states and the one or more actions into categorical features and numerical features; and defining a universal dynamic feature embedding dictionary to map or project the plurality of features into a unified embedding space for the hidden embedding.
16. The non-transitory computer-readable medium or media of claim 15 wherein defining a universal dynamic feature embedding dictionary to map or project input features into a unified embedding space comprises the steps of: using a one-hot or multi-hot vector for each embedding lookup for categorical features; and transforming, using a transformation weight matrix, the categorical features from sparse features to dense features.
17. The non-transitory computer-readable medium or media of claim 14 wherein each expert is a trained neural network using feature embeddings from one or more states as input and feature embeddings from one o
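The sparse-to-dense embedding step recited in the claims (a one-hot or multi-hot lookup vector multiplied by a transformation weight matrix) can be illustrated with a small sketch. Everything here is an assumption for illustration: the function names, the per-feature tables, and in particular the choice to append numerical features unchanged, which the claims do not specify.

```python
def multi_hot(indices, vocab_size):
    """Build a one-hot (single index) or multi-hot (several indices) vector."""
    vec = [0.0] * vocab_size
    for i in indices:
        vec[i] = 1.0
    return vec


def embed_categorical(indices, weight_matrix):
    """Project a sparse multi-hot vector to a dense vector.

    weight_matrix has shape (vocab_size x dim); the product of the multi-hot
    row vector with it sums the rows selected by `indices`.
    """
    vocab_size = len(weight_matrix)
    dim = len(weight_matrix[0])
    hot = multi_hot(indices, vocab_size)
    return [
        sum(hot[v] * weight_matrix[v][d] for v in range(vocab_size))
        for d in range(dim)
    ]


def hidden_embedding(categorical, numerical, tables):
    """Concatenate dense categorical embeddings with numerical features.

    Assumption: numerical features are appended as-is; a real system might
    normalize or project them first.
    """
    out = []
    for name, indices in categorical.items():
        out.extend(embed_categorical(indices, tables[name]))
    out.extend(numerical)
    return out


# Usage with hypothetical features: one single-valued and one multi-valued.
tables = {
    "category": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # vocab 3, dim 2
    "tags": [[1.0, 0.0], [0.0, 1.0]],                  # vocab 2, dim 2
}
embedding = hidden_embedding({"category": [1], "tags": [0, 1]}, [0.7], tables)
```

In practice the multiplication is usually replaced by a direct row lookup, since most entries of the sparse vector are zero; the explicit product is kept here only to mirror the wording of the claims.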
Knowledge-based neural networks; Logical representations of neural networks · CPC title
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Learning methods · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title