Who is the assignee on this patent?

Baidu USA LLC, Baidu Com Times Tech Beijing Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06N3/042. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 20, 2022 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Video recommendation with multi-gate mixture of experts soft actor critic

US2022019878A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2022019878-A1
Application number	US-202017040039-A
Country	US
Kind code	A1
Filing date	Jul 15, 2020
Priority date	Jul 15, 2020
Publication date	Jan 20, 2022
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are embodiments of a reinforcement learning based large-scale multi-objective ranking system. Embodiments of the system may be used for optimizing short-video recommendation on a video sharing platform. Multiple competing ranking objective and implicit selection bias in user feedback are the main challenges in real-world platform. In order to address those challenges, multi-gate mixture of experts (MMoE) and soft actor critic (SAC) are integrated together into a MMoE_SAC system. Experiment results demonstrate that embodiments of the MMoE_SAC system may greatly reduce a loss function compared to systems only based on single strategies.
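The abstract and claim 1 describe several experts whose per-action predictions are blended by gate weights into one MMoE layer output. Below is a minimal, stdlib-only Python sketch of that data flow. It rests on several assumptions. The experts are plain linear maps. Categorical features are embedded by a one-hot lookup multiplied by a dense weight matrix (the sparse-to-dense step of claims 2 and 3), and numerical features are appended unchanged. Each action gets its own softmax gate over the experts, as in the usual multi-gate formulation. The claims speak of "a gating network", so the exact gating arrangement may differ. All function names are hypothetical, and SAC training of the experts is omitted.

```python
import math

# Illustrative sketch only; helper names are hypothetical, not the patent's implementation.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix given as a list of rows by a vector given as a list."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def embed_categorical(index, vocab_size, W):
    """One-hot lookup, then a dense projection by W (emb_dim x vocab_size)."""
    one_hot = [0.0] * vocab_size
    one_hot[index] = 1.0
    return matvec(W, one_hot)


def hidden_embedding(cat_indices, vocab_sizes, tables, numerical):
    """Concatenate dense categorical embeddings with the numerical features as-is."""
    h = []
    for idx, size, W in zip(cat_indices, vocab_sizes, tables):
        h.extend(embed_categorical(idx, size, W))
    h.extend(numerical)
    return h


def mmoe(h, expert_weights, gate_weights):
    """Blend per-action expert predictions with one softmax gate per action.

    expert_weights[i] is an (n_actions x dim) matrix: expert i predicts one value per action.
    gate_weights[a] is an (n_experts x dim) matrix: the gate logits for action a.
    Returns (outputs, gates) where outputs[a] = sum_i gates[a][i] * preds[i][a].
    """
    preds = [matvec(W, h) for W in expert_weights]
    outputs, gates = [], []
    for a, G in enumerate(gate_weights):
        g = softmax(matvec(G, h))
        gates.append(g)
        outputs.append(sum(g[i] * preds[i][a] for i in range(len(preds))))
    return outputs, gates


if __name__ == "__main__":
    # Assumed toy sizes: one categorical feature (vocab 3, embedding dim 2) plus one numerical feature.
    table = [[0.1, 0.2, 0.3],
             [0.4, 0.5, 0.6]]
    h = hidden_embedding([2], [3], [table], [1.0])
    experts = [[[1, 0, 0], [0, 1, 0]],    # expert 0
               [[0, 0, 1], [0, 0, 2]]]    # expert 1
    zero_gates = [[[0, 0, 0], [0, 0, 0]],  # gate for action 0
                  [[0, 0, 0], [0, 0, 0]]]  # gate for action 1
    outputs, gates = mmoe(h, experts, zero_gates)
    print(h, outputs, gates)
```

With all gate logits at zero, every gate is uniform and each action's output is the plain average of the experts' predictions. Training would move the gate logits so that different actions can lean on different experts, which is the point of giving each task its own gate.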

First claim

Opening claim text (preview).

1. A computer-implemented method for multi-objective ranking comprising: receiving, at a multi-gate mixture of experts (MMoE) layer comprising multiple experts and a gating network, hidden embeddings corresponding to one or more states and one or more actions; generating, by each of multiple experts using soft actor critic (SAC), a prediction based on the hidden embeddings, each prediction comprises one or more prediction parameters corresponding to one or more actions respectively; obtaining a weighted sum for predictions by the multiple experts, in accordance of a weight generated by the gating network for each expert; and generating an MMoE layout output, from the MMoE layer, based on the weighted sum.

2. The computer-implemented method of claim 1 wherein the hidden embeddings are generated by steps of: dividing a plurality of features for the one or more states and the one or more actions into categorical features and numerical features; and defining a universal dynamic feature embedding dictionary to map or project the plurality of features into a unified embedding space for the hidden embedding.

3. The computer-implemented method of claim 2 wherein defining a universal dynamic feature embedding dictionary to map or project the plurality of features into a unified embedding space comprising: using a one-hot or multi-hot vector for each embedding lookup for categorical features; and transforming, using a transformation weight matrix, the categorical features from sparse features to dense features.

4. The computer-implemented method of claim 1 wherein each expert is a trained deep neural network (DNN) using embeddings corresponding to one or more states as input for each expert and embeddings corresponding to one or more actions as labels for training.

5. The computer-implemented method of claim 4 wherein loss calculation for each of the one of more actions is independent from each other during a training process.

6. The computer-implemented method of claim 4 wherein the training process comprises steps of: regarding each action as a task; adding an entropy-regularized term to a policy function for each action; deriving a soft policy iteration to repeat soft policy evaluation and soft policy improvement alternately; learning soft-function parameters by minimizing a soft Bellman residual; and learning policy parameters by minimizing a KL divergence between the policy function and a quotient obtained by dividing an exponential of the soft-Q function and a partition function.

7. The computer-implemented method of claim 6 wherein during the soft policy iteration, a Q-function with a minimum Q-value among multiple Q-functions is taken for each policy improvement step.

8. A system for multi-objective ranking comprising: one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising: converting features from one or more data sources into hidden embeddings; receiving, at a multi-gate mixture of experts (MMoE) layer comprising multiple experts and a gating network, the hidden embeddings; generating, by each of multiple experts using soft actor critic (SAC), a prediction based on the input, each prediction comprises one or more prediction parameters corresponding to one or more actions respectively; obtaining a weighted sum for predictions by the multiple experts, in accordance of a weight generated by the gating network for each expert; and generating an MMoE layout output, from the MMoE layer, based on the weighted sum.

9. The system of claim 8 wherein converting features from one or more data sources into hidden embeddings comprises the steps of: dividing the features into categorical features and numerical features; and defining a universal dynamic feature embedding dictionary to map or project the features into a unified embedding space for the hidden embedding.

10. The system of claim 9 wherein defining a universal dynamic feature embedding dictionary to map or project input features into a unified embedding space comprises the steps of: using a one-hot or multi-hot vector for each embedding lookup for categorical features; and transforming, using a transformation weight matrix, the categorical features from sparse features to dense features.

11. The system of claim 9 wherein each expert is a trained neural network using feature embeddings from one or more states as input and feature embeddings from one or more action inputs as labels for training during a training process.

12. The system of claim 11 wherein the non-transitory computer-readable medium or media further comprises one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps of training the each expert to be performed comprising: regarding each action as a task; adding an entropy-regularized term to a policy function for each action; deriving a soft policy iteration to repeat soft policy evaluation and soft policy improvement alternately; learning soft-function parameters by minimizing a soft Bellman residual; and learning policy parameters by minimizing a KL divergence between the policy function and a quotient obtained by dividing an exponential of the soft-Q function and a partition function.

13. The system of claim 12 wherein during the soft policy iteration, a Q-function with a minimum Q-value among multiple Q-functions is taken for each policy improvement step.

14. A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by at least one processor, causes steps for multi-objective ranking comprising: converting features from one or more data sources into hidden embeddings; receiving, at a multi-gate mixture of experts (MMoE) layer comprising multiple experts and a gating network, the hidden embeddings; generating, by each of multiple experts using soft actor critic (SAC), a prediction based on the input, each prediction comprises one or more prediction parameters corresponding to one or more actions respectively; obtaining a weighted sum for predictions by the multiple experts, in accordance of a weight generated by the gating network for each expert; and generating an MMoE layout output, from the MMoE layer, based on the weighted sum.

15. The non-transitory computer-readable medium or media of claim 14 wherein converting features from one or more data sources into hidden embeddings comprising steps of: dividing a plurality of features for the one or more states and the one or more actions into categorical features and numerical features; and defining a universal dynamic feature embedding dictionary to map or project the plurality of features into a unified embedding space for the hidden embedding.

16. The non-transitory computer-readable medium or media of claim 15 wherein defining a universal dynamic feature embedding dictionary to map or project input features into a unified embedding space comprises the steps of: using a one-hot or multi-hot vector for each embedding lookup for categorical features; and transforming, using a transformation weight matrix, the categorical features from sparse features to dense features.

17. The non-transitory computer-readable medium or media of claim 14 wherein each expert is a trained neural network using feature embeddings from one or more states as input and feature embeddings from one o…
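Claims 6 and 7 (and their system counterparts, claims 12 and 13) name the standard soft actor critic ingredients. These are an entropy-regularized policy, a soft Bellman residual for the Q-function parameters, a KL-divergence policy objective against exp(Q)/Z, and taking the minimum over several Q-functions. Below is a minimal Python sketch of those quantities. It assumes a small discrete action set, so expectations become exact sums, along with two critics and a fixed temperature `alpha`. With `alpha = 1`, the KL target reduces to the claims' exponential of the soft-Q function divided by a partition function. The action space, function names, and toy numbers are assumptions for illustration. The claims shown here do not fix them, and neural-network parameterization and gradient updates are left out.

```python
import math

# Illustrative sketch for a small discrete action set; all names are hypothetical.


def min_q(q_funcs, a):
    """Clipped value: the minimum estimate of Q(s, a) across the Q-functions."""
    return min(q[a] for q in q_funcs)


def soft_value(pi, q_funcs, alpha):
    """V(s) = sum_a pi(a|s) * (min_j Q_j(s, a) - alpha * log pi(a|s))."""
    return sum(p * (min_q(q_funcs, a) - alpha * math.log(p))
               for a, p in enumerate(pi) if p > 0.0)


def soft_bellman_target(reward, next_pi, next_q_funcs, alpha, gamma, done):
    """y = r + gamma * V(s'), with no bootstrap on terminal transitions."""
    if done:
        return reward
    return reward + gamma * soft_value(next_pi, next_q_funcs, alpha)


def soft_bellman_residual(q_sa, target):
    """Squared error minimized to fit each Q-function's parameters."""
    return 0.5 * (q_sa - target) ** 2


def policy_loss(pi, q_funcs, alpha):
    """E_{a~pi}[alpha * log pi(a|s) - min_j Q_j(s, a)].

    Equals alpha * KL(pi || exp(Q / alpha) / Z) - alpha * log Z, so minimizing it
    minimizes the KL term (Z does not depend on pi).
    """
    return sum(p * (alpha * math.log(p) - min_q(q_funcs, a))
               for a, p in enumerate(pi) if p > 0.0)


def boltzmann_policy(q_funcs, alpha):
    """Closed-form minimizer of policy_loss: softmax(min_j Q_j(s, .) / alpha)."""
    q = [min_q(q_funcs, a) for a in range(len(q_funcs[0]))]
    m = max(q)
    exps = [math.exp((v - m) / alpha) for v in q]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    q_funcs = [[1.0, 3.0], [2.0, 2.5]]  # two critics, two actions (toy values)
    pi_star = boltzmann_policy(q_funcs, alpha=1.0)
    uniform = [0.5, 0.5]
    print(pi_star, policy_loss(pi_star, q_funcs, 1.0), policy_loss(uniform, q_funcs, 1.0))
    y = soft_bellman_target(1.0, pi_star, q_funcs, alpha=1.0, gamma=0.9, done=False)
    print(y, soft_bellman_residual(2.0, y))
```

Because the minimizing policy has a closed form in the discrete case, the sketch can check itself: `boltzmann_policy` attains a lower `policy_loss` than a uniform or skewed alternative. In a neural implementation, both losses would instead be minimized by gradient descent on the critic and policy parameters.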


Classifications

G06N3/042 (primary): Knowledge-based neural networks; Logical representations of neural networks
G06N3/045: Combinations of networks
G06N3/044: Recurrent networks, e.g. Hopfield networks
G06N3/08 (primary): Learning methods
G06N3/0442: characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

Patent family

Related publications grouped by family.

Patent family ID: 79291538

Related patents

Online hyperparameter tuning in distributed machine learning

Addressing a loss-metric mismatch with adaptive loss alignment

Generation of video recommendations using connection networks

Recommender system
