Systems and methods for providing binge-watching recommendations
US-2024373099-A1 · Nov 7, 2024 · US
US2021281918A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021281918-A1 |
| Application number | US-202117329928-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 25, 2021 |
| Priority date | Apr 23, 2019 |
| Publication date | Sep 9, 2021 |
| Grant date | — |
Official abstract text for this publication.
A video recommendation method is provided, including: inputting a video to a first feature extraction network, performing feature extraction on at least one consecutive video frame in the video, and outputting a video feature of the video; inputting user data of a user to a second feature extraction network, performing feature extraction on the discrete user data, and outputting a user feature of the user; performing feature fusion based on the video feature and the user feature, and obtaining a recommendation probability of recommending the video to the user; and determining, according to the recommendation probability, whether to recommend the video to the user.
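The last two steps of the abstract (fusing the two features into a recommendation probability, then deciding whether to recommend) can be illustrated with a minimal sketch. It assumes the video and user features have already been extracted as equal-length vectors. Dot multiplication as the fusion step and the strict "greater than a threshold" rule follow claims 7 and 13; the sigmoid squashing, the 0.5 default threshold, and all function names are illustrative assumptions, not details taken from the publication.

```python
import math


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function (an assumption: the source
    does not say how the fused value is mapped into [0, 1])."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def recommendation_probability(video_feature, user_feature):
    """Fuse the two features by dot multiplication, then squash to a probability."""
    return sigmoid(dot(video_feature, user_feature))


def should_recommend(probability, threshold=0.5):
    """Recommend only when the probability is strictly greater than the
    threshold; the 0.5 default is a hypothetical value."""
    return probability > threshold


# Hypothetical, already-extracted features for one (video, user) pair.
video_feature = [0.2, -0.1, 0.7]
user_feature = [0.5, 0.3, 0.9]

p = recommendation_probability(video_feature, user_feature)
decision = should_recommend(p)
print(f"probability={p:.3f}, recommend={decision}")
```

In a real system both vectors would come from trained feature extraction networks; the hand-written values here only exercise the fusion and decision logic.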
Opening claim text (preview).
What is claimed is:
1. A video recommendation method, performed by a computer device, the method comprising: inputting a video to a first feature extraction network; performing video feature extraction on at least one consecutive video frame in the video with the first feature extraction network to generate a video feature of the video; inputting user data of a user to a second feature extraction network; performing user feature extraction on the user data with the second feature extraction network to generate a user feature of the user, the user data being discrete; performing first feature fusion based at least on the video feature and the user feature to obtain a first recommendation probability of recommending the video to the user; and determining, according to the first recommendation probability, whether to recommend the video to the user.
2. The method according to claim 1, wherein inputting the video to the first feature extraction network comprises: separately inputting the at least one consecutive video frame in the video to a temporal convolutional network and a convolutional neural network in the first feature extraction network, wherein performing the video feature extraction on the at least one consecutive video frame in the video with the first feature extraction network to generate the video feature of the video comprises: extracting the video feature of the video through performing first convolution on the at least one consecutive video frame by using the temporal convolutional network and the convolutional neural network.
3. The method according to claim 2, wherein performing the first convolution on the at least one consecutive video frame by using the temporal convolutional network and the convolutional neural network to generate the video feature of the video comprises: performing causal convolution on at least one image frame in the at least one consecutive video frame using the temporal convolutional network to obtain an image feature of the video; performing audio convolution on at least one audio frame in the at least one consecutive video frame using the convolutional neural network to obtain an audio feature of the video; and performing second feature fusion on the image feature of the video and the audio feature of the video to obtain the video feature of the video.
4. The method according to claim 3, wherein performing the second feature fusion on the image feature and the audio feature to obtain the video feature comprises: performing bilinear pooling on the image feature and the audio feature to obtain the video feature.
5. The method according to claim 1, wherein performing the user feature extraction on the user data with the second feature extraction network comprises: performing general linear combination on the user data by using a wide component in the second feature extraction network to obtain a wide feature of the user; performing embedding and third convolution on the user data by using a deep component in the second feature extraction network to obtain a deep feature of the user; and performing third feature fusion on the wide feature of the user and the deep feature of the user to obtain the user feature of the user.
6. The method according to claim 5, wherein performing the third feature fusion on the wide feature of the user and the deep feature of the user to obtain the user feature of the user comprises: cascading the wide feature of the user and the deep feature of the user by using a fully-connected layer to obtain the user feature of the user.
7. The method according to claim 1, wherein performing the first feature fusion based at least on the video feature and the user feature to obtain the first recommendation probability of recommending the video to the user comprises: performing dot multiplication on the video feature and the user feature to obtain the first recommendation probability of recommending the video to the user.
8. The method according to claim 1, wherein the method further comprises: inputting at least one text corresponding to the video to a third feature extraction network; performing text feature extraction on the at least one text with the third feature extraction network to generate a text feature of the video, the at least one text being discrete.
9. The method according to claim 8, wherein performing the text feature extraction on the at least one text with the third feature extraction network comprises: performing general linear combination on the at least one text by using a wide component in the third feature extraction network to obtain a wide feature of the at least one text; performing embedding and fourth convolution on the at least one text by using a deep component in the third feature extraction network to obtain a deep feature of the at least one text; and performing fourth feature fusion on the wide feature of the at least one text and the deep feature of the at least one text to obtain the text feature of the video.
10. The method according to claim 9, wherein the performing the fourth feature fusion on the wide feature of the at least one text and the deep feature of the at least one text to obtain the text feature of the video comprises: cascading the wide feature of the at least one text and the deep feature of the at least one text by using a fully-connected layer to obtain the text feature of the video.
11. The method according to claim 8, wherein performing the first feature fusion based at least on the video feature and the user feature to obtain the first recommendation probability of recommending the video to the user comprises: performing video-user feature fusion on the video feature and the user feature to obtain a first associated feature between the video and the user; performing text-user feature fusion on the text feature and the user feature to obtain a second associated feature between the at least one text and the user; and performing dot multiplication on the first associated feature and the second associated feature to obtain the first recommendation probability of recommending the video to the user.
12. The method according to claim 11, wherein performing the video-user feature fusion on the video feature and the user feature to obtain the first associated feature between the video and the user comprises performing video-user bilinear pooling on the video feature and the user feature to obtain the first associated feature between the video and the user; and wherein performing the text-user feature fusion on the text feature and the user feature to obtain the second associated feature between the text and the user comprises performing text-user bilinear pooling on the text feature and the user feature to obtain the second associated feature between the text and the user.
13. The method according to claim 1, wherein determining, according to the first recommendation probability, whether to recommend the video to the user comprises: determining, when the first recommendation probability is greater than a probability threshold, to recommend the video to the user; and determining, when the first recommendation probability is less than or equal to the probability threshold, not to recommend the video to the user.
14. The method according to claim 1, further comprising: obtaining two or more extra recommendation probabilities respectively for two or more extra videos; obtaining probability ranking of the two or more extra recommendation probabilities and the first recommendation probability; and determining whether to recommend a c
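The fusion path described in claims 11 and 12 (bilinear pooling of the video and text features with the user feature, followed by dot multiplication of the two associated features) can be sketched as follows. This is only an illustration under stated assumptions: bilinear pooling is shown in its plainest form, a flattened outer product with no learned projection; the video and text features are assumed to have the same length so that the two associated features are comparable; and the sigmoid mapping and all names are hypothetical choices, not taken from the claims.

```python
import math


def bilinear_pool(a, b):
    """Plain bilinear pooling: the outer product of two vectors, flattened
    row by row. Trained systems often use learned or compact variants."""
    return [x * y for x in a for y in b]


def sigmoid(z):
    """Numerically stable logistic function (an assumed mapping to [0, 1])."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fused_score(video_feature, text_feature, user_feature):
    """Dot multiplication of the video-user and text-user associated features."""
    video_user = bilinear_pool(video_feature, user_feature)  # first associated feature
    text_user = bilinear_pool(text_feature, user_feature)    # second associated feature
    if len(video_user) != len(text_user):
        raise ValueError("video and text features must have the same length")
    return sum(x * y for x, y in zip(video_user, text_user))


# Hypothetical, already-extracted features.
video_feature = [1.0, 2.0]
text_feature = [0.5, 0.25]
user_feature = [0.5, 0.0, 1.0]

raw = fused_score(video_feature, text_feature, user_feature)
prob = sigmoid(raw)
print(f"raw score={raw}, probability={prob:.4f}")
```

With flattened outer products, the final dot product factorizes as (video · text) × (user · user), which gives a quick sanity check on the arithmetic: here 1.0 × 1.25 = 1.25.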
using recommendation lists, e.g. of programmes or channels sorted out according to their score · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
of extracted features · CPC title
using neural networks · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title