Video recommendation method and device, computer device and storage medium

US2021281918A1 · US · A1

Patent metadata
Publication number: US-2021281918-A1
Application number: US-202117329928-A
Country: US
Kind code: A1
Filing date: May 25, 2021
Priority date: Apr 23, 2019
Publication date: Sep 9, 2021
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A video recommendation method is provided, including: inputting a video to a first feature extraction network, performing feature extraction on at least one consecutive video frame in the video, and outputting a video feature of the video; inputting user data of a user to a second feature extraction network, performing feature extraction on the discrete user data, and outputting a user feature of the user; performing feature fusion based on the video feature and the user feature, and obtaining a recommendation probability of recommending the video to the user; and determining, according to the recommendation probability, whether to recommend the video to the user.
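The flow in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the patented implementation: the two "networks" are replaced by trivial stand-ins (frame averaging and a lookup table), the fusion is a dot product squashed by a sigmoid, and the names `video_feature`, `user_feature`, `recommendation_probability`, `should_recommend`, the sample tags, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def video_feature(frames: Sequence[Vector]) -> Vector:
    """Stand-in for the first feature extraction network.

    Assumption: each frame is already a numeric vector, and the consecutive
    frames are simply averaged. The patent uses neural networks here.
    """
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def user_feature(user_data: Sequence[str], table: Dict[str, Vector]) -> Vector:
    """Stand-in for the second feature extraction network.

    Assumption: discrete user data (categorical tags) is mapped to vectors
    through a lookup table and summed; unknown tags contribute zeros.
    """
    dim = len(next(iter(table.values())))
    out = [0.0] * dim
    for item in user_data:
        for i, value in enumerate(table.get(item, [0.0] * dim)):
            out[i] += value
    return out


def recommendation_probability(v: Vector, u: Vector) -> float:
    """Fuse the two features with a dot product, then apply a sigmoid.

    The sigmoid is an assumption used to map the score into (0, 1).
    """
    score = sum(a * b for a, b in zip(v, u))
    return 1.0 / (1.0 + math.exp(-score))


def should_recommend(prob: float, threshold: float = 0.5) -> bool:
    """Recommend only when the probability is strictly above the threshold."""
    return prob > threshold


frames = [[1.0, 0.0], [0.0, 1.0]]                       # two toy "frames"
table = {"age:18-24": [1.0, 1.0], "likes:sports": [1.0, -1.0]}
v = video_feature(frames)
u = user_feature(["age:18-24", "likes:sports"], table)
p = recommendation_probability(v, u)
decision = should_recommend(p)
```

Keeping the video side and the user side as separate functions mirrors the separation between the first and second feature extraction networks; only the fusion step sees both features.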

First claim

Opening claim text (preview).

What is claimed is:

1. A video recommendation method, performed by a computer device, the method comprising: inputting a video to a first feature extraction network; performing video feature extraction on at least one consecutive video frame in the video with the first feature extraction network to generate a video feature of the video; inputting user data of a user to a second feature extraction network; performing user feature extraction on the user data with the second feature extraction network to generate a user feature of the user, the user data being discrete; performing first feature fusion based at least on the video feature and the user feature to obtain a first recommendation probability of recommending the video to the user; and determining, according to the first recommendation probability, whether to recommend the video to the user.

2. The method according to claim 1, wherein inputting the video to the first feature extraction network comprises: separately inputting the at least one consecutive video frame in the video to a temporal convolutional network and a convolutional neural network in the first feature extraction network, wherein performing the video feature extraction on the at least one consecutive video frame in the video with the first feature extraction network to generate the video feature of the video comprises: extracting the video feature of the video through performing first convolution on the at least one consecutive video frame by using the temporal convolutional network and the convolutional neural network.

3. The method according to claim 2, wherein performing the first convolution on the at least one consecutive video frame by using the temporal convolutional network and the convolutional neural network to generate the video feature of the video comprises: performing causal convolution on at least one image frame in the at least one consecutive video frame using the temporal convolutional network to obtain an image feature of the video; performing audio convolution on at least one audio frame in the at least one consecutive video frame using the convolutional neural network to obtain an audio feature of the video; and performing second feature fusion on the image feature of the video and the audio feature of the video to obtain the video feature of the video.

4. The method according to claim 3, wherein performing the second feature fusion on the image feature and the audio feature to obtain the video feature comprises: performing bilinear pooling on the image feature and the audio feature to obtain the video feature.

5. The method according to claim 1, wherein performing the user feature extraction on the user data with the second feature extraction network comprises: performing general linear combination on the user data by using a wide component in the second feature extraction network to obtain a wide feature of the user; performing embedding and third convolution on the user data by using a deep component in the second feature extraction network to obtain a deep feature of the user; and performing third feature fusion on the wide feature of the user and the deep feature of the user to obtain the user feature of the user.

6. The method according to claim 5, wherein performing the third feature fusion on the wide feature of the user and the deep feature of the user to obtain the user feature of the user comprises: cascading the wide feature of the user and the deep feature of the user by using a fully-connected layer to obtain the user feature of the user.

7. The method according to claim 1, wherein performing the first feature fusion based at least on the video feature and the user feature to obtain the first recommendation probability of recommending the video to the user comprises: performing dot multiplication on the video feature and the user feature to obtain the first recommendation probability of recommending the video to the user.

8. The method according to claim 1, wherein the method further comprises: inputting at least one text corresponding to the video to a third feature extraction network; performing text feature extraction on the at least one text with the third feature extraction network to generate a text feature of the video, the at least one text being discrete.

9. The method according to claim 8, wherein performing the text feature extraction on the at least one text with the third feature extraction network comprises: performing general linear combination on the at least one text by using a wide component in the third feature extraction network to obtain a wide feature of the at least one text; performing embedding and fourth convolution on the at least one text by using a deep component in the third feature extraction network to obtain a deep feature of the at least one text; and performing fourth feature fusion on the wide feature of the at least one text and the deep feature of the at least one text to obtain the text feature of the video.

10. The method according to claim 9, wherein the performing the fourth feature fusion on the wide feature of the at least one text and the deep feature of the at least one text to obtain the text feature of the video comprises: cascading the wide feature of the at least one text and the deep feature of the at least one text by using a fully-connected layer to obtain the text feature of the video.

11. The method according to claim 8, wherein performing the first feature fusion based at least on the video feature and the user feature to obtain the first recommendation probability of recommending the video to the user comprises: performing video-user feature fusion on the video feature and the user feature to obtain a first associated feature between the video and the user; performing text-user feature fusion on the text feature and the user feature to obtain a second associated feature between the at least one text and the user; and performing dot multiplication on the first associated feature and the second associated feature to obtain the first recommendation probability of recommending the video to the user.

12. The method according to claim 11, wherein performing the video-user feature fusion on the video feature and the user feature to obtain the first associated feature between the video and the user comprises performing video-user bilinear pooling on the video feature and the user feature to obtain the first associated feature between the video and the user; and wherein performing the text-user feature fusion on the text feature and the user feature to obtain the second associated feature between the text and the user comprises performing text-user bilinear pooling on the text feature and the user feature to obtain the second associated feature between the text and the user.

13. The method according to claim 1, wherein determining, according to the first recommendation probability, whether to recommend the video to the user comprises: determining, when the first recommendation probability is greater than a probability threshold, to recommend the video to the user; and determining, when the first recommendation probability is less than or equal to the probability threshold, not to recommend the video to the user.

14. The method according to claim 1, further comprising: obtaining two or more extra recommendation probabilities respectively for two or more extra videos; obtaining probability ranking of the two or more extra recommendation probabilities and the first recommendation probability; and determining whether to recommend a c
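Two operations named in the dependent claims, causal convolution (claim 3) and bilinear pooling (claim 4), can be illustrated with a small standard-library sketch. It is a toy under stated assumptions rather than the claimed networks: features are scalars or short lists instead of image and audio tensors, the kernel weights are made up, and bilinear pooling is reduced to its simplest form, a flattened outer product with no spatial pooling or normalization. The function names `causal_conv1d` and `bilinear_pool` are hypothetical.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Causal 1-D convolution: out[t] depends only on x[t], x[t-1], ...

    kernel[0] weights the current step, kernel[1] the previous step, and so
    on. Steps before the start of the sequence count as zeros (left padding),
    so the output has the same length as the input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * x[t - k]
        out.append(acc)
    return out


def bilinear_pool(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Simplest bilinear fusion: the outer product of a and b, flattened."""
    return [ai * bj for ai in a for bj in b]


# Toy per-frame image signal, one scalar per consecutive frame (assumed).
image_seq = [1.0, 2.0, 3.0, 4.0]
image_feature = causal_conv1d(image_seq, kernel=[0.5, 0.5])

# Toy audio feature (assumed to come from a separate audio convolution).
audio_feature = [1.0, -1.0]

# Fuse the last two image steps with the audio feature.
video_feature = bilinear_pool(image_feature[-2:], audio_feature)

# Causality check: changing only the final input leaves earlier outputs alone.
changed = causal_conv1d([1.0, 2.0, 3.0, 99.0], kernel=[0.5, 0.5])
```

The outer product gives every image component an interaction term with every audio component, which is why the fused vector has length `len(a) * len(b)`.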

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

  • using recommendation lists, e.g. of programmes or channels sorted out according to their score · CPC title

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

  • of extracted features · CPC title

  • using neural networks · CPC title

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021281918A1 cover?
A video recommendation method is provided, including: inputting a video to a first feature extraction network, performing feature extraction on at least one consecutive video frame in the video, and outputting a video feature of the video; inputting user data of a user to a second feature extraction network, performing feature extraction on the discrete user data, and outputting a user feature …
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification H04N21/4826. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Sep 9, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).