Generating tags for a digital video

US11146862B2 · US · B2

Patent metadata
Field                Value
Publication number   US-11146862-B2
Application number   US-201916386031-A
Country              US
Kind code            B2
Filing date          Apr 16, 2019
Priority date        Apr 16, 2019
Publication date     Oct 12, 2021
Grant date           Oct 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital videos by aggregating one or more tags corresponding to identified similar tagged feature vectors.
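The flow the abstract describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the per-frame feature vectors are taken as given (the neural network is not modeled), aggregation is a plain average, similarity is Euclidean distance, and tags are aggregated by counting how often they occur among the nearest tagged vectors. The names `mean_pool`, `tag_video`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def mean_pool(frame_vectors):
    """Average per-frame feature vectors into one aggregated vector."""
    n = len(frame_vectors)
    return [sum(column) / n for column in zip(*frame_vectors)]


def tag_video(frame_vectors, tagged_vectors, k=2):
    """Return tags for a video, ordered by how often they appear
    among the k tagged vectors closest to the aggregated vector.

    tagged_vectors is a list of (vector, tags) pairs (assumed layout).
    """
    query = mean_pool(frame_vectors)
    # Rank the tagged vectors by Euclidean distance to the query.
    nearest = sorted(
        tagged_vectors, key=lambda item: math.dist(query, item[0])
    )[:k]
    # Aggregate tags from the selected neighbors by simple counting.
    counts = Counter(tag for _, tags in nearest for tag in tags)
    return [tag for tag, _ in counts.most_common()]


# Toy data: two frames of an input video and a small tagged library.
frames = [[0.9, 0.1], [1.1, -0.1]]
library = [
    ([1.0, 0.0], ["running", "jumping"]),
    ([0.9, 0.2], ["running"]),
    ([-1.0, 0.5], ["swimming"]),
]
tags = tag_video(frames, library, k=2)
print(tags)
```

Counting is only one possible way to aggregate tags; the abstract does not say how the aggregation is weighted.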

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause a computer system to: extract a set of frames from a video; generate, utilizing a neural network, feature vectors for the set of frames; identify a set of tagged feature vectors corresponding to a set of media content items, the set of tagged feature vectors comprising feature vectors generated from particular media content items and tagged with labels that correspond to the particular media content items; select one or more tagged feature vectors from the set of tagged feature vectors based on distances between the feature vectors and the one or more tagged feature vectors from the set of tagged feature vectors; and generate a set of tags to associate with the video by: selecting tags from the one or more tagged feature vectors; and aggregating the tags selected from the one or more tagged feature vectors.

2. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to: generate the feature vectors for the set of frames by: generating, utilizing the neural network, a set of initial feature vectors, wherein the set of initial feature vectors comprise a feature vector for each frame from the set of frames; and generating an aggregated feature vector based on the set of initial feature vectors; and select the one or more tagged feature vectors from the set of tagged feature vectors based on distances between the aggregated feature vector and the one or more tagged feature vectors from the set of tagged feature vectors.

3. The non-transitory computer-readable medium of claim 1, wherein each feature vector within the set of tagged feature vectors is generated from a media content item and tagged with labels that describe depictions within a particular media content item.

4. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to: select the one or more tagged feature vectors from the set of tagged feature vectors by: determining distance values between the feature vectors and the one or more tagged feature vectors from the set of tagged feature vectors; and selecting the one or more tagged feature vectors that correspond to distance values that meet a threshold distance value.

5. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the set of tagged feature vectors from one or more videos associated with actions.

6. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to generate the set of tags to associate with the video by aggregating action based tags corresponding to the one or more tagged feature vectors.

7. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the set of tagged feature vectors by: identifying a media content item comprising text representing one or more verbs; generating, utilizing the neural network, a tagged feature vector for the media content item; assigning tags to the tagged feature vector by assigning the one or more verbs to the tagged feature vector; and associating the tagged feature vector with the set of tagged feature vectors.

8. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to associate the set of tags with a temporal segment of the video comprising the set of frames.

9. The non-transitory computer-readable medium of claim 8, further comprising instructions that, when executed by the at least one processor, cause the computer system to: provide graphical user interface displaying the video; provide a timeline for the video in the graphical user interface; and place a tag indicator associated with a tag of the set of tags on the timeline at a position corresponding to the temporal segment of the video.

10. The non-transitory computer-readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the media content item comprising text representing one or more verbs by identifying one or more gerunds within text associated with the media content item.

11. A system comprising: memory comprising a neural network and a set of tagged feature vectors corresponding to a set of media content items, the set of tagged feature vectors comprising feature vectors generated from particular media content items and tagged with labels that correspond to the particular media content items; at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that, when executed by the at least one processor, cause the system to: extract a set of frames from a video; generate, utilizing the neural network, feature vectors for the set of frames; and generate a set of tags to associate with the video by: determining distance values between the feature vectors for the set of frames and one or more tagged feature vectors from the set of tagged feature vectors; selecting tags from the one or more tagged feature vectors from the set of tagged feature vectors based on the determined distance values; and aggregating the tags selected from the one or more tagged feature vectors.

12. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to: generate the feature vectors for the set of frames by: generating, utilizing the neural network, a set of initial feature vectors, wherein the set of initial feature vectors comprise a feature vector for each frame from the set of frames; and generating an aggregated feature vector based on the set of initial feature vectors; and generate the set of tags associated with the video by determining the distance values between the aggregated feature vector and the one or more tagged feature vectors from the set of tagged feature vectors.

13. The system of claim 12, wherein generating the aggregated feature vector comprises combining the set of initial feature vectors utilizing averaging or max pooling.

14. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to select the one or more tags associated with the one or more tagged feature vectors from the set of tagged feature vectors based on the determined distance values by utilizing a k-nearest neighbor algorithm.

15. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to generate the set of tags to associate the set of tags with a temporal segment of the video comprising the set of frames.

16. A computer-implemented method for automatic tagging of videos, the computer-implemented method comprising: extracting a set of frames from a video; generating, utilizing a neural network, feature vectors for the set of frames; performing a step for generating an aggregated feature vector from the feature vectors; determining one or more tagged feature vectors similar to the aggregated feature vector base
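Several dependent claims name concrete variants: averaging or max pooling for the aggregated feature vector (claim 13), selecting tagged feature vectors whose distance meets a threshold (claim 4), and deriving verb tags from gerunds in associated text (claim 10). The sketch below illustrates each one in isolation. It rests on assumptions that the claim text does not settle: Euclidean distance, "meets a threshold" read as "less than or equal to", and a deliberately crude gerund heuristic (any word longer than four letters ending in "ing"), which will also match non-gerunds such as "morning". All function names are hypothetical.

```python
import math
import re


def average_pool(vectors):
    """Combine per-frame vectors by element-wise averaging."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def max_pool(vectors):
    """Combine per-frame vectors by element-wise maximum."""
    return [max(column) for column in zip(*vectors)]


def within_threshold(query, tagged_vectors, max_distance):
    """Keep (vector, tags) pairs whose distance to query is at most
    max_distance. The <= comparison is an assumed interpretation."""
    return [
        (vector, tags)
        for vector, tags in tagged_vectors
        if math.dist(query, vector) <= max_distance
    ]


def gerund_tags(text):
    """Crude stand-in for gerund detection: words longer than four
    letters that end in 'ing'. Not a real part-of-speech tagger."""
    words = re.findall(r"[A-Za-z]+", text)
    return sorted(
        {w.lower() for w in words if len(w) > 4 and w.lower().endswith("ing")}
    )


frames = [[1.0, 4.0], [3.0, 2.0]]
averaged = average_pool(frames)  # element-wise mean of the two frames
pooled = max_pool(frames)        # element-wise maximum of the two frames

library = [([2.0, 3.0], ["surfing"]), ([5.0, 7.0], ["cooking"])]
close = within_threshold(averaged, library, max_distance=1.0)

caption_tags = gerund_tags("A man surfing and a dog running on the beach")
print(averaged, pooled, close, caption_tags)
```

A production system would more likely use a part-of-speech tagger than a suffix match to find gerunds; the suffix rule is used here only to keep the example self-contained.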

Assignees

Adobe Inc

Inventors

Classifications

  • Combinations of networks

  • Supervised learning

  • Convolutional networks [CNN, ConvNet]

  • using neural networks

  • of sport video content

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11146862B2 cover?
Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of fra…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification H04N21/8133. Mapped technology areas include Electricity.
When was this patent published?
Publication date Oct 12, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).