What technology area does this patent fall under?

The primary CPC classification is H04N21/8133. Mapped technology areas include Electricity (CPC section H).

When was this patent published?

Publication date Oct 12, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or other publications sharing the same primary CPC classification).

Generating tags for a digital video

US11146862B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11146862-B2
Application number	US-201916386031-A
Country	US
Kind code	B2
Filing date	Apr 16, 2019
Priority date	Apr 16, 2019
Publication date	Oct 12, 2021
Grant date	Oct 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital videos by aggregating one or more tags corresponding to identified similar tagged feature vectors.
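The pipeline in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Adobe's implementation: the neural network is left out (per-frame feature vectors are passed in directly), frames are aggregated by averaging, similarity is Euclidean distance, similar tagged vectors are found with a plain k-nearest-neighbor ranking, and tags are aggregated by vote count. The function names `mean_pool`, `euclidean`, and `generate_tags`, the parameters `k` and `max_tags`, and the toy vectors and tags are all hypothetical.

```python
import math
from collections import Counter


def mean_pool(frame_vectors):
    """Aggregate per-frame feature vectors into one vector by element-wise averaging."""
    n = len(frame_vectors)
    return [sum(column) / n for column in zip(*frame_vectors)]


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generate_tags(frame_vectors, tagged_vectors, k=3, max_tags=5):
    """Tag a video from its per-frame feature vectors (hypothetical helper).

    frame_vectors: per-frame feature vectors, assumed to come from some
        neural network that is not modelled here.
    tagged_vectors: list of (vector, tags) pairs built from tagged media items.
    """
    query = mean_pool(frame_vectors)
    # k-nearest neighbors: rank tagged vectors by distance to the pooled query.
    neighbors = sorted(tagged_vectors, key=lambda item: euclidean(query, item[0]))[:k]
    # Aggregate tags across neighbors by vote count (ties keep first-seen order).
    votes = Counter(tag for _, tags in neighbors for tag in tags)
    return [tag for tag, _ in votes.most_common(max_tags)]


# Toy usage: two frames and a small library of tagged vectors (made-up data).
frames = [[0.9, 0.1], [1.1, -0.1]]
library = [
    ([1.0, 0.0], ["running", "outdoors"]),
    ([0.9, 0.2], ["running"]),
    ([-1.0, 0.0], ["cooking"]),
    ([0.0, 1.0], ["swimming"]),
]
print(generate_tags(frames, library, k=2))  # ['running', 'outdoors']
```

Counting votes is only one way to "aggregate" tags; a union of all neighbor tags or a distance-weighted score would fit the abstract's wording equally well.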

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause a computer system to: extract a set of frames from a video; generate, utilizing a neural network, feature vectors for the set of frames; identify a set of tagged feature vectors corresponding to a set of media content items, the set of tagged feature vectors comprising feature vectors generated from particular media content items and tagged with labels that correspond to the particular media content items; select one or more tagged feature vectors from the set of tagged feature vectors based on distances between the feature vectors and the one or more tagged feature vectors from the set of tagged feature vectors; and generate a set of tags to associate with the video by: selecting tags from the one or more tagged feature vectors; and aggregating the tags selected from the one or more tagged feature vectors.

2. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to: generate the feature vectors for the set of frames by: generating, utilizing the neural network, a set of initial feature vectors, wherein the set of initial feature vectors comprise a feature vector for each frame from the set of frames; and generating an aggregated feature vector based on the set of initial feature vectors; and select the one or more tagged feature vectors from the set of tagged feature vectors based on distances between the aggregated feature vector and the one or more tagged feature vectors from the set of tagged feature vectors.

3. The non-transitory computer-readable medium of claim 1, wherein each feature vector within the set of tagged feature vectors is generated from a media content item and tagged with labels that describe depictions within a particular media content item.

4. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to: select the one or more tagged feature vectors from the set of tagged feature vectors by: determining distance values between the feature vectors and the one or more tagged feature vectors from the set of tagged feature vectors; and selecting the one or more tagged feature vectors that correspond to distance values that meet a threshold distance value.

5. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the set of tagged feature vectors from one or more videos associated with actions.

6. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to generate the set of tags to associate with the video by aggregating action based tags corresponding to the one or more tagged feature vectors.

7. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the set of tagged feature vectors by: identifying a media content item comprising text representing one or more verbs; generating, utilizing the neural network, a tagged feature vector for the media content item; assigning tags to the tagged feature vector by assigning the one or more verbs to the tagged feature vector; and associating the tagged feature vector with the set of tagged feature vectors.

8. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to associate the set of tags with a temporal segment of the video comprising the set of frames.

9. The non-transitory computer-readable medium of claim 8, further comprising instructions that, when executed by the at least one processor, cause the computer system to: provide graphical user interface displaying the video; provide a timeline for the video in the graphical user interface; and place a tag indicator associated with a tag of the set of tags on the timeline at a position corresponding to the temporal segment of the video.

10. The non-transitory computer-readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the media content item comprising text representing one or more verbs by identifying one or more gerunds within text associated with the media content item.

11. A system comprising: memory comprising a neural network and a set of tagged feature vectors corresponding to a set of media content items, the set of tagged feature vectors comprising feature vectors generated from particular media content items and tagged with labels that correspond to the particular media content items; at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that, when executed by the at least one processor, cause the system to: extract a set of frames from a video; generate, utilizing the neural network, feature vectors for the set of frames; and generate a set of tags to associate with the video by: determining distance values between the feature vectors for the set of frames and one or more tagged feature vectors from the set of tagged feature vectors; selecting tags from the one or more tagged feature vectors from the set of tagged feature vectors based on the determined distance values; and aggregating the tags selected from the one or more tagged feature vectors.

12. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to: generate the feature vectors for the set of frames by: generating, utilizing the neural network, a set of initial feature vectors, wherein the set of initial feature vectors comprise a feature vector for each frame from the set of frames; and generating an aggregated feature vector based on the set of initial feature vectors; and generate the set of tags associated with the video by determining the distance values between the aggregated feature vector and the one or more tagged feature vectors from the set of tagged feature vectors.

13. The system of claim 12, wherein generating the aggregated feature vector comprises combining the set of initial feature vectors utilizing averaging or max pooling.

14. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to select the one or more tags associated with the one or more tagged feature vectors from the set of tagged feature vectors based on the determined distance values by utilizing a k-nearest neighbor algorithm.

15. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to generate the set of tags to associate the set of tags with a temporal segment of the video comprising the set of frames.

16. A computer-implemented method for automatic tagging of videos, the computer-implemented method comprising: extracting a set of frames from a video; generating, utilizing a neural network, feature vectors for the set of frames; performing a step for generating an aggregated feature vector from the feature vectors; determining one or more tagged feature vectors similar to the aggregated feature vector base…
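Several dependent claims name concrete variants of the basic pipeline: max pooling as the aggregation (claim 13), selection by a threshold distance (claim 4), and placing a tag indicator on a timeline at the position of the tagged temporal segment (claim 9). The sketch below illustrates each one under assumptions the claims do not state: Euclidean distance, "meet a threshold distance value" read as "distance at or below the threshold", a duplicate-free union as the tag aggregation, and a linear mapping from the segment's start time to a pixel offset. All function names, parameters, and data are hypothetical.

```python
import math


def max_pool(frame_vectors):
    """Element-wise maximum over per-frame vectors (the max-pooling option of claim 13)."""
    return [max(column) for column in zip(*frame_vectors)]


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_within_threshold(query, tagged_vectors, threshold):
    """Keep (vector, tags) pairs whose distance to the query is at most `threshold`."""
    return [(vec, tags) for vec, tags in tagged_vectors if euclidean(query, vec) <= threshold]


def union_tags(selected):
    """Aggregate tags as a duplicate-free union, preserving first-seen order."""
    merged = []
    for _, tags in selected:
        for tag in tags:
            if tag not in merged:
                merged.append(tag)
    return merged


def marker_position(segment_start_s, video_duration_s, timeline_width_px):
    """Pixel offset of a tag indicator placed at the start of a temporal segment."""
    if video_duration_s <= 0:
        raise ValueError("video duration must be positive")
    return round(segment_start_s / video_duration_s * timeline_width_px)


# Toy usage with made-up vectors and tags.
frames = [[0.2, 0.8], [0.6, 0.4]]
library = [
    ([0.6, 0.8], ["jumping"]),
    ([0.5, 0.9], ["jumping", "dancing"]),
    ([-0.6, -0.8], ["sleeping"]),
]
query = max_pool(frames)  # [0.6, 0.8]
tags = union_tags(select_within_threshold(query, library, threshold=0.5))
print(tags)  # ['jumping', 'dancing']
print(marker_position(30, 120, 800))  # 200
```

A threshold returns a variable number of matches (possibly none), whereas the k-nearest-neighbor selection of claim 14 always returns k; which behavior suits a tagging UI is a design choice the claims leave open.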

Assignees

Adobe Inc

Inventors

Classifications

Code	CPC title
G06N3/045	Combinations of networks
G06N3/09	Supervised learning
G06N3/0464	Convolutional networks [CNN, ConvNet]
G06V10/82	using neural networks
G06V20/42	of sport video content

Patent family

Related publications grouped by family.

Patent family ID: 72832169

Related patents

Unsupervised classification of documents using a labeled data set of other documents

Systems and methods for object identification

Search method and processing device

Large-scale image tagging using image-to-topic embedding

Automatically detecting an event and determining whether the event is a particular type of event

Accurate tag relevance prediction for image search

Video generating system and method thereof