Unsupervised classification of documents using a labeled data set of other documents
US-2019377823-A1 · Dec 12, 2019 · US
US11146862B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11146862-B2 |
| Application number | US-201916386031-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 16, 2019 |
| Priority date | Apr 16, 2019 |
| Publication date | Oct 12, 2021 |
| Grant date | Oct 12, 2021 |
Official abstract text for this publication.
Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital videos by aggregating one or more tags corresponding to identified similar tagged feature vectors.
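The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the per-frame feature vectors have already been produced by some neural network (not shown), uses averaging as the aggregation step, Euclidean distance as the similarity measure, and a simple frequency count to aggregate tags. The names `mean_vector`, `euclidean`, and `generate_tags`, the parameter `k`, and the sample data are all hypothetical.

```python
import math
from collections import Counter

def mean_vector(vectors):
    """Aggregate per-frame feature vectors by element-wise averaging."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def generate_tags(frame_vectors, tagged_vectors, k=2):
    """Return tags from the k nearest tagged vectors, most frequent first.

    frame_vectors: list of feature vectors, one per extracted frame
                   (assumed to come from a neural network, not shown here).
    tagged_vectors: list of (vector, tags) pairs.
    """
    query = mean_vector(frame_vectors)
    # Rank the tagged vectors by distance to the aggregated query vector.
    ranked = sorted(tagged_vectors, key=lambda item: euclidean(query, item[0]))
    # Aggregate the tags of the k closest tagged vectors by counting them.
    counts = Counter()
    for _, tags in ranked[:k]:
        counts.update(tags)
    return [tag for tag, _ in counts.most_common()]

# Hypothetical data: two frame vectors and three tagged vectors.
frame_vectors = [[0.9, 0.1], [1.1, -0.1]]
tagged_vectors = [
    ([1.0, 0.0], ["running", "jumping"]),
    ([0.9, 0.2], ["running"]),
    ([-1.0, 0.5], ["swimming"]),
]
tags = generate_tags(frame_vectors, tagged_vectors, k=2)
print(tags)
```

Counting tag frequency is only one plausible way to aggregate; the abstract does not specify how tags from similar vectors are combined.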
Opening claim text (preview).
What is claimed is:
1. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause a computer system to: extract a set of frames from a video; generate, utilizing a neural network, feature vectors for the set of frames; identify a set of tagged feature vectors corresponding to a set of media content items, the set of tagged feature vectors comprising feature vectors generated from particular media content items and tagged with labels that correspond to the particular media content items; select one or more tagged feature vectors from the set of tagged feature vectors based on distances between the feature vectors and the one or more tagged feature vectors from the set of tagged feature vectors; and generate a set of tags to associate with the video by: selecting tags from the one or more tagged feature vectors; and aggregating the tags selected from the one or more tagged feature vectors.
2. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to: generate the feature vectors for the set of frames by: generating, utilizing the neural network, a set of initial feature vectors, wherein the set of initial feature vectors comprise a feature vector for each frame from the set of frames; and generating an aggregated feature vector based on the set of initial feature vectors; and select the one or more tagged feature vectors from the set of tagged feature vectors based on distances between the aggregated feature vector and the one or more tagged feature vectors from the set of tagged feature vectors.
3. The non-transitory computer-readable medium of claim 1, wherein each feature vector within the set of tagged feature vectors is generated from a media content item and tagged with labels that describe depictions within a particular media content item.
4. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to: select the one or more tagged feature vectors from the set of tagged feature vectors by: determining distance values between the feature vectors and the one or more tagged feature vectors from the set of tagged feature vectors; and selecting the one or more tagged feature vectors that correspond to distance values that meet a threshold distance value.
5. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the set of tagged feature vectors from one or more videos associated with actions.
6. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to generate the set of tags to associate with the video by aggregating action based tags corresponding to the one or more tagged feature vectors.
7. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the set of tagged feature vectors by: identifying a media content item comprising text representing one or more verbs; generating, utilizing the neural network, a tagged feature vector for the media content item; assigning tags to the tagged feature vector by assigning the one or more verbs to the tagged feature vector; and associating the tagged feature vector with the set of tagged feature vectors.
8. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to associate the set of tags with a temporal segment of the video comprising the set of frames.
9. The non-transitory computer-readable medium of claim 8, further comprising instructions that, when executed by the at least one processor, cause the computer system to: provide graphical user interface displaying the video; provide a timeline for the video in the graphical user interface; and place a tag indicator associated with a tag of the set of tags on the timeline at a position corresponding to the temporal segment of the video.
10. The non-transitory computer-readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computer system to identify the media content item comprising text representing one or more verbs by identifying one or more gerunds within text associated with the media content item.
11. A system comprising: memory comprising a neural network and a set of tagged feature vectors corresponding to a set of media content items, the set of tagged feature vectors comprising feature vectors generated from particular media content items and tagged with labels that correspond to the particular media content items; at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that, when executed by the at least one processor, cause the system to: extract a set of frames from a video; generate, utilizing the neural network, feature vectors for the set of frames; and generate a set of tags to associate with the video by: determining distance values between the feature vectors for the set of frames and one or more tagged feature vectors from the set of tagged feature vectors; selecting tags from the one or more tagged feature vectors from the set of tagged feature vectors based on the determined distance values; and aggregating the tags selected from the one or more tagged feature vectors.
12. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to: generate the feature vectors for the set of frames by: generating, utilizing the neural network, a set of initial feature vectors, wherein the set of initial feature vectors comprise a feature vector for each frame from the set of frames; and generating an aggregated feature vector based on the set of initial feature vectors; and generate the set of tags associated with the video by determining the distance values between the aggregated feature vector and the one or more tagged feature vectors from the set of tagged feature vectors.
13. The system of claim 12, wherein generating the aggregated feature vector comprises combining the set of initial feature vectors utilizing averaging or max pooling.
14. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to select the one or more tags associated with the one or more tagged feature vectors from the set of tagged feature vectors based on the determined distance values by utilizing a k-nearest neighbor algorithm.
15. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to generate the set of tags to associate the set of tags with a temporal segment of the video comprising the set of frames.
16. A computer-implemented method for automatic tagging of videos, the computer-implemented method comprising: extracting a set of frames from a video; generating, utilizing a neural network, feature vectors for the set of frames; performing a step for generating an aggregated feature vector from the feature vectors; determining one or more tagged feature vectors similar to the aggregated feature vector base
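Two of the dependent claims name concrete operations: combining initial feature vectors by averaging or max pooling (claim 13), and selecting tagged feature vectors whose distance values meet a threshold (claim 4). The sketch below illustrates both under stated assumptions: vectors are plain Python lists, distance is Euclidean, and "meets a threshold" is read as "distance at most the threshold", which the claim text does not spell out. The function names, the `max_distance` parameter, and the sample values are hypothetical.

```python
import math

def average_pool(vectors):
    """Element-wise mean of the initial (per-frame) feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def max_pool(vectors):
    """Element-wise maximum of the initial (per-frame) feature vectors."""
    return [max(column) for column in zip(*vectors)]

def within_threshold(query, tagged_vectors, max_distance):
    """Keep the (vector, tags) pairs whose distance to query is <= max_distance.

    Assumption: a distance "meets" the threshold when it does not exceed it.
    """
    selected = []
    for vector, tags in tagged_vectors:
        if math.dist(query, vector) <= max_distance:
            selected.append((vector, tags))
    return selected

# Hypothetical initial feature vectors for two frames.
initial_vectors = [[0.0, 2.0], [4.0, 1.0]]
averaged = average_pool(initial_vectors)
pooled = max_pool(initial_vectors)

# Hypothetical tagged feature vectors.
tagged_vectors = [
    ([2.0, 1.0], ["surfing"]),
    ([5.0, 5.0], ["cooking"]),
]
selected = within_threshold(averaged, tagged_vectors, max_distance=1.0)
print(averaged, pooled, selected)
```

Averaging smooths over all frames, while max pooling keeps the strongest response per dimension; which one suits a given video is not stated in the claims.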
Combinations of networks · CPC title
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
using neural networks · CPC title
of sport video content · CPC title