US11314970B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11314970-B1 |
| Application number | US-202016953049-A |
| Country | US |
| Kind code | B1 |
| Filing date | Nov 19, 2020 |
| Priority date | Nov 19, 2020 |
| Publication date | Apr 26, 2022 |
| Grant date | Apr 26, 2022 |
Official abstract text for this publication.
A video summarization system generates a concatenated feature set by combining a feature set of a candidate video shot and a summarization feature set. Based on the concatenated feature set, the video summarization system calculates multiple action options of a reward function included in a trained reinforcement learning module. The video summarization system determines a reward outcome included in the multiple action options. The video summarization system modifies the summarization feature set to include the feature set of the candidate video shot by applying a particular modification indicated by the reward outcome. The video summarization system identifies video frames associated with the modified summarization feature set, and generates a summary video based on the identified video frames.
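The decision step in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `toy_action_values` merely stands in for the trained reinforcement learning module, the action names `"add"` and `"swap"` are invented labels for two modification actions, and dropping the oldest entry on `"swap"` is an assumption, since the abstract does not say which entry a removal would target. Mapping the resulting feature sets back to video frames is omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

FeatureSet = List[float]


def toy_action_values(concatenated: Sequence[float]) -> Dict[str, float]:
    """Hypothetical stand-in for the trained RL module's reward function.

    Returns one value per action option. A real module would use learned
    parameters; this toy version only looks at the mean feature value.
    """
    mean = sum(concatenated) / len(concatenated)
    return {"add": mean, "swap": 1.0 - mean}


def decide_and_modify(
    candidate: FeatureSet,
    summary: List[FeatureSet],
    action_values: Callable[[Sequence[float]], Dict[str, float]],
) -> Tuple[str, List[FeatureSet]]:
    """Run one decision step and return (chosen action, updated summary)."""
    # Concatenate the candidate's features with those already in the summary.
    concatenated = [x for fs in [candidate, *summary] for x in fs]

    # Calculate the action options and take the highest-valued one as the outcome.
    options = action_values(concatenated)
    outcome = max(options, key=options.get)

    # Apply the modification indicated by the outcome (copying, not mutating).
    updated = [list(fs) for fs in summary]
    if outcome == "swap" and updated:
        updated.pop(0)  # assumption: remove the oldest entry
    updated.append(list(candidate))  # the candidate is included either way
    return outcome, updated


if __name__ == "__main__":
    outcome, summary = decide_and_modify([0.9, 0.9], [[0.2, 0.2]], toy_action_values)
    print(outcome, summary)
```

In this sketch both actions leave the candidate in the summary, which mirrors the abstract's statement that the summarization feature set is modified to include the candidate's feature set; the actions differ only in whether an existing entry is removed first.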
Opening claim text (preview).
What is claimed is:

1. A method of generating a summary video of digital video data, the method comprising: generating a concatenated feature set by combining: (i) a feature set of a candidate video shot that is included in a group of multiple video shots, and (ii) additional feature sets that are included in a summarization feature set, wherein the additional feature sets are associated with additional video shots selected from the group of multiple video shots; calculating multiple action options of a reward function that is applied to the concatenated feature set, the reward function being included in a trained reinforcement learning (“RL”) module, the multiple action options describing a group of modification actions, the reward function including decision process vector parameters that relate the multiple action options to the concatenated feature set; determining a reward outcome that is included the multiple action options, wherein the reward outcome indicates, from the group of modification actions, a particular modification of the summarization feature set; modifying, responsive to determining the reward outcome, the summarization feature set to include the feature set of the candidate video shot by applying the particular modification indicated by the reward outcome; identifying one or more video frames associated with the modified summarization feature set; and generating a summary video based on the identified video frames.

2. The method of claim 1, further comprising: wherein the particular modification indicated by the reward outcome includes at least one of: a first modification responsive to determining that the reward outcome is a first action outcome included in the multiple action options, or a second modification responsive to determining that the reward outcome is a second action outcome included in the multiple action options.

3. The method of claim 2, wherein: the first modification comprises including, in the summarization feature set, the feature set of the candidate video shot concatenated with the additional feature sets associated with the additional video shots, and the second modification includes removing, from the summarization feature set, a particular feature set of a particular one of the additional video shots.

4. The method of claim 1, further comprising: generating, for each video frame included in the digital video data, a sequence identification score describing visual features of the video frame; calculating, for each video frame included in the digital video data, a difference between the sequence identification score of the video frame and an additional sequence identification score of a subsequent video frame included in the digital video data; and determining, for each video frame included in the digital video data, that the video frame and the subsequent video frame are included in a particular video shot of the group of multiple video shots, wherein the determination is based on a comparison of the difference to a shot threshold.

5. The method of claim 1, further comprising: extracting, from the candidate video shot, one or more of visual features or audible features; and modifying the feature set of the candidate video shot to include the one or more of the visual features or the audible features.

6. The method of claim 1, further comprising: identifying, for the candidate video shot, a classification label; and modifying the feature set of the candidate video shot to include the classification label.

7. A system for generating a summary video of digital video data, the system comprising: a summarization decision module for generating a summarization feature set by applying a reward function to a group of multiple video shots, the reward function included in a trained reinforcement learning (“RL”) module, the reward function including decision process vector parameters; the summarization decision module configured for: receiving a feature set of a candidate video shot that is included in the group of multiple video shots; concatenating the feature set of the candidate video shot with additional feature sets that are included in the summarization feature set, the additional feature sets associated with additional video shots selected from the group of multiple video shots; determining, by applying the reward function to the concatenated feature sets, a reward outcome of the reward function, wherein the decision process vector parameters relate the reward outcome to the concatenated feature set, wherein the reward outcome indicates a particular modification of the summarization feature set; and modifying, responsive to the reward outcome and by applying the particular modification indicated by the reward outcome, the summarization feature set to include the feature set of the candidate video shot; and a video-editing module configured for: identifying one or more video frames associated with the modified summarization feature set; and generating a summary video based on the identified video frames.

8. The system of claim 7, wherein the trained RL module is configured for: calculating multiple action options of the reward function, the multiple action options describing a group of modification actions available to the trained RL module, wherein the reward outcome is included in the multiple action options, wherein modifying the summarization feature set includes at least one of: a first modification responsive to determining that the reward outcome is a first action option included in the multiple action options, or a second modification responsive to determining that the reward outcome is a second action option included in the multiple action options.

9. The system of claim 8, wherein: the first modification comprises including, in the summarization feature set, the feature set of the candidate video shot concatenated with the additional feature sets associated with the additional video shots, and the second modification includes removing, from the summarization feature set, a particular feature set of a particular one of the additional video shots.

10. The system of claim 7, further comprising a video-splitting module for generating the group of multiple video shots, the video-splitting module configured for: generating, for each video frame included in the digital video data, a sequence identification score describing visual features of the video frame; calculating, for each video frame included in the digital video data, a difference between the sequence identification score of the video frame and an additional sequence identification score of a subsequent video frame included in the digital video data; and determining, for each video frame included in the digital video data, that the video frame and the subsequent video frame are included in a particular video shot of the group of multiple video shots, wherein the determination is based on a comparison of the difference to a shot threshold.

11. The system of claim 7, wherein the generated summary video is provided to one or more of: a video publishing system, a video archive system, or a video search-and-retrieval system.

12. The system of claim 7, further comprising a feature-extraction neural network configured for: extracting, from the candidate video shot, one or more of visual features or audible features; and modifying the feature set of the candidate video shot to include the one or more of the visual features or the audible features.

13. The system of claim 7, further comprising a classification neural network configured for: identifying, for the candidate video shot, a cla
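
The shot-splitting steps recited in claims 4 and 10 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the sequence identification score is treated as a single number per frame, the difference is taken as an absolute value, and two consecutive frames are placed in the same shot when that difference does not exceed the threshold. The claims only say the determination is based on a comparison of the difference to a shot threshold, so the direction of the comparison is assumed. The function name `split_into_shots` is hypothetical.

```python
from typing import List, Sequence


def split_into_shots(scores: Sequence[float], shot_threshold: float) -> List[List[int]]:
    """Group frame indices into shots from per-frame scores.

    scores[i] is the (assumed scalar) sequence identification score of frame i.
    Consecutive frames stay in the same shot while the absolute score
    difference is at most shot_threshold; otherwise a new shot starts.
    """
    if not scores:
        return []

    shots: List[List[int]] = [[0]]
    for i in range(len(scores) - 1):
        # Difference between this frame's score and the subsequent frame's score.
        difference = abs(scores[i + 1] - scores[i])
        if difference <= shot_threshold:
            shots[-1].append(i + 1)  # same shot as frame i
        else:
            shots.append([i + 1])  # shot boundary between frames i and i + 1
    return shots


if __name__ == "__main__":
    frame_scores = [0.10, 0.12, 0.11, 0.80, 0.82]
    print(split_into_shots(frame_scores, shot_threshold=0.2))
```

With the sample scores above, the only difference larger than the threshold falls between frames 2 and 3, so the frames are grouped into two shots.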
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Reinforcement learning · CPC title
Transfer learning · CPC title