Reinforcement learning techniques for automated video summarization

US11314970B1 · US · B1

Patent metadata
Publication number: US-11314970-B1
Application number: US-202016953049-A
Country: US
Kind code: B1
Filing date: Nov 19, 2020
Priority date: Nov 19, 2020
Publication date: Apr 26, 2022
Grant date: Apr 26, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A video summarization system generates a concatenated feature set by combining a feature set of a candidate video shot and a summarization feature set. Based on the concatenated feature set, the video summarization system calculates multiple action options of a reward function included in a trained reinforcement learning module. The video summarization system determines a reward outcome included in the multiple action options. The video summarization system modifies the summarization feature set to include the feature set of the candidate video shot by applying a particular modification indicated by the reward outcome. The video summarization system identifies video frames associated with the modified summarization feature set, and generates a summary video based on the identified video frames.
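
The decision step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the action names (`append`, `replace`), the zero-padding to a fixed length, the dot-product scoring, and the choice to drop the oldest shot on `replace` are all assumptions made for illustration. The per-action weight vectors stand in for the "decision process vector parameters" of a trained module.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def concatenate(candidate: Vector, summary: List[Vector], max_shots: int) -> Vector:
    """Flatten candidate + summary feature sets into one zero-padded vector."""
    dim = len(candidate)
    flat = list(candidate)
    for features in summary[:max_shots]:
        flat.extend(features)
    # Assumption: pad so the vector length does not depend on summary size.
    flat.extend([0.0] * ((max_shots + 1) * dim - len(flat)))
    return flat


def action_options(weights: Dict[str, Vector], x: Vector) -> Dict[str, float]:
    """Score every action as the dot product of its weight vector with x."""
    return {
        action: sum(w_i * x_i for w_i, x_i in zip(w, x))
        for action, w in weights.items()
    }


def summarization_step(
    candidate: Vector,
    summary: List[Vector],
    weights: Dict[str, Vector],
    max_shots: int,
) -> Tuple[List[Vector], str]:
    """Pick the highest-scoring action and apply it to the summary set."""
    x = concatenate(candidate, summary, max_shots)
    options = action_options(weights, x)
    outcome = max(options, key=options.get)
    if outcome == "append" and len(summary) < max_shots:
        # First modification: keep existing shots and add the candidate.
        return summary + [candidate], "append"
    # Second modification (hypothetical policy): drop the oldest shot,
    # then add the candidate. Also used when the summary is already full.
    return summary[1:] + [candidate], "replace"


# Hypothetical "trained" parameters for 2-D features and up to 2 summary shots.
demo_weights = {
    "append": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "replace": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
}
demo_summary, demo_outcome = summarization_step([0.9, 0.1], [], demo_weights, 2)
print(demo_outcome, demo_summary)
```

In this sketch both actions leave the candidate shot in the summary set, which mirrors the abstract's statement that the set is modified "to include the feature set of the candidate video shot". A real system would learn the weight vectors during reinforcement learning training rather than hard-code them.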

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating a summary video of digital video data, the method comprising: generating a concatenated feature set by combining: (i) a feature set of a candidate video shot that is included in a group of multiple video shots, and (ii) additional feature sets that are included in a summarization feature set, wherein the additional feature sets are associated with additional video shots selected from the group of multiple video shots; calculating multiple action options of a reward function that is applied to the concatenated feature set, the reward function being included in a trained reinforcement learning (“RL”) module, the multiple action options describing a group of modification actions, the reward function including decision process vector parameters that relate the multiple action options to the concatenated feature set; determining a reward outcome that is included in the multiple action options, wherein the reward outcome indicates, from the group of modification actions, a particular modification of the summarization feature set; modifying, responsive to determining the reward outcome, the summarization feature set to include the feature set of the candidate video shot by applying the particular modification indicated by the reward outcome; identifying one or more video frames associated with the modified summarization feature set; and generating a summary video based on the identified video frames.

2. The method of claim 1, further comprising: wherein the particular modification indicated by the reward outcome includes at least one of: a first modification responsive to determining that the reward outcome is a first action outcome included in the multiple action options, or a second modification responsive to determining that the reward outcome is a second action outcome included in the multiple action options.

3. The method of claim 2, wherein: the first modification comprises including, in the summarization feature set, the feature set of the candidate video shot concatenated with the additional feature sets associated with the additional video shots, and the second modification includes removing, from the summarization feature set, a particular feature set of a particular one of the additional video shots.

4. The method of claim 1, further comprising: generating, for each video frame included in the digital video data, a sequence identification score describing visual features of the video frame; calculating, for each video frame included in the digital video data, a difference between the sequence identification score of the video frame and an additional sequence identification score of a subsequent video frame included in the digital video data; and determining, for each video frame included in the digital video data, that the video frame and the subsequent video frame are included in a particular video shot of the group of multiple video shots, wherein the determination is based on a comparison of the difference to a shot threshold.

5. The method of claim 1, further comprising: extracting, from the candidate video shot, one or more of visual features or audible features; and modifying the feature set of the candidate video shot to include the one or more of the visual features or the audible features.

6. The method of claim 1, further comprising: identifying, for the candidate video shot, a classification label; and modifying the feature set of the candidate video shot to include the classification label.

7. A system for generating a summary video of digital video data, the system comprising: a summarization decision module for generating a summarization feature set by applying a reward function to a group of multiple video shots, the reward function included in a trained reinforcement learning (“RL”) module, the reward function including decision process vector parameters; the summarization decision module configured for: receiving a feature set of a candidate video shot that is included in the group of multiple video shots; concatenating the feature set of the candidate video shot with additional feature sets that are included in the summarization feature set, the additional feature sets associated with additional video shots selected from the group of multiple video shots; determining, by applying the reward function to the concatenated feature sets, a reward outcome of the reward function, wherein the decision process vector parameters relate the reward outcome to the concatenated feature set, wherein the reward outcome indicates a particular modification of the summarization feature set; and modifying, responsive to the reward outcome and by applying the particular modification indicated by the reward outcome, the summarization feature set to include the feature set of the candidate video shot; and a video-editing module configured for: identifying one or more video frames associated with the modified summarization feature set; and generating a summary video based on the identified video frames.

8. The system of claim 7, wherein the trained RL module is configured for: calculating multiple action options of the reward function, the multiple action options describing a group of modification actions available to the trained RL module, wherein the reward outcome is included in the multiple action options, wherein modifying the summarization feature set includes at least one of: a first modification responsive to determining that the reward outcome is a first action option included in the multiple action options, or a second modification responsive to determining that the reward outcome is a second action option included in the multiple action options.

9. The system of claim 8, wherein: the first modification comprises including, in the summarization feature set, the feature set of the candidate video shot concatenated with the additional feature sets associated with the additional video shots, and the second modification includes removing, from the summarization feature set, a particular feature set of a particular one of the additional video shots.

10. The system of claim 7, further comprising a video-splitting module for generating the group of multiple video shots, the video-splitting module configured for: generating, for each video frame included in the digital video data, a sequence identification score describing visual features of the video frame; calculating, for each video frame included in the digital video data, a difference between the sequence identification score of the video frame and an additional sequence identification score of a subsequent video frame included in the digital video data; and determining, for each video frame included in the digital video data, that the video frame and the subsequent video frame are included in a particular video shot of the group of multiple video shots, wherein the determination is based on a comparison of the difference to a shot threshold.

11. The system of claim 7, wherein the generated summary video is provided to one or more of: a video publishing system, a video archive system, or a video search-and-retrieval system.

12. The system of claim 7, further comprising a feature-extraction neural network configured for: extracting, from the candidate video shot, one or more of visual features or audible features; and modifying the feature set of the candidate video shot to include the one or more of the visual features or the audible features.

13. The system of claim 7, further comprising a classification neural network configured for: identifying, for the candidate video shot, a cla
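
The shot-splitting step recited in claims 4 and 10 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed method itself: the claims do not say how the sequence identification score is computed, so `sequence_score` uses mean pixel intensity of a flattened grayscale frame as a hypothetical stand-in. The function names and the convention that a difference at or below the threshold keeps two frames in the same shot are likewise assumptions.

```python
from typing import List, Sequence


def sequence_score(frame: Sequence[float]) -> float:
    """Hypothetical per-frame score: mean intensity of a flattened frame."""
    return sum(frame) / len(frame)


def split_into_shots(
    frames: Sequence[Sequence[float]], shot_threshold: float
) -> List[List[int]]:
    """Group consecutive frame indices into shots.

    A frame joins the current shot when the absolute difference between its
    score and the previous frame's score is at most shot_threshold;
    otherwise it starts a new shot.
    """
    if not frames:
        return []
    scores = [sequence_score(frame) for frame in frames]
    shots: List[List[int]] = [[0]]
    for index in range(1, len(frames)):
        difference = abs(scores[index] - scores[index - 1])
        if difference <= shot_threshold:
            shots[-1].append(index)
        else:
            shots.append([index])
    return shots


# Five tiny two-pixel "frames": a dark scene followed by a bright scene.
demo_frames = [[10, 10], [11, 11], [50, 50], [52, 52], [51, 51]]
print(split_into_shots(demo_frames, shot_threshold=5.0))
```

A production system would more plausibly derive the score from learned visual features or histograms, but the thresholded comparison of consecutive scores works the same way.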

Assignees

Adobe Inc

Inventors

Classifications

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • Reinforcement learning

  • Transfer learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11314970B1 cover?
A video summarization system generates a concatenated feature set by combining a feature set of a candidate video shot and a summarization feature set. Based on the concatenated feature set, the video summarization system calculates multiple action options of a reward function included in a trained reinforcement learning module. The video summarization system determines a reward outcome include…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06V20/47. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 26, 2022 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).