Compact video generation device and method, and recording medium in which computer program is recorded
US-10701463-B2 · Jun 30, 2020 · US
US11620335B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620335-B2 |
| Application number | US-202016995835-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 18, 2020 |
| Priority date | Sep 17, 2019 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Official abstract text for this publication.
Embodiments relate to a method for generating a video synopsis including receiving a user query; performing an object based analysis of a source video; and generating a synopsis video in response to a video synopsis generation request from a user, and a system therefor. The video synopsis generated by the embodiments reflects the user's desired interaction.
Opening claim text (preview).
What is claimed is:
1. A system for generating a video synopsis, comprising: at least one processor; and a non-transitory memory storing instructions which, when executed by the at least one processor, cause the at least one processor to: detect at least one source object in a source video including at least one object; detect one or more motion of the source object in the source video; generate one or more source object tube including the source object on which the motion is detected; determine an interaction associated with the source object of the tube through an interaction determination model pre-learned to determine interactions associated with an object included in an input image, the interaction determination model comprising a convolution network; and generate one or more video synopsis based on the source object tube associated with the determined interaction.
2. The system for generating a video synopsis according to claim 1, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to detect the source object through an object detection model, and the object detection model is pre-learned to extract one or more feature for detecting an object from an input image and determine a class corresponding to the object included in the input image.
3. The system for generating a video synopsis according to claim 2, wherein the object detection model is configured to detect the source object by extracting the feature for detecting the object from the input image and determining the class to which each pixel belongs.
4. The system for generating a video synopsis according to claim 3, wherein the object detection model includes: a first submodel to determine a region of interest (ROI) by detecting a position of the object in the input image; and a second submodel to mask the object included in the ROI.
5. The system for generating a video synopsis according to claim 1, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to: extract a background from the source video.
6. The system for generating a video synopsis according to claim 5, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to: extract the background through a background detection model that determines at least one class regarded as the background for each pixel.
7. The system for generating a video synopsis according to claim 5, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to: extract the background by cutting a region occupied by the source object in the source video.
8. The system for generating a video synopsis according to claim 5, further comprising: a background database (DB) to store the extracted background of the source video.
9. The system for generating a video synopsis according to claim 1, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to: compute tracking information of the source object by tracking a specific object in a subset of frames in which the specific source object is detected.
10. The system for generating a video synopsis according to claim 9, wherein tracking information of the source object includes at least one of whether moving or not, a velocity, a speed, and a direction.
11. The system for generating a video synopsis according to claim 1, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to: generate the source object tube based on a subset of frames representing an activity of the source object, or a combination of the subset of frames representing the activity of the source object and a subset of frames representing the source object.
12. The system for generating a video synopsis according to claim 11, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to: filter a region including the source object in the frame of the source video, and generate the source object tube in which at least part of background of the source video is removed.
13. The system for generating a video synopsis according to claim 1, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to: when a image region including the source object is extracted as a result of the detection of the source object, generate the source object tube using the extracted image region instead of the filtering.
14. The system for generating a video synopsis according to claim 1, wherein the interaction determination model is learned to preset an associable interaction class with the object, wherein the said object is a subject of an action that triggers the interaction.
15. The system for generating a video synopsis according to claim 14, wherein the interaction determination model is further configured to: receive image having a size including a first object as the input image and extract a first feature, receive image having a size including a region including the first object and a different region as the input image and extract a second feature, and determine interaction associated with the first object based on the first feature and the second feature.
16. The system for generating a video synopsis according to claim 15, wherein the different region includes a region including a second object that is different from the first object, or a background.
17. The system for generating a video synopsis according to claim 15, wherein the first feature is a feature extracted to detect the source object.
18. The system for generating a video synopsis according to claim 1, wherein the interaction determination model is configured to determine a class of the interaction by detecting an activity of a specific object which is a subject of an action triggering the interaction and detecting a different element associated with the interaction.
19. The system for generating a video synopsis according to claim 18, wherein the interaction determination model includes: an activity detection network to detect the activity of the specific object in the input image; and an object detection network to detect an object that is different from the activity object by extracting a feature from the input image, the activity detection network is configured to extract the feature for determining a class of the activity appearing in the video, and the object detection network is configured to extract the feature for determining a class of the object appearing in the video.
20. The system for generating a video synopsis according to claim 19, wherein the feature for determining the class of the activity includes a pose feature, and the feature for determining the class of the object includes an appearance feature.
21. The system for generating a video synopsis according to claim 19, wherein the interaction determination model is further configured to: link a set of values computed by the activity detection network and a set of values computed by the object detection network to generate an interaction matrix, and determine the interaction associated with the specific object in the input image based on the activity and the object corresponding to a row and a column of an element having a highest value among elements of the interaction matrix.
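Illustrative note on claim 21 (not part of the claim text). The claim describes linking the activity network's values and the object network's values into an interaction matrix and picking the element with the highest value. The sketch below shows one possible reading, assuming the two sets of values are per-class scores and that "linking" is an element-wise product of every activity score with every object score. The claim does not specify the combination rule, and the function names and labels here are hypothetical.

```python
def build_interaction_matrix(activity_scores, object_scores):
    """Rows are activities, columns are objects.

    Assumption: each element is the product of one activity score and
    one object score. The claim does not state how values are linked.
    """
    return [[a * o for o in object_scores] for a in activity_scores]


def determine_interaction(activity_scores, object_scores,
                          activity_labels, object_labels):
    """Return the (activity, object) pair at the highest-valued element."""
    matrix = build_interaction_matrix(activity_scores, object_scores)
    cells = [(r, c) for r in range(len(matrix)) for c in range(len(matrix[r]))]
    best_row, best_col = max(cells, key=lambda rc: matrix[rc[0]][rc[1]])
    return activity_labels[best_row], object_labels[best_col]


# Hypothetical class labels and scores, for illustration only.
activity_labels = ["ride", "carry", "stand"]
object_labels = ["bicycle", "bag", "none"]
activity, obj = determine_interaction(
    [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], activity_labels, object_labels
)
print(activity, obj)
```

With these example scores the largest element sits in the first row and first column, so the selected interaction pairs the first activity with the first object.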
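Illustrative note on claims 9 and 10 (not part of the claim text). Claim 10 lists whether the object is moving, its velocity, its speed, and its direction as possible tracking information. A minimal sketch of how such values could be derived is shown below, assuming the tracker yields one centroid per consecutive frame in pixel coordinates and that the frame rate is known. The function name, the dictionary keys, and the `moving_threshold` parameter are hypothetical and do not come from the patent.

```python
import math


def tracking_info(centroids, fps, moving_threshold=1.0):
    """Derive simple tracking information from per-frame centroids.

    centroids: list of (x, y) pixel positions, one per consecutive frame
               (an assumption; the claim only says "a subset of frames").
    fps: frames per second of the source video.
    moving_threshold: speed in pixels/second above which the object is
                      treated as moving (hypothetical parameter).
    """
    if len(centroids) < 2:
        return {"moving": False, "velocity": (0.0, 0.0),
                "speed": 0.0, "direction_deg": None}
    (x0, y0), (x1, y1) = centroids[0], centroids[-1]
    elapsed = (len(centroids) - 1) / fps           # seconds spanned by the track
    vx, vy = (x1 - x0) / elapsed, (y1 - y0) / elapsed
    speed = math.hypot(vx, vy)                     # magnitude of the velocity
    moving = speed > moving_threshold
    # Direction is the angle of the velocity vector in image coordinates.
    direction = math.degrees(math.atan2(vy, vx)) if moving else None
    return {"moving": moving, "velocity": (vx, vy),
            "speed": speed, "direction_deg": direction}


info = tracking_info([(0, 0), (3, 4), (6, 8)], fps=2)
print(info)
```

This version uses only the first and last centroid, which gives an average velocity over the track. A per-frame estimate would be an equally valid choice.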
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
Detecting features for summarising video content · CPC title
Creating video summaries, e.g. movie trailer {(retrieval in video databases by using presentations in form of a video summary G06F16/739)} · CPC title
Classification techniques · CPC title
using classification, e.g. of video objects · CPC title