Content based video content segmentation
US-9888279-B2 · Feb 6, 2018 · US
US11093755B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11093755-B2 |
| Application number | US-201916688356-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 19, 2019 |
| Priority date | Nov 19, 2019 |
| Publication date | Aug 17, 2021 |
| Grant date | Aug 17, 2021 |
Official abstract text for this publication.
A system, method, and computer program product for segmenting videos. The system includes at least one processing component, at least one memory component, a video, an extraction component, and a graphing component. The extraction component is configured to extract image and text data from the video, identify entities in the image data, assign at least one entity relation to the entities in the image data, identify entities in the text data, and assign at least one entity relation to the entities in the text data. The graphing component is configured to generate an image knowledge graph for the entity relations assigned to the entities in the image data, generate a text knowledge graph for the entity relations assigned to the entities in the text data, and generate a weighted knowledge graph based on the image and text knowledge graphs.
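The graphing step in the abstract can be illustrated with a minimal sketch. The abstract does not say how relation weights are computed, so everything here is an assumption for illustration: relations are unordered entity pairs, each graph stores co-occurrence counts, and the weighted graph is a linear blend of the two counts. The names `build_graph`, `merge_graphs`, `image_weight`, and `text_weight` are hypothetical.

```python
from collections import Counter

# Hypothetical extractor output: (frame_index, entity_a, entity_b) per relation.
image_relations = [(0, "Alice", "Bob"), (1, "Alice", "Bob"), (2, "Bob", "Carol")]
text_relations = [(0, "Alice", "Bob"), (1, "Alice", "Bob"), (2, "Bob", "Carol")]


def build_graph(relations):
    """Count how often each unordered entity pair is related (assumed weighting)."""
    graph = Counter()
    for _frame, a, b in relations:
        graph[frozenset((a, b))] += 1
    return graph


def merge_graphs(image_graph, text_graph, image_weight=0.5, text_weight=0.5):
    """Blend per-modality counts into one weighted graph (assumed linear blend)."""
    edges = set(image_graph) | set(text_graph)
    return {
        edge: image_weight * image_graph[edge] + text_weight * text_graph[edge]
        for edge in edges
    }


image_graph = build_graph(image_relations)
text_graph = build_graph(text_relations)
weighted_graph = merge_graphs(image_graph, text_graph)
```

With this sample data the Alice/Bob relation ends up with a higher weight than the Bob/Carol relation, because it is observed more often in both modalities.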
Opening claim text (preview).
What is claimed is:

1. A system for segmenting videos, comprising: at least one processing component; at least one memory component; a video; an extraction component configured to: extract image data and text data from the video; identify at least two entities in the image data; assign at least one entity relation to the at least two entities in the image data; identify at least two entities in the text data; and assign at least one entity relation to the two or more entities in the text data; and a graphing component configured to: generate an image knowledge graph for the at least one entity relation assigned to the at least two entities in the image data; generate a text knowledge graph for the at least one entity relation assigned to the at least two entities in the text data; and generate a weighted knowledge graph based on the image knowledge graph and the text knowledge graph.
2. The system of claim 1, wherein the weighted knowledge graph includes relation weights for the at least one entity relation assigned to the at least two entities in the image data and the at least one entity relation assigned to the at least two entities in the text data.
3. The system of claim 2, further comprising a grouping component configured to: identify a top relation in the at least one entity relation assigned to the at least two entities in the image data and the at least one entity relation assigned to the at least two entities in the text data, wherein the top relation is an entity relation having a relation weight greater than a threshold relation weight; select frames of the video that correspond to the top relation; and group the frames into a video segment.
4. The system of claim 3, wherein the grouping component is further configured to: determine that there are remaining frames of the video that do not include the top relation; determine that the frames in the video segment are nearest to the remaining frames; and group the remaining frames with the video segment.
5. The system of claim 1, wherein the video is divided into pictures, wherein each picture includes a set of frames.
6. The system of claim 1, wherein the text data is captions.
7. The system of claim 1, wherein the text data is extracted from speech data.
8. The system of claim 1, wherein the at least two entities in the image data are identified based on facial recognition.
9. A method, comprising: receiving a video; extracting image data and text data from the video; identifying at least two entities in the image data; assigning at least one entity relation to the at least two entities in the image data; identifying at least two entities in the text data; assigning at least one entity relation to the at least two entities in the text data; generating an image knowledge graph for the at least one entity relation assigned to the at least two entities in the image data; generating a text knowledge graph for the at least one entity relation assigned to the at least two entities in the text data; and generating a weighted knowledge graph based on the image knowledge graph and the text knowledge graph.
10. The method of claim 9, wherein the weighted knowledge graph includes relation weights for the at least one entity relation assigned to the at least two entities in the image data and the at least one entity relation assigned to the at least two entities in the text data.
11. The method of claim 10, further comprising: identifying a top relation in the at least one entity relation assigned to the at least two entities in the image data and the at least one entity relation assigned to the at least two entities in the text data, wherein the top relation is an entity relation having a relation weight greater than a threshold relation weight; selecting frames of the video that correspond to the top relation; and grouping the frames into a video segment.
12. The method of claim 11, further comprising: determining that there are remaining frames of the video that do not include the top relation; determining that the frames in the video segment are nearest to the remaining frames; and grouping the remaining frames with the video segment.
13. The method of claim 9, wherein the video is divided into pictures, wherein each picture includes a set of frames.
14. The method of claim 9, wherein the text data is captions.
15. The method of claim 9, wherein the at least two entities in the image data are identified based on facial recognition.
16. A computer program product for segmenting videos, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause a device to perform a method, the method comprising: receiving a video; extracting image data and text data from the video; identifying at least two entities in the image data; assigning at least one entity relation to the at least two entities in the image data; identifying at least two entities in the text data; assigning at least one entity relation to the at least two entities in the text data; generating an image knowledge graph for the at least one entity relation assigned to the at least two entities in the image data; generating a text knowledge graph for the at least one entity relation assigned to the at least two entities in the text data; and generating a weighted knowledge graph based on the image knowledge graph and the text knowledge graph.
17. The computer program product of claim 16, wherein the weighted knowledge graph includes relation weights for the at least one entity relation assigned to the at least two entities in the image data and the at least one entity relation assigned to the at least two entities in the text data.
18. The computer program product of claim 17, further comprising: identifying a top relation in the at least one entity relation assigned to the at least two entities in the image data and the at least one entity relation assigned to the at least two entities in the text data, wherein the top relation is an entity relation having a relation weight greater than a threshold relation weight; selecting frames of the video that correspond to the top relation; and grouping the frames into a video segment.
19. The computer program product of claim 18, further comprising: determining that there are remaining frames of the video that do not include the top relation; determining that the frames in the video segment are nearest to the remaining frames; and grouping the remaining frames with the video segment.
20. The computer program product of claim 16, wherein the at least two entities in the image data are identified based on facial recognition.
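The grouping steps recited in claims 3 and 4 (and their method and program-product counterparts) can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the function name `segment_frames`, the string relation labels, the rule that a frame joins the highest-weighted top relation it contains, and the use of frame-index distance to decide which segment is "nearest" are all assumptions.

```python
def segment_frames(frame_relations, relation_weights, threshold):
    """Group frames by top relation, then attach leftover frames to the nearest segment.

    frame_relations: dict mapping frame index -> set of relation labels in that frame.
    relation_weights: dict mapping relation label -> relation weight.
    threshold: a relation is a "top relation" if its weight exceeds this value.
    """
    # Top relations: weight strictly greater than the threshold relation weight.
    top = sorted(r for r, w in relation_weights.items() if w > threshold)
    segments = {r: [] for r in top}
    remaining = []

    for frame in sorted(frame_relations):
        hits = [r for r in frame_relations[frame] if r in segments]
        if hits:
            # Assumption: a frame joins the highest-weighted top relation it contains.
            best = max(hits, key=lambda r: relation_weights[r])
            segments[best].append(frame)
        else:
            remaining.append(frame)

    # Assumption: "nearest" means smallest frame-index distance to a segment's
    # originally selected frames.
    anchors = {r: list(frames) for r, frames in segments.items() if frames}
    for frame in remaining:
        if not anchors:
            break  # no segment exists to absorb leftover frames
        nearest = min(anchors, key=lambda r: min(abs(frame - f) for f in anchors[r]))
        segments[nearest].append(frame)

    return {r: sorted(frames) for r, frames in segments.items()}


# Hypothetical example data.
weights = {"A-B": 2.0, "B-C": 1.5, "C-D": 0.5}
frames = {0: {"A-B"}, 1: {"A-B"}, 2: set(), 3: {"C-D"}, 4: {"B-C"}, 5: {"B-C"}}
result = segment_frames(frames, weights, threshold=1.0)
```

In this example, frames 2 and 3 contain no top relation, so each is attached to whichever segment has the closest selected frame.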
Knowledge engineering; Knowledge acquisition · CPC title
using pattern recognition or machine learning (optical pattern recognition or electronic computations therefor G06V10/88) · CPC title
using neural networks · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
Extraction of image or video features · CPC title