Graph convolutional networks for video grounding

US11442986B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11442986-B2
Application number: US-202016792208-A
Country: US
Kind code: B2
Filing date: Feb 15, 2020
Priority date: Feb 15, 2020
Publication date: Sep 13, 2022
Grant date: Sep 13, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Method and apparatus that includes receiving a query describing an aspect in a video, the video including a plurality of frames, identifying multiple proposals that potentially correspond to the query where each of the proposals includes a subset of the plurality of frames, ranking the proposals using a graph convolution network that identifies relationships between the proposals, and selecting, based on the ranking, one of the proposals as a video segment that correlates to the query.
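The abstract does not say how the proposals are produced. As a minimal sketch, assuming a sliding-window strategy (an illustrative choice, not taken from the patent), candidate segments can be enumerated as frame ranges. The names `generate_proposals`, `overlaps`, `window_sizes`, and `stride_ratio` are hypothetical.

```python
from typing import List, Tuple

# A proposal is a half-open frame range: (start_frame, end_frame).
Proposal = Tuple[int, int]


def generate_proposals(
    num_frames: int, window_sizes: List[int], stride_ratio: float = 0.5
) -> List[Proposal]:
    """Enumerate candidate segments with sliding windows (assumed strategy)."""
    proposals: List[Proposal] = []
    for size in window_sizes:
        if size <= 0 or size > num_frames:
            continue  # skip windows that cannot fit in the video
        stride = max(1, int(size * stride_ratio))
        for start in range(0, num_frames - size + 1, stride):
            proposals.append((start, start + size))
    return proposals


def overlaps(a: Proposal, b: Proposal) -> bool:
    """Return True when two proposals share at least one frame."""
    return max(a[0], b[0]) < min(a[1], b[1])


# An 8-frame video with 4-frame windows and a stride of 2 frames.
proposals = generate_proposals(num_frames=8, window_sizes=[4], stride_ratio=0.5)
for p in proposals:
    print(p)
```

With a stride smaller than the window, neighbouring proposals share frames while distant ones do not, so a single proposal set can contain both overlapping and non-overlapping subsets of the frames.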

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a query describing an aspect in a video, the video comprising a plurality of frames; identifying multiple proposals that potentially correspond to the query, wherein each of the proposals comprises a subset of the plurality of frames; generating a graph based on the query and the multiple proposals; ranking the proposals using a graph convolution network (GCN) that identifies relationships between the proposals, wherein the graph is input into the graph convolution network; and selecting, based on the ranking, one of the proposals as a video segment that correlates to the query.

2. The method of claim 1, wherein generating the graph comprises: identifying visual features in the proposals using a visual feature encoder; and generating query features from the query using a recurrent neural network (RNN).

3. The method of claim 2, wherein the graph comprises nodes and edges based on the visual features and the query features.

4. The method of claim 3, wherein ranking the proposals comprises: updating node features for the nodes in the graph; and calculating edge weights for the edges in the graph.

5. The method of claim 3, wherein ranking the proposals further comprises: performing node aggregation; and ranking the proposals based on the node aggregation and results from processing the graph using the GCN.

6. The method of claim 1, wherein at least two proposals of the multiple proposals comprise overlapping frames of the plurality of frames in the video.

7. The method of claim 1, wherein at least two proposals of the multiple proposals comprise subsets of the plurality of frames in the video that do not overlap.

8. A system, comprising: a processor; and memory comprising a program, which when executed by the processor performs an operation, the operation comprising: receiving a query describing an aspect in a video, the video comprising a plurality of frames; identifying multiple proposals that potentially correspond to the query, wherein each of the proposals comprises a subset of the plurality of frames; generating a graph based on the query and the multiple proposals; ranking the proposals using a GCN that identifies relationships between the proposals, wherein the graph is input into the graph convolution network; and selecting, based on the ranking, one of the proposals as a video segment that correlates to the query.

9. The system of claim 8, wherein generating the graph comprises: identifying visual features in the proposals using a visual feature encoder; and generating query features from the query using a recurrent neural network (RNN).

10. The system of claim 9, wherein the graph comprises nodes and edges based on the visual features and the query features.

11. The system of claim 10, wherein ranking the proposals comprises: updating node features for the nodes in the graph; and calculating edge weights for the edges in the graph.

12. The system of claim 10, wherein ranking the proposals further comprises: performing node aggregation; and ranking the proposals based on the node aggregation and results from processing the graph using the GCN.

13. The system of claim 8, wherein at least two proposals of the multiple proposals comprise overlapping frames of the plurality of frames in the video.

14. The system of claim 8, wherein at least two proposals of the multiple proposals comprise subsets of the plurality of frames in the video that do not overlap.

15. A computer program product for identifying a video segment that correlates to a query, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer-readable program code executable by one or more computer processors to perform an operation, the operation comprising: receiving the query, the query describing an aspect in a video comprising a plurality of frames; identifying multiple proposals that potentially correspond to the query, wherein each of the proposals comprises a subset of the plurality of frames; generating a graph based on the query and the multiple proposals; ranking the proposals using a GCN that identifies relationships between the proposals, wherein the graph is input into the graph convolution network; and selecting, based on the ranking, one of the proposals as the video segment that correlates to the query.

16. The computer program product of claim 15, wherein generating the graph comprises: identifying visual features in the proposals using a visual feature encoder; and generating query features from the query using a recurrent neural network (RNN).

17. The computer program product of claim 16, wherein the graph comprises nodes and edges based on the visual features and the query features.

18. The computer program product of claim 17, wherein ranking the proposals comprises: updating node features for the nodes in the graph; and calculating edge weights for the edges in the graph.

19. The computer program product of claim 17, wherein ranking the proposals further comprises: performing node aggregation; and ranking the proposals based on the node aggregation and results from processing the graph using the GCN.

20. The computer program product of claim 15, wherein at least two proposals of the multiple proposals comprise overlapping frames of the plurality of frames in the video.
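The steps named in the claims (building a graph from visual and query features, calculating edge weights, updating node features, aggregating, and ranking) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the feature vectors are hand-written stand-ins for the output of a visual feature encoder and an RNN, fusion is an element-wise product, edge weights are a softmax over dot-product similarity, and the propagation step has no learned weight matrix. It is not the patented implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(values: Vector) -> Vector:
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_nodes(visual: List[Vector], query: Vector) -> List[Vector]:
    """One node per proposal: visual features fused with the query (assumed fusion)."""
    return [[v * q for v, q in zip(feat, query)] for feat in visual]


def edge_weights(nodes: List[Vector]) -> List[Vector]:
    """Row-normalised similarity between every pair of nodes, self-loops included."""
    return [softmax([dot(n_i, n_j) for n_j in nodes]) for n_i in nodes]


def gcn_layer(nodes: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Update each node to the edge-weighted sum of all nodes, then apply ReLU."""
    dim = len(nodes[0])
    updated = []
    for row in weights:
        mixed = [sum(w * node[d] for w, node in zip(row, nodes)) for d in range(dim)]
        updated.append([max(0.0, x) for x in mixed])
    return updated


def rank_proposals(visual: List[Vector], query: Vector) -> List[int]:
    """Return proposal indices ordered from best to worst score."""
    nodes = build_nodes(visual, query)
    nodes = gcn_layer(nodes, edge_weights(nodes))
    scores = [sum(node) for node in nodes]  # sum-pooling stands in for a learned scorer
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical 2-dimensional features for three proposals and one query.
visual_features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query_features = [1.0, 0.0]
ranking = rank_proposals(visual_features, query_features)
selected = ranking[0]  # index of the proposal chosen as the matching segment
print(ranking, selected)
```

Because every node is updated from its neighbours, a proposal's score depends on related proposals as well as its own features, which is the relationship modelling the claims attribute to the GCN. A trained system would learn the layer weights and scoring function rather than use the fixed operations shown here.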

Assignees

IBM

Inventors

Classifications

  • using objects detected or recognised in the video content · CPC title

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

  • based on feedback from supervisors · CPC title

  • using neural networks · CPC title

  • using classification, e.g. of video objects · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11442986B2 cover?
Method and apparatus that includes receiving a query describing an aspect in a video, the video including a plurality of frames, identifying multiple proposals that potentially correspond to the query where each of the proposals includes a subset of the plurality of frames, ranking the proposals using a graph convolution network that identifies relationships between the proposals, and selecting…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/7837. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 13, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).