Method and device for identifying video

US11967134B2 · US · B2

Patent metadata
Publication number: US-11967134-B2
Application number: US-202017611673-A
Country: US
Kind code: B2
Filing date: Mar 19, 2020
Priority date: Jun 5, 2019
Publication date: Apr 23, 2024
Grant date: Apr 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are a method and device for recognizing a video. One specific embodiment of the method comprises: obtaining a to-be-recognized video; and inputting the video into a pre-trained local and global diffusion (LGD) model to obtain the category of the video, wherein the LGD model learns a spatio-temporal representation of the video based on diffusion between local and global representations.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for recognizing a video, comprising: acquiring a to-be-recognized video; and inputting the to-be-recognized video into a pre-trained local and global diffusion (LGD) model to obtain a category of the to-be-recognized video, the LGD model learning a spatio-temporal representation in the to-be-recognized video based on diffusion between a local representation and a global representation.

2. The method according to claim 1, wherein the LGD model comprises a plurality of cascaded LGD modules, a local and global combination classifier and a fully connected layer.

3. The method according to claim 2, wherein each LGD module comprises a local path and a global path interacting with each other, respectively describing local variation and holistic appearance at each spatio-temporal location.

4. The method according to claim 3, wherein diffusion directions in the each LGD module comprise a global-to-local diffusion direction and a local-to-global diffusion direction, wherein, in the global-to-local diffusion direction, a local feature map at a current LGD module is learned based on a local feature map at a preceding LGD module and a global feature vector at the preceding LGD module, and in the local-to-global diffusion direction, a global feature vector at the current LGD module is learned based on the local feature map at the current LGD module and the global feature vector at the preceding LGD module.

5. The method according to claim 4, wherein learning the local feature map at the current LGD module based on the local feature map at the preceding LGD module and the global feature vector at the preceding LGD module comprises: attaching a residual value of a global path at the preceding LGD module to the local feature map at the preceding LGD module, to generate the local feature map at the current LGD module, wherein learning the global feature vector at the current LGD module based on the local feature map at the current LGD module and the global feature vector at the preceding LGD module comprises: embedding linearly the global feature vector at the preceding LGD module and global average pooling of the local feature map at the current LGD module, to generate the global feature vector at the current LGD module.

6. The method according to claim 5, wherein the each LGD module generates a local feature map and a global feature vector through at least three projection matrices, and uses a low-rank approximation of each projection matrix to reduce a number of additional parameters of the LGD module.

7. The method according to claim 2, wherein the inputting the to-be-recognized video into the pre-trained local and global diffusion (LGD) model to obtain the category of the to-be-recognized video comprises: learning the local representation and the global representation of the to-be-recognized video in parallel based on the to-be-recognized video and the plurality of cascaded LGD modules; inputting the local representation and the global representation of the to-be-recognized video into the local and global combination classifier, to synthesize a combined representation of the to-be-recognized video; and inputting the combined representation of the to-be-recognized video into the fully connected layer, to obtain the category of the to-be-recognized video.

8. The method according to claim 7, wherein the each LGD module is a two-dimensional LGD (LGD-2D) module or a three-dimensional LGD (LGD-3D) module.

9. The method according to claim 8, wherein the learning the local representation and the global representation of the to-be-recognized video in parallel based on the to-be-recognized video and the plurality of cascaded LGD modules comprises: segmenting the to-be-recognized video into a plurality of to-be-recognized video segments; selecting a plurality of to-be-recognized video frames from the plurality of to-be-recognized video segments; and inputting the plurality of to-be-recognized video frames into a plurality of cascaded LGD-2D modules to learn a local representation and a global representation of the plurality of to-be-recognized video frames in parallel, and using the learned local representation and global representation as the local representation and the global representation of the to-be-recognized video.

10. The method according to claim 9, wherein selecting at least one to-be-recognized video frame from each to-be-recognized video segment in the plurality of to-be-recognized video segments.

11. The method according to claim 8, wherein the learning the local representation and the global representation of the to-be-recognized video in parallel based on the to-be-recognized video and the plurality of cascaded LGD modules comprises: segmenting the to-be-recognized video into a plurality of to-be-recognized video segments; and inputting the plurality of to-be-recognized video segments into a plurality of cascaded LGD-3D modules to learn a local representation and a global representation of the plurality of to-be-recognized video segments in parallel, and using the learned local representation and global representation as the local representation and the global representation of the to-be-recognized video.

12. The method according to claim 11, wherein the plurality of cascaded LGD-3D modules decompose three-dimensional learning into two-dimensional convolutions in a spatial space and one-dimensional operations in a temporal dimension.

13. The method according to claim 2, wherein the local and global combination classifier is a kernel-based classifier.

14. A server, comprising: one or more processors; and a storage apparatus, configured to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement operations, the operations comprising: acquiring a to-be-recognized video; and inputting the to-be-recognized video into a pre-trained local and global diffusion (LGD) model to obtain a category of the to-be-recognized video, the LGD model learning a spatio-temporal representation in the to-be-recognized video based on diffusion between a local representation and a global representation.

15. A computer readable medium, storing a computer program thereon, wherein the computer program, when executed by a processor, cause the processor to implement operations, the operations comprising: acquiring a to-be-recognized video; and inputting the to-be-recognized video into a pre-trained local and global diffusion (LGD) model to obtain a category of the to-be-recognized video, the LGD model learning a spatio-temporal representation in the to-be-recognized video based on diffusion between a local representation and a global representation.

16. The server according to claim 14, wherein the LGD model comprises a plurality of cascaded LGD modules, a local and global combination classifier and a fully connected layer.

17. The server according to claim 16, wherein each LGD module comprises a local path and a global path interacting with each other, respectively describing local variation and holistic appearance at each spatio-temporal location.

18. The server according to claim 17, wherein diffusion directions in the each LGD module comprise a global-to-local diffusion direction and a local-to-global diffusion direction, wherein, in the global-to-local diffusion direction, a local feature map at a current LGD module is learned based on a local feature map at a preceding LGD module and a global feature vector at the preceding LGD m
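
Illustrative sketch (not part of the patent text). The two diffusion directions of claims 4 and 5, and the low-rank projections of claim 6, can be pictured with a small NumPy example. Everything named here is an assumption made for illustration: the function names lgd_step and low_rank, the tensor layout (channels, time, height, width), the ReLU activation, the random weights, and the omission of the convolution that a real local path would apply. It is a reading aid under those assumptions, not the patented implementation.

```python
import numpy as np


def relu(x):
    # Assumed activation; the claims do not name one.
    return np.maximum(x, 0.0)


def low_rank(rng, d_out, d_in, rank):
    """Hypothetical helper: factors (a, b) so that a @ b stands in for a
    full d_out x d_in projection matrix (claim 6, low-rank approximation)."""
    a = rng.standard_normal((d_out, rank)) * 0.1
    b = rng.standard_normal((rank, d_in)) * 0.1
    return a, b


def lgd_step(local_prev, global_prev, params):
    """One illustrative LGD module.

    local_prev:  (C, T, H, W) local feature map of the preceding module
    global_prev: (C,) global feature vector of the preceding module
    params:      three (a, b) low-rank factor pairs
    """
    (a_g, b_g), (a_x, b_x), (a_p, b_p) = params

    # Global-to-local: project the preceding global vector and attach it
    # as a residual to every spatio-temporal location of the local map.
    residual = a_g @ (b_g @ global_prev)                      # shape (C,)
    local_cur = relu(local_prev + residual[:, None, None, None])

    # Local-to-global: linearly embed the preceding global vector and the
    # global average pooling of the current local map, then combine them.
    pooled = local_cur.mean(axis=(1, 2, 3))                   # shape (C,)
    global_cur = relu(a_x @ (b_x @ global_prev) + a_p @ (b_p @ pooled))
    return local_cur, global_cur


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels, frames, height, width, rank = 8, 4, 6, 6, 2

    # Three cascaded modules, each with three low-rank projections.
    modules = [[low_rank(rng, channels, channels, rank) for _ in range(3)]
               for _ in range(3)]

    local = rng.standard_normal((channels, frames, height, width))
    glob = local.mean(axis=(1, 2, 3))  # assumed initial global vector
    for params in modules:
        local, glob = lgd_step(local, glob, params)

    full = 3 * channels * channels
    factored = sum(a.size + b.size for a, b in modules[0])
    print(local.shape, glob.shape, full, factored)
```

With 8 channels and rank 2, three full projection matrices would need 3 x 8 x 8 = 192 weights per module, while the factored form needs 3 x 2 x 8 x 2 = 96, which is the kind of parameter saving claim 6 refers to.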

Assignees

Beijing Jingdong Shangke Information Technology Co Ltd, Beijing Jingdong Century Trading Co Ltd

Inventors

Classifications

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06V10/764 · Primary

    using classification, e.g. of video objects · CPC title

  • Extraction of image or video features · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11967134B2 cover?
Disclosed are a method and device for recognizing a video. One specific embodiment of the method comprises: obtaining a to-be-recognized video; and inputting the video into a pre-trained local and global diffusion (LGD) model to obtain the category of the video, wherein the LGD model learns a spatio-temporal representation of the video based on diffusion between local and global re…
Who is the assignee on this patent?
Beijing Jingdong Shangke Information Technology Co Ltd, Beijing Jingdong Century Trading Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/764. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 23, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).