What technology area does this patent fall under?

Primary CPC classification G06V20/41. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 14, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Unsupervised video representation learning

US11816889B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11816889-B2
Application number	US-202117216605-A
Country	US
Kind code	B2
Filing date	Mar 29, 2021
Priority date	Mar 29, 2021
Publication date	Nov 14, 2023
Grant date	Nov 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Unsupervised learning for video classification. One or more features from one or more video clips are extracted using a spatial-temporal encoder. The one or more extracted features are processed, using a video instance discrimination task, to generate a classification label, the classification label indicating whether two of the video clips are from a same video. The one or more extracted features are processed, using a pair-wise speed discrimination task, to generate a comparison label, the comparison label indicating a relative playback speed between two given video clips. A search is performed in a video database for a video that is similar to a given video based on the comparison label.
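The abstract names two tasks whose targets need no human annotation. As an illustration only, here is a minimal sketch of how such target labels could be derived from raw clips during training. The abstract does not say how clips are sampled, so modelling playback speed as a frame stride, the three-way speed label, and the names `sample_clip`, `instance_label`, and `speed_label` are all assumptions, not the patent's method.

```python
import random

def sample_clip(video_id, num_frames, clip_len, speed, rng):
    """Sample a clip of clip_len frames; speed is the frame stride (assumption)."""
    span = (clip_len - 1) * speed + 1  # source frames covered by the clip
    if span > num_frames:
        raise ValueError("video too short for this clip length and speed")
    start = rng.randrange(num_frames - span + 1)
    frames = list(range(start, start + span, speed))
    return {"video_id": video_id, "speed": speed, "frames": frames}

def instance_label(clip_a, clip_b):
    """Target for instance discrimination: 1 if both clips share a source video."""
    return int(clip_a["video_id"] == clip_b["video_id"])

def speed_label(clip_a, clip_b):
    """Target for pair-wise speed discrimination (hypothetical encoding):
    -1 if clip_a is slower than clip_b, 0 if equal, 1 if faster."""
    return (clip_a["speed"] > clip_b["speed"]) - (clip_a["speed"] < clip_b["speed"])

rng = random.Random(0)
a = sample_clip("video_1", num_frames=300, clip_len=16, speed=1, rng=rng)
b = sample_clip("video_1", num_frames=300, clip_len=16, speed=2, rng=rng)
c = sample_clip("video_2", num_frames=300, clip_len=16, speed=2, rng=rng)

print(instance_label(a, b), speed_label(a, b))  # same video, a slower than b
print(instance_label(b, c), speed_label(b, c))  # different videos, same speed
```

In this sketch both labels come from the sampling procedure itself, which is why no manual annotation is required. In the method described above, the labels are instead produced by the two tasks from the encoder's extracted features.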

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: extracting, using a spatial-temporal encoder, one or more features from one or more video clips; processing, using a video instance discrimination task, the one or more extracted features to generate a classification label, the classification label indicating whether two of the video clips are from a same video; processing, using a pair-wise speed discrimination task, the one or more extracted features to generate a comparison label, the comparison label indicating a relative playback speed between two given video clips; and searching, in a video database, for a video that is similar to a given video clip in terms of playback speed based on the comparison label generated by the pair-wise speed discrimination task and that is from a same video as the given video clip based on the classification label generated by the video instance discrimination task.
2. The method of claim 1, wherein the spatial-temporal encoder is based on a spatial-temporal neural network.
3. The method of claim 1, wherein the video instance discrimination task is based on a model g_a of a video instance neural network.
4. The method of claim 3, the method further comprising training the model g_a using a database of training videos and corresponding training video clips to distinguish video clips derived from the same video from video clips derived from different videos.
5. The method of claim 1, wherein the processing, using the video instance discrimination task, the one or more extracted features further generates a loss a .
6. The method of claim 1, wherein the pair-wise speed discrimination task is based on a model g_b of a pair-wise speed discrimination neural network.
7. The method of claim 6, the method further comprising training the model g_b using a database of training videos and corresponding training video clips to identify a difference in playback speed between two video clips.
8. The method of claim 1, wherein the processing, using the pair-wise speed discrimination task, the one or more extracted features further generates a loss m .
9. The method of claim 1, wherein the searching operation is further based on the classification label.
10. An apparatus comprising: a memory; and at least one processor, coupled to said memory, and operative to perform operations of: extracting, using a spatial-temporal encoder, one or more features from one or more video clips; processing, using a video instance discrimination task, the one or more extracted features to generate a classification label, the classification label indicating whether two of the video clips are from a same video; processing, using a pair-wise speed discrimination task, the one or more extracted features to generate a comparison label, the comparison label indicating a relative playback speed between two given video clips; and searching, in a video database, for a video that is similar to a given video clip in terms of playback speed based on the comparison label generated by the pair-wise speed discrimination task and that is from a same video as the given video clip based on the classification label generated by the video instance discrimination task.
11. The apparatus of claim 10, wherein the spatial-temporal encoder is based on a spatial-temporal neural network.
12. The apparatus of claim 10, wherein the video instance discrimination task is based on a model g_a of a video instance neural network.
13. The apparatus of claim 12, the operations further comprising training the model g_a using a database of training videos and corresponding training video clips to distinguish video clips derived from the same video from video clips derived from different videos.
14. The apparatus of claim 10, wherein the processing, using the video instance discrimination task, the one or more extracted features further generates a loss a .
15. The apparatus of claim 10, wherein the pair-wise speed discrimination task is based on a model g_b of a pair-wise speed discrimination neural network.
16. The apparatus of claim 15, the operations further comprising training the model g_b using a database of training videos and corresponding training video clips to identify a difference in playback speed between two video clips.
17. The apparatus of claim 10, wherein the processing, using the pair-wise speed discrimination task, the one or more extracted features further generates a loss m .
18. The apparatus of claim 10, wherein the searching operation is further based on the classification label.
19. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method of: extracting, using a spatial-temporal encoder, one or more features from one or more video clips; processing, using a video instance discrimination task, the one or more extracted features to generate a classification label, the classification label indicating whether two of the video clips are from a same video; processing, using a pair-wise speed discrimination task, the one or more extracted features to generate a comparison label, the comparison label indicating a relative playback speed between two given video clips; and searching, in a video database, for a video that is similar to a given video clip in terms of playback speed based on the comparison label generated by the pair-wise speed discrimination task and that is from a same video as the given video clip based on the classification label generated by the video instance discrimination task.
20. The computer program product of claim 19, wherein the searching operation is further based on the classification label.
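The searching step shared by the independent claims combines both labels. As a hedged sketch, the filter below keeps database clips that a classifier judges to be from the same video as the query and that a speed comparator judges to play at a similar speed. The `Clip` record, the `search` function, and the two stand-in predictors are hypothetical. In the claimed method those labels come from the trained tasks, not from stored metadata as in this toy example.

```python
from typing import Callable, List, NamedTuple

class Clip(NamedTuple):
    clip_id: str
    video_id: str  # ground truth, read only by the stand-in predictors below
    speed: int     # ground-truth playback speed, same caveat

def search(query: Clip,
           database: List[Clip],
           predict_same_video: Callable[[Clip, Clip], int],
           predict_relative_speed: Callable[[Clip, Clip], int]) -> List[Clip]:
    """Return clips labelled 'same video' (1) and 'same speed' (0) versus the query."""
    hits = []
    for candidate in database:
        same_video = predict_same_video(query, candidate) == 1
        similar_speed = predict_relative_speed(query, candidate) == 0
        if same_video and similar_speed:
            hits.append(candidate)
    return hits

# Stand-in predictors (assumption): a real system would run the two task
# heads on encoder features instead of comparing metadata fields.
def oracle_same_video(a: Clip, b: Clip) -> int:
    return int(a.video_id == b.video_id)

def oracle_relative_speed(a: Clip, b: Clip) -> int:
    return (a.speed > b.speed) - (a.speed < b.speed)

database = [
    Clip("c1", "v1", 1),
    Clip("c2", "v1", 2),
    Clip("c3", "v2", 1),
    Clip("c4", "v1", 1),
]
query = Clip("q", "v1", 1)
results = search(query, database, oracle_same_video, oracle_relative_speed)
print([c.clip_id for c in results])  # ['c1', 'c4']
```

Passing the predictors in as callables keeps the filter independent of how the labels are produced, so the stand-ins could be swapped for model-backed functions without changing `search`.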

Assignees

IBM

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/0895
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
G06V20/41 · Primary
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
G06F16/735
Filtering based on additional data, e.g. user or group profiles · CPC title
G06F18/214
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Patent family

Related publications grouped by family.

View patent family 81325361

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11816889B2 cover?: Unsupervised learning for video classification. One or more features from one or more video clips are extracted using a spatial-temporal encoder. The one or more extracted features are processed, using a video instance discrimination task, to generate a classification label, the classification label indicating whether two of the video clips are from a same video. The one or more extracted featu…
Who is the assignee on this patent?: IBM

Related patents

Method and electronic device for determining motion saliency and video playback style in video

Embedding contextual information in an image to assist understanding

Multi-object tracking with generic object proposals

Self-learning object detectors for unlabeled videos using multi-task learning
