What technology area does this patent fall under?

Primary CPC classification G06V20/49. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 7, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Temporally distributed neural networks for video semantic segmentation

US11354906B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11354906-B2
Application number	US-202016846544-A
Country	US
Kind code	B2
Filing date	Apr 13, 2020
Priority date	Apr 13, 2020
Publication date	Jun 7, 2022
Grant date	Jun 7, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks are used to extract the features to be used for video segmentation and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation representing the entirety of the features is produced for each video frame in the sequence of video frames by aggregating the output features extracted by the multiple neural networks.
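The distribution of feature extraction described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes a round-robin schedule in which frame t is processed only by sub-network t mod 4. Claims 10 and 11 tie the number of sub-networks to the number of feature groups and name four groups, but the schedule itself is an assumption. The sub-networks (`make_sub_network`), the toy per-pixel features, and all function names are hypothetical stand-ins. The groups are combined by plain concatenation here, a simplification of the affinity-based combination recited in claim 1.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[float]           # hypothetical: one intensity value per pixel
Features = List[List[float]]  # one feature vector per pixel

N_GROUPS = 4  # claims 10-11: one sub-network per feature group, four groups


def make_sub_network(group: int, dim: int = 2) -> Callable[[Frame], Features]:
    """Hypothetical stand-in for a shallow sub-network that extracts one feature group."""
    def sub_net(frame: Frame) -> Features:
        # Toy features: each group maps a pixel intensity to `dim` different values.
        return [[(group + 1) * p + k for k in range(dim)] for p in frame]
    return sub_net


SUB_NETWORKS = [make_sub_network(g) for g in range(N_GROUPS)]


def extract_group(frame_index: int, frame: Frame) -> Tuple[int, Features]:
    """Assumed round-robin schedule: frame t runs only sub-network t mod N_GROUPS."""
    group = frame_index % N_GROUPS
    return group, SUB_NETWORKS[group](frame)


def full_representation(frames: Sequence[Frame], current: int) -> Features:
    """Combine the groups extracted from the current frame and the N_GROUPS - 1 frames before it.

    Per-pixel concatenation in group order is a simplification; the claims combine
    the groups using affinity values between frames instead.
    """
    if current < N_GROUPS - 1:
        raise ValueError("need N_GROUPS - 1 earlier frames before the current frame")
    window = range(current - N_GROUPS + 1, current + 1)
    groups: Dict[int, Features] = dict(extract_group(t, frames[t]) for t in window)
    # Consecutive frames use different sub-networks, so the window covers every group once.
    assert sorted(groups) == list(range(N_GROUPS))
    return [
        [x for g in range(N_GROUPS) for x in groups[g][i]]
        for i in range(len(frames[current]))
    ]
```

In a streaming setting, the groups already extracted from earlier frames could be cached. Each new frame would then cost a single sub-network pass while still receiving a representation covering all four groups. This is presumably the source of the speed advantage the abstract describes, though the page does not state it.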

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: extracting, from each video frame in a contiguous sequence of video frames, a group of features using one of a plurality of sub-neural networks, the contiguous sequence of video frames comprising a current video frame and one or more additional video frames occurring in the contiguous sequence prior to the current video frame, wherein the group of features extracted from the current video frame is different from another group of features extracted from the one or more additional video frames in the contiguous sequence of video frames; generating a full feature representation for the current video frame by combining the groups of features extracted from the contiguous sequence of video frames, wherein generating the full feature representation for the current video frame comprises: generating, for each video frame in the one or more additional video frames, an affinity value between pixels of the video frame in the one or more additional video frames and the current video frame; and generating the full feature representation for the current video frame based on the affinity value and the groups of features extracted from the contiguous sequence of video frames; segmenting the current video frame based upon the full feature representation to generate a segmentation result, the segmentation result comprising information identifying, for a pixel in the current video frame, a label selected for the pixel based upon the full feature representation, wherein the label is selected from a plurality of labels; and outputting the segmentation result.

2. The method of claim 1, wherein the groups of features, extracted from the video frames in the contiguous sequence of video frames, together represent a total set of features used for segmenting the current video frame.

3. The method of claim 1, wherein the plurality of sub-neural networks comprises a first sub-neural network and a second sub-neural network, the first sub-neural network trained to extract a first group of features from a first video frame in the contiguous sequence of video frames, the second sub-neural network trained to extract a second group of features from a second video frame in the contiguous sequence of video frames, wherein the first video frame is different from the second video frame and the first group of features is different from the second group of features.

4. The method of claim 1, wherein extracting, from each video frame in the contiguous sequence of video frames, a group of features using a different one of the plurality of sub-neural networks comprises: generating at least one of a value feature map, a query map, or a key map, wherein the value feature map comprises features extracted by a sub-neural network of the plurality of sub-neural networks from the video frame, and the query map and the key map comprise information related to correlations between pixels across the video frames or across adjacent video frames in the contiguous sequence.

5. The method of claim 1, wherein generating the full feature representation for the current video frame further comprises computing a correlation between pixels of a first video frame in the contiguous sequence and a second video frame in the contiguous sequence, where the first video frame is adjacent to the second video frame in the contiguous sequence and occurs before the second video frame in the contiguous sequence.

6. The method of claim 5, wherein generating the full feature representation for the current video frame further comprises: (a) comparing the first video frame in the contiguous sequence with the second video frame in the contiguous sequence by computing an attention value between the pixels of the first video frame and the pixels of the second video frame, wherein the attention value measures the correlation between the pixels of the first video frame and the pixels of the second video frame; (b) obtaining a value feature map of the first video frame and a value feature map of the second video frame; and (c) updating the value feature map of the second video frame based on the attention value, the value feature map of the first video frame and the value feature map of the second video frame.

7. The method of claim 1, further comprising: (a) comparing a first video frame in the contiguous sequence with a second video frame in the contiguous sequence by computing an attention value between pixels of the first video frame and pixels of the second video frame, wherein the attention value measures a correlation between the pixels of the first video frame and the pixels of the second video frame; (b) obtaining a value feature map of the first video frame and a value feature map of the second video frame; (c) updating the value feature map of the second video frame based on the attention value, the value feature map of the first video frame and the value feature map of the second video frame; (d) updating the contiguous sequence of video frames by removing the first video frame from the contiguous sequence of video frames; and repeating (a), (b), (c) and (d) until only the current video frame is left in the contiguous sequence of video frames.

8. The method of claim 7, further comprising: determining that only the current video frame is left in the contiguous sequence of video frames; and based on the determining, outputting the value feature map for the current video frame, wherein the value feature map represents the full feature representation for the current video frame.

9. The method of claim 1, wherein the segmentation result comprises an image of the current video frame, wherein each pixel in the image of the current video frame is colored using a color corresponding to the label associated with the pixel.

10. The method of claim 1, a feature space representing a plurality of features to be used for segmenting video frames in the contiguous sequence of video frames is divided into a number of groups of features, wherein a number of sub-neural networks in the plurality of sub-neural networks is equal to a number of the groups of features.

11. The method of claim 10, wherein the number of groups of features is four.

12. The method of claim 1, wherein a number of layers in each sub-neural network from the plurality of sub-neural networks is the same.

13. The method of claim 12, wherein: a number of layers in each sub-neural network from the plurality of sub-neural networks is the same; and a number of nodes in each sub-neural network from the plurality of sub-neural networks is the same.

14. A system comprising: a memory storing segmented video frames corresponding to a video signal; and one or more processors configured to perform processing comprising: extracting, from each video frame in a contiguous sequence of video frames, a group of features using one of a plurality of sub-neural networks, the contiguous sequence of video frames comprising a current video frame and one or more additional video frames occurring in the contiguous sequence prior to the current video frame, and wherein the group of features extracted from the current video frame is different from another group of features extracted from the one or more additional video frames in the contiguous sequence of video frames; generating a full feature representation for the current video frame by combining the groups of features extracted from the contiguous sequence of video frames, wherein generating the full feature representation for the current video frame comprises: generating, for each video frame in the one or more additional video frames, an affinity valu
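The frame-by-frame fusion loop in claims 6 to 8, followed by per-pixel label selection as in claim 1, can be illustrated with a minimal pure-Python sketch. Several details are assumptions not fixed by the claims. The attention value is computed here as a scaled dot-product softmax between the later frame's query map and the earlier frame's key map. The update in step (c) is taken to be a residual sum `value + attention-weighted previous value`, whereas the claims only say the update is "based on" the attention value and both value maps. Labels are picked by argmax over a hypothetical linear classifier. `FrameMaps`, `aggregate`, `segment`, and the classifier weights are illustrative names, not the patent's.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]  # rows = pixels, columns = channels


@dataclass
class FrameMaps:
    """Hypothetical per-frame outputs named in claim 4: value, query and key maps."""
    value: Matrix
    query: Matrix
    key: Matrix


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(first: FrameMaps, second: FrameMaps) -> Matrix:
    """Step (a): affinity of each pixel of `second` (queries) to each pixel of `first` (keys)."""
    d = len(first.key[0])
    return [
        softmax([sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d)
                 for k_row in first.key])
        for q_row in second.query
    ]


def update_value(first: FrameMaps, second: FrameMaps) -> FrameMaps:
    """Steps (b)-(c): residual update of second.value with attention-weighted first.value (assumed form)."""
    attn = attention(first, second)
    new_value = []
    for i, v_row in enumerate(second.value):
        propagated = [sum(attn[i][j] * first.value[j][c] for j in range(len(first.value)))
                      for c in range(len(v_row))]
        new_value.append([v + p for v, p in zip(v_row, propagated)])
    return FrameMaps(new_value, second.query, second.key)


def aggregate(window: List[FrameMaps]) -> Matrix:
    """Claims 7-8: fold frames oldest-first until only the current frame remains."""
    frames = list(window)
    while len(frames) > 1:
        # Step (d): the updated second frame replaces the pair; the oldest frame is dropped.
        frames = [update_value(frames[0], frames[1])] + frames[2:]
    return frames[0].value  # claim 8: this value map is the full feature representation


def segment(features: Matrix, class_weights: Matrix) -> List[int]:
    """Pick, per pixel, the label index whose (hypothetical) linear score is highest."""
    labels = []
    for f in features:
        scores = [sum(w * x for w, x in zip(w_row, f)) for w_row in class_weights]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels
```

Folding the window pairwise, oldest-first, means each step only compares adjacent frames, matching the adjacency condition in claim 5. The current frame's value map accumulates information from every earlier frame in the window by the time the loop ends.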

Assignees

Adobe Inc

Inventors

Classifications

G06V10/806
of extracted features · CPC title
G06V20/46
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
G06V20/49 (Primary)
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
G06T7/11 (Primary)
Region-based segmentation · CPC title
G06F18/253
of extracted features · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 78007310

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11354906B2 cover?: A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and based upon the extracted featu…
Who is the assignee on this patent?: Adobe Inc

Related patents

High fidelity interactive segmentation for video data with deep convolutional tessellations and context aware skip connections

Method and apparatus for image segmentation using an event sensor

Deep learning for dense semantic segmentation in video with automated interactivity and improved temporal coherence

Video classification method, information processing method, and server

Augmenting reality using semantic segmentation