Segmentation and tracking system and method based on self-learning using video patterns in video
US-2022121853-A1 · Apr 21, 2022 · US
US12192543B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12192543-B2 |
| Application number | US-202318393664-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 21, 2023 |
| Priority date | Jun 3, 2022 |
| Publication date | Jan 7, 2025 |
| Grant date | Jan 7, 2025 |
Abstract
Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
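The abstract's pipeline can be sketched in plain Python: split the stream, gate the history, weight it, and classify the current frame. This is a minimal, hypothetical sketch, not the patented implementation. It assumes each frame is already a feature vector, that the "attention weight" of a historical frame is a sigmoid gate `sigmoid(w · x + b)`, that the weighted history and the present frames are each mean-pooled and concatenated, and that a linear classifier with softmax stands in for the action classifier. All names (`split_history`, `gate_weights`, `predict_current_action`, `gate_w`, `gate_b`, `classifier`) are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def split_history(frames: Sequence[Vector], num_present: int) -> Tuple[List[Vector], List[Vector]]:
    """Split frames into (historical, present); the last frame is the current frame."""
    if not 0 < num_present <= len(frames):
        raise ValueError("num_present must be in 1..len(frames)")
    cut = len(frames) - num_present
    return list(frames[:cut]), list(frames[cut:])


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gate_weights(history: List[Vector], gate_w: Vector, gate_b: float) -> List[float]:
    """Hypothetical gate: one weight in (0, 1) per historical frame, sigmoid(w . x + b)."""
    return [sigmoid(sum(w * x for w, x in zip(gate_w, frame)) + gate_b) for frame in history]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_current_action(
    frames: Sequence[Vector],
    num_present: int,
    gate_w: Vector,
    gate_b: float,
    classifier: List[Vector],
) -> Vector:
    """Return class probabilities for the current (last) frame.

    Each row of `classifier` is a linear scorer over the fused feature of length 2 * dim.
    """
    history, present = split_history(frames, num_present)
    dim = len(frames[0])
    if history:
        weights = gate_weights(history, gate_w, gate_b)
        # Scale each historical frame by its gate, then pool the weighted history.
        weighted = [[a * x for x in frame] for a, frame in zip(weights, history)]
        history_summary = mean(weighted)
    else:
        history_summary = [0.0] * dim
    fused = history_summary + mean(present)  # list concatenation, length 2 * dim
    logits = [sum(w * x for w, x in zip(row, fused)) for row in classifier]
    return softmax(logits)
```

For example, with four two-dimensional frames and `num_present=2`, the first two frames form the history and are down-weighted or kept according to their gate values. A classifier row of zeros could stand in for a "no action" class, in the spirit of claim 5.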
Claims
What is claimed is:

1. A system comprising: a processor; and a computer-readable medium storing instructions that are operative upon execution by the processor to: receive a video stream comprising a plurality of video frames; group the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; based on at least the set of historical video frames and the set of present video frames, generate an action prediction for the current video frame; and perform background suppression, wherein the action prediction comprises a confidence and wherein performing the background suppression comprises: generating a loss function that weights low confidence video frames more heavily, with separate emphasis on action and background classes, for a classifier that generates the action prediction.

2. The system of claim 1, wherein the instructions are further operative to: determine a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; and weight the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames, wherein generating the action prediction for the current video frame is based on at least the set of weighted historical video frames and the set of present video frames.

3. The system of claim 2, wherein determining the set of attention weights comprises: determining, for each video frame of the set of historical video frames, a position-guided gating score.

4. The system of claim 1, wherein the instructions are further operative to: based on at least the action prediction for the current video frame, generate an annotation for the current video frame; and display the current video frame subject to the annotation for the current video frame.

5. The system of claim 1, wherein the action prediction comprises a no action prediction or an action class prediction selected from a plurality of action classes.

6. The system of claim 1, wherein the instructions are further operative to: based on at least the set of historical video frames and the set of present video frames, generate a future action prediction for a video frame not yet observed.

7. The system of claim 6, wherein the future action prediction is based on at least a predicted trajectory of an autonomous driving vehicle.

8. A computerized method comprising: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; based on at least the set of historical video frames and the set of present video frames, generating an action prediction for the current video frame; and performing background suppression, wherein the action prediction comprises a confidence and wherein performing the background suppression comprises: generating a loss function that weights low confidence video frames more heavily, with separate emphasis on action and background classes, for a classifier that generates the action prediction.

9. The method of claim 8, further comprising: determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; and weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames, wherein generating the action prediction for the current video frame is based on at least the set of weighted historical video frames and the set of present video frames.

10. The method of claim 9, wherein determining the set of attention weights comprises: determining, for each video frame of the set of historical video frames, a position-guided gating score.

11. The method of claim 8, further comprising: based on at least the action prediction for the current video frame, generating an annotation for the current video frame; and displaying the current video frame subject to the annotation for the current video frame.

12. The method of claim 8, wherein the action prediction comprises a no action prediction or an action class prediction selected from a plurality of action classes.

13. The method of claim 8, further comprising: based on at least the set of historical video frames and the set of present video frames, generating a future action prediction for a video frame not yet observed.

14. The method of claim 13, wherein the future action prediction is based on at least a predicted trajectory of an autonomous driving vehicle.

15. One or more computer storage devices having computer-executable instructions stored thereon, which, on execution by a computer, cause the computer to perform operations comprising: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; based on at least the set of historical video frames and the set of present video frames, generating an action prediction for the current video frame; and performing background suppression, wherein the action prediction comprises a confidence and wherein performing the background suppression comprises: generating a loss function that weights low confidence video frames more heavily, with separate emphasis on action and background classes, for a classifier that generates the action prediction.

16. The one or more computer storage devices of claim 15, wherein the operations further comprise: determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; and weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames, wherein generating the action prediction for the current video frame is based on at least the set of weighted historical video frames and the set of present video frames.

17. The one or more computer storage devices of claim 16, wherein determining the set of attention weights comprises: determining, for each video frame of the set of historical video frames, a position-guided gating score.

18. The one or more computer storage devices of claim 16, wherein the operations further comprise: based on at least the action prediction for the current video frame, generating an annotation for the current video frame; and displaying the current video frame subject to the annotation for the current video frame.

19. The one or more computer storage devices of claim 15, wherein the action prediction comprises a no action prediction or an action class prediction selected from a plurality of action classes.

20. The one or more computer storage devices of claim 15, wherein the operations further comprise: based on at least the set of historical video frames and the set of present video frames, generating a future action prediction for a video frame not yet observed.
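The background-suppression step in claims 1, 8, and 15 calls for a loss that weights low-confidence frames more heavily, with separate emphasis on action and background classes. One way to get that behavior is a focal-loss-style modulating factor `(1 - p_t) ** gamma`, where `p_t` is the predicted probability of the true class, using one exponent for background frames and another for action frames. That choice is an assumption here, not taken from the patent. The names `background_suppression_loss`, `gamma_action`, and `gamma_background`, their default values, and the convention that class 0 means "no action" are all hypothetical.

```python
import math
from typing import Sequence

BACKGROUND = 0  # hypothetical convention: class index 0 is the "no action" class


def background_suppression_loss(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    gamma_action: float = 2.0,
    gamma_background: float = 0.5,
    eps: float = 1e-12,
) -> float:
    """Mean focal-style loss over frames.

    The factor (1 - p_t) ** gamma is larger when the classifier is less confident in the
    true class, so low-confidence frames contribute more. gamma_action and
    gamma_background are hypothetical knobs giving separate emphasis to frames whose
    ground truth is an action class versus background.
    """
    if not labels or len(probs) != len(labels):
        raise ValueError("need at least one frame and exactly one label per frame")
    total = 0.0
    for frame_probs, label in zip(probs, labels):
        p_t = min(max(frame_probs[label], eps), 1.0)  # confidence in the true class
        gamma = gamma_background if label == BACKGROUND else gamma_action
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(labels)
```

Setting both exponents to zero reduces this to ordinary cross-entropy. Raising an exponent shrinks the loss of confident frames of that kind, so training concentrates on ambiguous frames.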
CPC classification titles:
- Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69)
- Recognition of whole body movements, e.g. for sport training
- using neural networks
- of news video content
- Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames