Spatial motion attention for intelligent video analytics

US12412283B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12412283-B2
Application number: US-202217959713-A
Country: US
Kind code: B2
Filing date: Oct 4, 2022
Priority date: Oct 8, 2021
Publication date: Sep 9, 2025
Grant date: Sep 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for spatial motion attention for intelligent video analytics. One of the methods includes: obtaining an input image of a region; generating a motion image that characterizes a difference between a value of a pixel at the pixel location in the input image and a value of a pixel at the pixel location in the reference image; generating a feature map using the input image; generating, using the motion image and the feature map, a motion enhanced feature map that has, for one or more pixels that likely indicate movement, a first value that a) indicates that the corresponding pixel in the motion enhanced feature map likely indicates movement and b) is different from a second value for a corresponding pixel in the feature map; and analyzing the motion enhanced feature map.
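
The motion image step in the abstract can be pictured with a small sketch. This is an illustration only, not the patented implementation: it assumes grayscale images stored as nested lists and assumes the "difference" is an absolute per-pixel difference. The name `motion_image` and the sample values are hypothetical.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of pixel values.
Image = List[List[float]]


def motion_image(input_image: Image, reference_image: Image) -> Image:
    """Per-pixel absolute difference between the input and reference images."""
    if len(input_image) != len(reference_image) or any(
        len(a) != len(b) for a, b in zip(input_image, reference_image)
    ):
        raise ValueError("input and reference images must have the same shape")
    return [
        [abs(p - q) for p, q in zip(row_in, row_ref)]
        for row_in, row_ref in zip(input_image, reference_image)
    ]


# Hypothetical data: a static background and a frame with two changed pixels.
reference = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 60, 10], [10, 10, 5]]
motion = motion_image(frame, reference)
print(motion)  # [[0, 50, 0], [0, 0, 5]]
```

Larger values in the result mark pixel locations where the input differs more from the reference, which is the signal the later steps use to emphasize likely movement.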

First claim

Opening claim text (preview).

The invention claimed is:
1. A computer-implemented method comprising: obtaining an input image of a geographic region; generating, using the input image and a reference image for the geographic region, a motion image that characterizes, for one or more pixel locations in the input image and the reference image, a difference between a value of a pixel at the pixel location in the input image and a value of a pixel at the pixel location in the reference image; generating a feature map using the input image; generating, using the motion image and the feature map, a motion enhanced feature map that has, for one or more pixels that likely indicate movement, a first value that a) indicates whether the corresponding pixel in the motion enhanced feature map likely indicates movement and b) is different from a second value for a corresponding pixel in the feature map, wherein, for the one or more pixels that likely indicate movement: the first value of the pixel in the motion enhanced feature map is a weighted value of the second value for the corresponding pixel in the feature map, and the first value is weighted higher for a pixel that likely indicates movement than another value for one or more other pixels that likely indicate non-movement; and analyzing, as part of an object analysis process, the motion enhanced feature map.
2. The method of claim 1, wherein generating the motion enhanced feature map comprises: generating a modified motion image from the motion image, wherein the modified motion image has a different resolution from the motion image; generating a spatial motion attention map from the modified motion image; and generating the motion enhanced feature map using the spatial motion attention map and the feature map.
3. The method of claim 2, wherein: the modified motion image is a downsampled motion image; and generating the modified motion image comprises generating the modified motion image by downsampling the motion image with a kernel size such that dimensions of the downsampled motion image are the same as the feature map.
4. The method of claim 2, wherein: the modified motion image is a pooled motion image; and generating the modified motion image comprises generating the modified motion image by pooling the motion image with a kernel size such that dimensions of the pooled motion image are the same as the feature map.
5. The method of claim 2, wherein generating the modified motion image comprises: generating a downsampled motion image by downsampling the motion image; and generating the modified motion image from the downsampled motion image using a convolutional neural network block.
6. The method of claim 2, wherein generating the modified motion image comprises: generating a pooled motion image by pooling the motion image; and generating the modified motion image from the pooled motion image using a convolutional neural network block.
7. The method of claim 2, wherein the spatial motion attention map comprises pixels with values that represent weights to be applied to corresponding pixels in the feature map.
8. The method of claim 2, wherein generating the motion enhanced feature map from the spatial motion attention map and the feature map comprises: generating, using the spatial motion attention map and the feature map, a motion modulated feature map that has, for one or more second pixel locations in the spatial motion attention map and the pixel location in the feature map, a value generated by combining a value of a pixel at the pixel location in the spatial motion attention map and a value of a pixel at the pixel location in the feature map; and generating the motion enhanced feature map using at least the motion modulated feature map.
9. The method of claim 1, comprising: generating an aggregated motion enhanced feature map using two or more spatial motion modulators and one or more convolutional neural network blocks.
10. The method of claim 1, wherein analyzing the motion enhanced feature map comprises analyzing the motion enhanced feature map using at least one of an object classifier, an object detector, an object tracker, or a panoptic segmenter.
11. A system comprising one or more computers and one or more storage devices on which are stored instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining an input image of a geographic region; generating, using the input image and a reference image for the geographic region, a motion image that characterizes, for one or more pixel locations in the input image and the reference image, a difference between a value of a pixel at the pixel location in the input image and a value of a pixel at the pixel location in the reference image; generating a feature map using the input image; generating, using the motion image and the feature map, a motion enhanced feature map that has, for one or more pixels that likely indicate movement, a first value that a) indicates whether the corresponding pixel in the motion enhanced feature map likely indicates movement and b) is different from a second value for a corresponding pixel in the feature map, wherein, for the one or more pixels that likely indicate movement: the first value of the pixel in the motion enhanced feature map is a weighted value of the second value for the corresponding pixel in the feature map, and the first value is weighted higher for a pixel that likely indicates movement than another value for one or more other pixels that likely indicate non-movement; and analyzing, as part of an object analysis process, the motion enhanced feature map.
12. The system of claim 11, wherein generating the motion enhanced feature map comprises: generating a modified motion image from the motion image, wherein the modified motion image has a different resolution from the motion image; generating a spatial motion attention map from the modified motion image; and generating the motion enhanced feature map using the spatial motion attention map and the feature map.
13. The system of claim 12, wherein: the modified motion image is a downsampled motion image; and generating the modified motion image comprises generating the modified motion image by downsampling the motion image with a kernel size such that dimensions of the downsampled motion image are the same as the feature map.
14. The system of claim 12, wherein: the modified motion image is a pooled motion image; and generating the modified motion image comprises generating the modified motion image by pooling the motion image with a kernel size such that dimensions of the pooled motion image are the same as the feature map.
15. The system of claim 12, wherein generating the modified motion image comprises: generating a downsampled motion image by downsampling the motion image; and generating the modified motion image from the downsampled motion image using a convolutional neural network block.
16. The system of claim 12, wherein generating the modified motion image comprises: generating a pooled motion image by pooling the motion image; and generating the modified motion image from the pooled motion image using a convolutional neural network block.
17. The system of claim 12, wherein the spatial motion attention map comprises pixels with values that represent weights to be applied to corresponding pixels in the feature map.
18. The system of claim 12, wherein generating the motion enhanced feature map from the spatial m…
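
Claims 2, 4, 7 and 8 describe pooling the motion image to the feature map's dimensions, turning it into per-pixel weights, and combining those weights with the feature map. The sketch below is one possible reading under loud assumptions, not the claimed implementation: it uses non-overlapping max pooling, a single-channel feature map, a hypothetical weighting scheme of 1 plus normalized motion (so non-moving pixels keep their value and moving pixels are weighted higher), and element-wise multiplication as the combination. The function names `max_pool`, `attention_map` and `modulate` and all sample values are hypothetical, and the modulated map is used directly as the motion enhanced feature map.

```python
from typing import List

Grid = List[List[float]]


def max_pool(image: Grid, kernel: int) -> Grid:
    """Non-overlapping max pooling; the stride equals the kernel size."""
    rows, cols = len(image), len(image[0])
    if rows % kernel or cols % kernel:
        raise ValueError("image dimensions must be divisible by the kernel size")
    return [
        [
            max(image[r + dr][c + dc] for dr in range(kernel) for dc in range(kernel))
            for c in range(0, cols, kernel)
        ]
        for r in range(0, rows, kernel)
    ]


def attention_map(pooled: Grid) -> Grid:
    """Assumed scheme: weights in [1, 2], with 1 where no motion was measured."""
    peak = max(max(row) for row in pooled)
    if peak == 0:
        return [[1.0 for _ in row] for row in pooled]
    return [[1.0 + value / peak for value in row] for row in pooled]


def modulate(features: Grid, attention: Grid) -> Grid:
    """Multiply each feature value by the weight at the same pixel location."""
    return [[f * a for f, a in zip(f_row, a_row)] for f_row, a_row in zip(features, attention)]


# Hypothetical 4x4 motion image with one changed pixel, and a 2x2 feature map.
motion = [
    [0, 0, 0, 0],
    [0, 0, 0, 50],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
features = [[2.0, 3.0], [4.0, 5.0]]

# Kernel size chosen so the pooled motion image matches the feature map size.
kernel = len(motion) // len(features)
pooled = max_pool(motion, kernel)
weights = attention_map(pooled)
enhanced = modulate(features, weights)
print(pooled)    # [[0, 50], [0, 0]]
print(weights)   # [[1.0, 2.0], [1.0, 1.0]]
print(enhanced)  # [[2.0, 6.0], [4.0, 5.0]]
```

Only the feature value at the location that overlaps the changed pixel is amplified; the other values pass through unchanged, which matches the claim language that moving pixels are weighted higher than non-moving ones. A learned convolutional block, as in claims 5 and 6, could replace the fixed `attention_map` rule.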

Assignees

Objectvideo Labs Llc

Inventors

Classifications

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

  • Training; Learning

  • Artificial neural networks [ANN]

  • Video; Image sequence

  • using neural networks

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12412283B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for spatial motion attention for intelligent video analytics. One of the methods includes: obtaining an input image of a region; generating a motion image that characterizes a difference between a value of a pixel at the pixel location in the input image and a value of a pixel at the pixel location i…
Who is the assignee on this patent?
Objectvideo Labs Llc
What technology area does this patent fall under?
Primary CPC classification G06T7/248. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 9, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).