Attention masks in neural network video processing

US11568543B2 · US · B2

Patent metadata
Publication number: US-11568543-B2
Application number: US-202117197908-A
Country: US
Kind code: B2
Filing date: Mar 10, 2021
Priority date: Mar 10, 2021
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device configured for more efficiently processing video images within a set of video image data to detect objects is described herein. The device may include a processor configured to execute a neural network such as a convolutional neural network. The device can receive video image data from a plurality of cameras, such as stationary cameras. The device can acquire a set of sample images from a stationary camera and submit them to a specialized neural network for processing to generate an attention mask. The attention mask can be generated from a variety of methods and is applied to each of the subsequently acquired images from the camera to narrow down areas where the convolutional neural network should process data. The application of attention masks to images within video image data creates masked images that can be processed to detect objects with much greater accuracy and fewer computational resources required.
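The masking step described in the abstract can be illustrated with a minimal sketch. It assumes a binary mask (1 = process, 0 = ignore) and grayscale frames stored as nested lists, and it treats "applying" the mask as zeroing the ignored pixels. The patent does not prescribe this representation; the names `apply_attention_mask` and `mask_stream` are hypothetical, and a real system would likely operate on image tensors rather than Python lists.

```python
from typing import Iterable, Iterator, List

Image = List[List[int]]  # grayscale frame as rows of pixel intensities (assumed layout)
Mask = List[List[int]]   # same shape as the frame; 1 = process, 0 = ignore


def apply_attention_mask(frame: Image, mask: Mask) -> Image:
    """Return a masked copy of the frame with ignored pixels set to zero."""
    same_rows = len(frame) == len(mask)
    if not same_rows or any(len(r) != len(m) for r, m in zip(frame, mask)):
        raise ValueError("frame and mask must have the same dimensions")
    return [[p * m for p, m in zip(row, mrow)] for row, mrow in zip(frame, mask)]


def mask_stream(frames: Iterable[Image], mask: Mask) -> Iterator[Image]:
    """Apply one pre-generated mask to every frame from the same camera."""
    for frame in frames:
        yield apply_attention_mask(frame, mask)


# Tiny 2x3 frame and a mask that keeps only the right-hand region.
frame = [[10, 20, 30], [40, 50, 60]]
mask = [[0, 1, 1], [0, 0, 1]]
masked = apply_attention_mask(frame, mask)
print(masked)  # [[0, 20, 30], [0, 0, 60]]
```

Because the camera is stationary, the same mask can be reused for every frame until the camera moves or the mask is regenerated.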

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: a processor configured to process video images for object detection by executing a convolutional neural network, the processor being further configured to: receive video image data comprising a series of images for processing; and use a pre-generated attention mask to indicate where processing should occur within the series of images, wherein: the pre-generated attention mask is generated based on a training set of video image data that is processed by a convolutional neural network specialized to output detected object data; and the video image data is pre-processed with the pre-generated attention mask to generate a series of pre-processed masked images by applying the pre-generated attention mask to the video image data; and wherein the neural network is configured to: process the series of pre-processed masked images within areas indicated by the pre-generated attention mask; and generate an output for the series of pre-processed masked images, the output corresponding to the detection of one or more pre-determined objects within the masked images.
2. The device of claim 1, wherein the detected object data output is a bounding box of the detected object.
3. The device of claim 1, wherein the detected object data output is a pixel-level segmentation of the detected object.
4. The device of claim 1 wherein the specialized convolutional neural network further outputs semantic region segmentation data.
5. The device of claim 1, wherein the detected object data output is utilized to update histogram data relating to the location of the detected objects.
6. The device of claim 5, wherein the histogram data is utilized to generate an attention mask for applying to subsequent video image data.
7. The device of claim 6, wherein the histogram data is a two-dimensional histogram corresponding to the dimensions of the images within the video image data.
8. The device of claim 7, wherein the histogram data is utilized to generate a binary output for each pixel within the images within the video image data.
9. The device of claim 8, wherein the binary output values are generated in relation to a pre-determined threshold value.
10. The device of claim 9, wherein the pre-determined threshold value is dynamically changed based on a semantic segmentation region generated from the specialized convolutional neural network output.
11. The device of claim 1, wherein the generation of the attention mask is performed within an external training server communicatively coupled to the device.
12. The device of claim 1, wherein the received video image data is acquired from a stationary camera, and in response to the movement of the stationary camera, a request for a new attention mask is generated.
13. The device of claim 1, wherein in response to a pre-determined time threshold being exceeded, the device requests the generation of a new attention mask.
14. The device of claim 1, wherein the detection of one or more pre-determined objects within the masked images within the video image data generates a notification that further analysis is required.
15. A method of detecting pre-determined objects within video images, comprising: configuring a neural network to receive a series of images for object detection; receiving a sample set of images as video image data; transferring the received sample set of images to a server configured to generate attention masks by processing the received sample set of images through a convolutional neural network specialized to output detected object data; receiving a generated attention mask configured for use with the series of images received from a stationary camera; applying the attention mask to the series of images within the video image data received from the stationary camera to generate a series of masked images; and processing the masked images within the neural network to generate an output indicating the presence of one or more pre-determined objects.
16. The method of claim 15, wherein the server is a training server, and wherein the transferring to the training server also includes the transmission of configuration data.
17. The method of claim 16, wherein the configuration data includes threshold value derivation parameters.
18. The method of claim 16, wherein the training server is selected based on the type of object selected for detection.
19. A device comprising: a processor configured to process and detect objects within video images, by executing a neural network and further comprising: a series of video image data for processing; an attention mask generated based on a training set of video image data processed by a convolutional neural network specialized to output detected object data, wherein an image tensor of the video image data is pre-processed with the attention mask to generate a series of pre-processed masked images; and wherein the neural network is configured to process the series of pre-processed masked images and generate an output for the series of masked images, the output corresponding to the detection of one or more objects within the image data.
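Claims 5 through 9 describe a pipeline in which detected-object locations update a two-dimensional histogram, which is then thresholded into a per-pixel binary output. The following is a rough sketch of one way that could work, not the patented implementation. It assumes bounding-box detections (as in claim 2) given as half-open pixel rectangles, a fixed threshold, and a "count at or above threshold" rule; the function names and the sample detections are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel rectangle (assumed format)


def update_histogram(hist: List[List[int]], boxes: List[Box]) -> None:
    """Increment the count of every pixel covered by a detected bounding box."""
    height, width = len(hist), len(hist[0])
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before counting.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                hist[y][x] += 1


def histogram_to_mask(hist: List[List[int]], threshold: int) -> List[List[int]]:
    """Binary output per pixel: 1 where the detection count reaches the threshold."""
    return [[1 if count >= threshold else 0 for count in row] for row in hist]


# Histogram with the same dimensions as the images (here 4 wide, 3 high).
width, height = 4, 3
hist = [[0] * width for _ in range(height)]

# Hypothetical detections from three sample frames.
sample_detections = [
    [(0, 0, 2, 2)],
    [(1, 0, 3, 2)],
    [(1, 1, 3, 3)],
]
for boxes in sample_detections:
    update_histogram(hist, boxes)

mask = histogram_to_mask(hist, threshold=2)
print(hist)  # [[1, 2, 1, 0], [1, 3, 2, 0], [0, 1, 1, 0]]
print(mask)  # [[0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

Claim 10 would replace the single `threshold` argument with a value that varies by semantic region; that variation is left out here to keep the sketch short.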

Assignees

Western Digital Tech Inc

Inventors

Classifications

  • structured as a network, e.g. client-server architectures · CPC title

  • in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title

  • Physics · mapped topic

  • Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns · CPC title

  • G06T7/10 (Primary): Segmentation; Edge detection (motion-based segmentation G06T7/215) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Western Digital Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 31, 2023 (B2). Legal status and post-grant events are not shown on this page.