Attention masks in neural network video processing

US11568543B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11568543-B2
Application number	US-202117197908-A
Country	US
Kind code	B2
Filing date	Mar 10, 2021
Priority date	Mar 10, 2021
Publication date	Jan 31, 2023
Grant date	Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device configured for more efficiently processing video images within a set of video image data to detect objects is described herein. The device may include a processor configured to execute a neural network such as a convolutional neural network. The device can receive video image data from a plurality of cameras, such as stationary cameras. The device can acquire a set of sample images from a stationary camera and submit them to a specialized neural network for processing to generate an attention mask. The attention mask can be generated from a variety of methods and is applied to each of the subsequently acquired images from the camera to narrow down areas where the convolutional neural network should process data. The application of attention masks to images within video image data creates masked images that can be processed to detect objects with much greater accuracy and fewer computational resources required.
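The masking step described in the abstract can be illustrated with a minimal sketch. It assumes a grayscale frame stored as rows of integer intensities and a binary mask of the same size, and it assumes that "applying" the mask means zeroing pixels outside the attention area; the abstract does not specify either detail. The names `apply_attention_mask` and `mask_video` are hypothetical and do not come from the patent.

```python
from typing import List

Image = List[List[int]]  # assumption: grayscale frame as rows of pixel intensities
Mask = List[List[int]]   # assumption: 1 = process this pixel, 0 = ignore it


def apply_attention_mask(frame: Image, mask: Mask) -> Image:
    """Return a copy of the frame with every pixel outside the mask set to 0."""
    same_rows = len(frame) == len(mask)
    if not same_rows or any(len(r) != len(m) for r, m in zip(frame, mask)):
        raise ValueError("frame and mask must have the same dimensions")
    return [
        [pixel if keep else 0 for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


def mask_video(frames: List[Image], mask: Mask) -> List[Image]:
    """Apply one pre-generated mask to every frame from the same stationary camera."""
    return [apply_attention_mask(frame, mask) for frame in frames]


frame = [[10, 20, 30],
         [40, 50, 60]]
mask = [[0, 1, 1],
        [0, 0, 1]]
masked = apply_attention_mask(frame, mask)  # [[0, 20, 30], [0, 0, 60]]
```

Because the camera is stationary, the same mask can be reused for every subsequent frame, which is why `mask_video` takes a single mask rather than one per frame.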

First claim

Opening claim text (preview).

What is claimed is:
1. A device comprising: a processor configured to process video images for object detection by executing a convolutional neural network, the processor being further configured to: receive video image data comprising a series of images for processing; and use a pre-generated attention mask to indicate where processing should occur within the series of images, wherein: the pre-generated attention mask is generated based on a training set of video image data that is processed by a convolutional neural network specialized to output detected object data; and the video image data is pre-processed with the pre-generated attention mask to generate a series of pre-processed masked images by applying the pre-generated attention mask to the video image data; and wherein the neural network is configured to: process the series of pre-processed masked images within areas indicated by the pre-generated attention mask; and generate an output for the series of pre-processed masked images, the output corresponding to the detection of one or more pre-determined objects within the masked images.
2. The device of claim 1, wherein the detected object data output is a bounding box of the detected object.
3. The device of claim 1, wherein the detected object data output is a pixel-level segmentation of the detected object.
4. The device of claim 1 wherein the specialized convolutional neural network further outputs semantic region segmentation data.
5. The device of claim 1, wherein the detected object data output is utilized to update histogram data relating to the location of the detected objects.
6. The device of claim 5, wherein the histogram data is utilized to generate an attention mask for applying to subsequent video image data.
7. The device of claim 6, wherein the histogram data is a two-dimensional histogram corresponding to the dimensions of the images within the video image data.
8. The device of claim 7, wherein the histogram data is utilized to generate a binary output for each pixel within the images within the video image data.
9. The device of claim 8, wherein the binary output values are generated in relation to a pre-determined threshold value.
10. The device of claim 9, wherein the pre-determined threshold value is dynamically changed based on a semantic segmentation region generated from the specialized convolutional neural network output.
11. The device of claim 1, wherein the generation of the attention mask is performed within an external training server communicatively coupled to the device.
12. The device of claim 1, wherein the received video image data is acquired from a stationary camera, and in response to the movement of the stationary camera, a request for a new attention mask is generated.
13. The device of claim 1, wherein in response to a pre-determined time threshold being exceeded, the device requests the generation of a new attention mask.
14. The device of claim 1, wherein the detection of one or more pre-determined objects within the masked images within the video image data generates a notification that further analysis is required.
15. A method of detecting pre-determined objects within video images, comprising: configuring a neural network to receive a series of images for object detection; receiving a sample set of images as video image data; transferring the received sample set of images to a server configured to generate attention masks by processing the received sample set of images through a convolutional neural network specialized to output detected object data; receiving a generated attention mask configured for use with the series of images received from a stationary camera; applying the attention mask to the series of images within the video image data received from the stationary camera to generate a series of masked images; and processing the masked images within the neural network to generate an output indicating the presence of one or more pre-determined objects.
16. The method of claim 15, wherein the server is a training server, and wherein the transferring to the training server also includes the transmission of configuration data.
17. The method of claim 16, wherein the configuration data includes threshold value derivation parameters.
18. The method of claim 16, wherein the training server is selected based on the type of object selected for detection.
19. A device comprising: a processor configured to process and detect objects within video images, by executing a neural network and further comprising: a series of video image data for processing; an attention mask generated based on a training set of video image data processed by a convolutional neural network specialized to output detected object data, wherein an image tensor of the video image data is pre-processed with the attention mask to generate a series of pre-processed masked images; and wherein the neural network is configured to process the series of pre-processed masked images and generate an output for the series of masked images, the output corresponding to the detection of one or more objects within the image data.
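Claims 5 to 9 describe building a two-dimensional histogram of detected object locations and thresholding it into a per-pixel binary mask. The sketch below is one possible reading, not the patented implementation. It assumes detections arrive as bounding boxes (as in claim 2) in the form `(x0, y0, x1, y1)` with exclusive upper bounds, and that a pixel is kept when its count meets a fixed threshold. The names `update_histogram` and `histogram_to_mask` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumption: (x0, y0, x1, y1), upper bounds exclusive
Grid = List[List[int]]


def update_histogram(hist: Grid, boxes: List[Box]) -> None:
    """Increment the count of every pixel covered by a detected bounding box."""
    height, width = len(hist), len(hist[0])
    for x0, y0, x1, y1 in boxes:
        # Clamp each box to the image bounds before counting.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                hist[y][x] += 1


def histogram_to_mask(hist: Grid, threshold: int) -> Grid:
    """Emit 1 where a pixel was covered at least `threshold` times, else 0."""
    return [[1 if count >= threshold else 0 for count in row] for row in hist]


# Histogram with the same dimensions as the images (3 rows by 4 columns here).
hist = [[0] * 4 for _ in range(3)]
update_histogram(hist, [(0, 0, 2, 2)])  # detections from a first sample frame
update_histogram(hist, [(1, 0, 3, 2)])  # detections from a second sample frame
mask = histogram_to_mask(hist, threshold=2)
# hist -> [[1, 2, 1, 0], [1, 2, 1, 0], [0, 0, 0, 0]]
# mask -> [[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```

Claim 10 describes varying the threshold by semantic region; in this sketch that would mean replacing the single `threshold` argument with a per-pixel or per-region value, which is left out here for brevity.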

Assignees

Western Digital Tech Inc

Inventors

Classifications

G06V10/95
structured as a network, e.g. client-server architectures · CPC title
G06V20/40
in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title
G06K9/6227
Physics · mapped topic
G06V10/28
Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns · CPC title
G06T7/10 · Primary
Segmentation; Edge detection (motion-based segmentation G06T7/215) · CPC title

Patent family

Related publications grouped by family.

View patent family 83193771

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11568543B2 cover?: A device configured for more efficiently processing video images within a set of video image data to detect objects is described herein. The device may include a processor configured to execute a neural network such as a convolutional neural network. The device can receive video image data from a plurality of cameras, such as stationary cameras. The device can acquire a set of sample images fro…
Who is the assignee on this patent?: Western Digital Tech Inc
What technology area does this patent fall under?: Primary CPC classification G06T7/10. Mapped technology areas include Physics.
When was this patent published?: Publication date Jan 31, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Unified referring video object segmentation network

Utilizing a large-scale object detector to automatically select objects in digital images

Method for processing a stream of video images

Methods and systems for cnn network adaption and object online tracking

Frequently asked questions