Natural language object tracking
US-2018129742-A1 · May 10, 2018 · US
US10769480B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10769480-B2 |
| Application number | US-201816113409-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 27, 2018 |
| Priority date | Aug 29, 2017 |
| Publication date | Sep 8, 2020 |
| Grant date | Sep 8, 2020 |
Official abstract text for this publication.
An object detection method and a neural network system for object detection are disclosed. The object detection method acquires a current frame of a sequence of frames representing an image sequence, and extracts a feature map of the current frame. The extracted feature map is pooled with information of a pooled feature map of a previous frame to thereby obtain a pooled feature map of the current frame. An object is detected from the pooled feature map of the current frame. A dynamic vision sensor (DVS) may be utilized to provide the sequence of frames. Improved object detection accuracy may be realized, particularly when object movement speed is slow.
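The per-frame flow described in the abstract (extract a feature map, pool it with the previous frame's pooled map, detect from the result) can be sketched as a simple recurrence. This is a minimal illustration, not the patented implementation: the names `pool_with_previous`, `detect_over_sequence`, `extract`, and `detect` are hypothetical, feature maps are assumed to be plain 2-D lists of floats, and an element-wise maximum stands in for the pooling step.

```python
from typing import Callable, Iterable, List, Optional

FeatureMap = List[List[float]]  # assumed layout: 2-D grid of feature values


def pool_with_previous(current: FeatureMap,
                       previous_pooled: Optional[FeatureMap]) -> FeatureMap:
    """Pool the current feature map with the previous pooled map.

    Assumption: an element-wise maximum stands in for the pooling step.
    """
    if previous_pooled is None:
        # First frame: there is no earlier pooled map, so copy the features.
        return [row[:] for row in current]
    return [
        [max(cur, prev) for cur, prev in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous_pooled)
    ]


def detect_over_sequence(frames: Iterable[FeatureMap],
                         extract: Callable[[FeatureMap], FeatureMap],
                         detect: Callable[[FeatureMap], object]) -> list:
    """Run extract -> pool -> detect on each frame, carrying the pooled map forward."""
    pooled: Optional[FeatureMap] = None
    results = []
    for frame in frames:
        features = extract(frame)                      # hypothetical extractor
        pooled = pool_with_previous(features, pooled)  # recursive pooling
        results.append(detect(pooled))                 # hypothetical detector
    return results
```

Because each pooled map is built from the one before it, information from every earlier frame can persist in the current pooled map without storing the earlier frames themselves.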
Opening claim text (preview).
What is claimed is:

1. An object detection method, comprising: executing, by at least one processor, operations comprising: acquiring a current frame of a sequence of event frames representing an image sequence and obtained from an event camera; extracting a feature map of the current frame; pooling the feature map of the current frame with information of a pooled feature map of a previous frame to thereby obtain a pooled feature map of the current frame; and detecting an object from the pooled feature map of the current frame, wherein the at least one processor is part of a neural network system that determines object movement speed based on the event frames; the object detection method further comprising: providing the event frames to a long and short term memory (LSTM) network or a sequence non-maximum suppression (Seq-NMS) network; and using object detection results of the LSTM or Seq-NMS network as a final object detection result if the object movement speed is above a threshold, otherwise using the object detection of the at least one processor of the neural network system as the final object detection result.

2. The method of claim 1, further comprising: acquiring an initial frame and a second frame of the frame sequence prior to acquiring the current frame; extracting a feature map of the initial frame; extracting a feature map of the second frame; and pooling the feature map of the second frame with information of the feature map of the initial frame to thereby obtain a pooled feature map of the second frame.

3. The method of claim 1, wherein the pooling of the feature map of the current frame to obtain the pooled feature map of the current frame comprises: obtaining a weight image of the current frame using the pooled feature map of the previous frame and the feature map of the current frame; and obtaining the pooled feature map of the current frame using the pooled feature map of the previous frame, the feature map of the current frame, and the weight image of the current frame.

4. The method of claim 3, wherein when the current frame is a $t$-th frame, where $t$ is an integer larger than 1, respective weight values in the weight image of the current frame are calculated through the following equation: $\omega_{t,i} = N_{mlp}(f_{t-1,i}, x_{t,i})$, wherein $\omega_{t,i}$ indicates a weight value of a coordinate "$i$" in a weight image of a $t$-th frame, and a value range of $\omega_{t,i}$ is $[0, 1]$, $N_{mlp}$ is a multilayer perceptron neural network, and $f_{t-1,i}$ and $x_{t,i}$ are inputs to the multilayer perceptron neural network, wherein $f_{t-1,i}$ is a pooled feature value of the coordinate $i$ in a pooled feature map of a $(t-1)$-th frame, $x_{t,i}$ is a feature value of the coordinate $i$ in a feature map of the $t$-th frame, and $i$ represents a two-dimensional coordinate of an image element.

5. The method of claim 4, wherein respective pooled feature values in a pooled feature map of the $t$-th frame are calculated through the following equation: $f_{t,i} = \rho[s(f_{t-1,i}, \omega_{t,i}), x_{t,i}]$, where $f_{t,i}$ is a pooled feature value of the coordinate $i$ in the pooled feature map of the $t$-th frame, a function $s$ is used for multiplying $f_{t-1,i}$ and $\omega_{t,i}$, a function $\rho$ is a pooled function and is used for determining a maximum value among a range of values calculated through the function $s$ and $x_{t,i}$ or is used for calculating an average value of the value calculated through the function $s$ and $x_{t,i}$.

6. The method of claim 1, wherein the pooled feature map of the current frame is obtained by recursive application of a plurality of pooled feature maps of respective previous frames.

7. The method of claim 1, wherein the sequence of frames is obtained through image capture by a dynamic vision sensor camera.

8. The method of claim 1, wherein extracting the feature map of the current frame comprises: obtaining the feature map of the current frame by performing a convolution operation on the current frame a predetermined number of times.

9. A system for object detection, comprising: a first neural network system comprising: a feature extraction subnetwork configured to acquire a current frame of a sequence of frames representing an image sequence and extract a feature map of the current frame; a time domain pooling subnetwork configured to pool the feature map of the current frame with information of a pooled feature map of a previous frame to thereby obtain a pooled feature map of the current frame; and a detection subnetwork configured to detect an object from the pooled feature map of the current frame; a second neural network system configured to detect objects from the sequence of frames, the second neural network system comprising a long and short term memory (LSTM) network or a sequence non-maximum suppression (Seq-NMS) network; wherein the first or second neural network system is configured to determine object movement speed, and if the object movement speed is above a threshold, object detection results for the second network system are used for a final object detection result, and if the object movement speed is below the threshold, object detection results for the first neural network system are used for the final object detection result.

10. The system of claim 9, wherein the feature extraction subnetwork is further configured to: acquire an initial frame and a second frame of the frame sequence prior to acquiring the current frame; extract a feature map of the initial frame; and extract a feature map of the second frame; and the time domain pooling subnetwork is further configured to pool the feature map of the second frame with information of the feature map of the initial frame to thereby obtain a pooled feature map of the second frame.

11. The system of claim 9, wherein the time domain pooling subnetwork is configured to obtain a weight image of the current frame using the pooled feature map of the previous frame and the feature map of the current frame, and obtain the pooled feature map of the current frame using the pooled feature map of the previous frame, the feature map of the current frame, and the weight image of the current frame.

12. The system of claim 11, wherein when the current frame image is a $t$-th frame image, where $t$ is an integer larger than 1, respective weight values in the weight image of the current frame are calculated through the following equation: $\omega_{t,i} = N_{mlp}(f_{t-1,i}, x_{t,i})$, wherein $\omega_{t,i}$ is a weight value of a coordinate "$i$" in a weight image of a $t$-th frame, and a value range of $\omega_{t,i}$ is $[0, 1]$, $N_{mlp}$ is a multilayer perceptron neural network, and $f_{t-1,i}$ and $x_{t,i}$ are inputs of the multilayer perceptron neural network, wherein $f_{t-1,i}$ is a pooled feature value of the coordinate $i$ in a pooled feature map of a $(t-1)$-th frame, $x_{t,i}$ is a feature value of the coordinate $i$ in a feature map of the $t$-th frame, where $i$ represents a two-dimensional coordinate of an image element.

13. The system of claim 12, wherein the time domain pooling subnetwork is configured to calculate respective pooled feature values in a pooled feature map of the $t$-th frame through the following equation: $f_{t,i} = \rho[s(f_{t-1,i}, \omega_{t,i}), x_{t,i}]$, wherein $f_{t,i}$ indicates a pooled feature value of the coordinate $i$ in the pooled feature map of the $t$-th frame, a function $s$ is used for multiplying $f_{t-1,i}$ and $\omega_{t,i}$, a function $\rho$ is a pooled function and is used for determining a maximum value among a range of values calculated through the function $s$ and $x_{t,i}$ or is used for calculating an average value of the value calculated through the function $s$ and $x_{t,i}$.
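The weight and pooling equations of claims 4, 5, 12, and 13, and the speed-based selection of claims 1 and 9, can be illustrated for a single coordinate with a short sketch. Several pieces here are assumptions made only for illustration: the function names are hypothetical, a single sigmoid unit with made-up parameters `a`, `b`, `c` stands in for the multilayer perceptron (the claims do not specify its architecture), and the results being selected between are treated as opaque values.

```python
import math


def weight(prev_pooled: float, current: float,
           a: float = 1.0, b: float = 1.0, c: float = 0.0) -> float:
    """Stand-in for the multilayer perceptron that yields the weight value.

    Assumption: one sigmoid unit, chosen only because its output stays
    within the [0, 1] range the claims require of the weight value.
    """
    return 1.0 / (1.0 + math.exp(-(a * prev_pooled + b * current + c)))


def pooled_value(prev_pooled: float, current: float, w: float,
                 mode: str = "max") -> float:
    """Pooled feature value for one coordinate.

    The function s multiplies the previous pooled value by the weight;
    the pooling function then takes the maximum or the average of that
    product and the current feature value.
    """
    scaled = prev_pooled * w  # function s
    if mode == "max":
        return max(scaled, current)
    if mode == "avg":
        return (scaled + current) / 2.0
    raise ValueError(f"unknown pooling mode: {mode!r}")


def final_result(speed: float, threshold: float,
                 pooled_result, sequence_result):
    """Select the final detection result by object movement speed.

    Above the threshold, the LSTM or Seq-NMS result is used; otherwise
    the result of the time-domain-pooling network is used.
    """
    return sequence_result if speed > threshold else pooled_result
```

For example, with a previous pooled value of 2.0, a current value of 0.5, and a weight of 0.5, max pooling returns 1.0 and average pooling returns 0.75. A weight near 0 makes the pooled value track the current frame, while a weight near 1 lets the earlier pooled value carry over.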