Self-learning object detectors for unlabeled videos using multi-task learning
US-2015248586-A1 · Sep 3, 2015 · US
US9697439B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9697439-B2 |
| Application number | US-201414505031-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 2, 2014 |
| Priority date | Oct 2, 2014 |
| Publication date | Jul 4, 2017 |
| Grant date | Jul 4, 2017 |
Official abstract text for this publication.
An object detection method includes for each of a set of patches of an image, encoding features of the patch with a non-linear mapping function, and computing per-patch statistics based on the encoded features for approximating a window-level non-linear operation by a patch-level operation. Then, windows are extracted from the image, each window comprising a sub-set of the set of patches. Each of the windows is scored based on the computed patch statistics of the respective sub-set of patches. Objects, if any, can then be detected in the image, based on the window scores. The method and system allow the non-linear operations to be performed only at the patch level, reducing the computation time of the method, since there are generally many more windows than patches, while not impacting performance unduly, as compared to a system which performs non-linear operations at the window level.
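The idea in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the patented implementation: the encoded patch features are random placeholders (standing in for, e.g., Fisher Vector encodings), the classifier weights `w` and bias `b` are arbitrary, and the function names `patch_statistics`, `approx_window_score`, and `exact_window_score` are hypothetical. The window-level reference assumes sum pooling followed by l2-normalization, which is one plausible reading of the "window-level non-linear operation".

```python
import numpy as np

def patch_statistics(phi, w):
    """Per-patch scalars for encodings phi of shape (K, D):
    the projection w^T phi_k and the squared l2 norm ||phi_k||_2^2."""
    return phi @ w, np.sum(phi * phi, axis=1)

def approx_window_score(proj, sqnorm, idx, b=0.0):
    """Patch-level approximation: sum of projections over the window,
    divided by the square root of the summed squared norms, plus bias."""
    return proj[idx].sum() / np.sqrt(sqnorm[idx].sum()) + b

def exact_window_score(phi, w, idx, b=0.0):
    """Assumed window-level reference: sum-pool the encodings,
    l2-normalize the pooled vector, then apply the linear classifier."""
    pooled = phi[idx].sum(axis=0)
    return w @ pooled / np.linalg.norm(pooled) + b

rng = np.random.default_rng(0)
phi = rng.normal(size=(12, 8))   # placeholder encodings: 12 patches, 8 dimensions
w = rng.normal(size=8)           # placeholder classifier weights

# Statistics are computed once per patch and reused by every window.
proj, sqnorm = patch_statistics(phi, w)

window = np.array([0, 1, 4, 5])  # indices of the patches inside one window
print(approx_window_score(proj, sqnorm, window, b=0.1))
print(exact_window_score(phi, w, window, b=0.1))
```

The two scores share the same numerator; they differ only in the denominator, where the approximation drops the cross terms between different patches. When the patch encodings in a window are mutually orthogonal, the two scores coincide.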
Opening claim text (preview).
What is claimed is:

1. An object detection method comprising: for each of a set of patches, encoding features of each of the patch with a non-linear mapping function; for each patch in the set, computing first and second scalar patch statistics with non-linear operations on the encoded features for approximating a window-level non-linear operation with the patch-level non-linear operations; storing the computed scalar patch statistics; extracting windows from the image, each window comprising a sub-set of the set of patches; scoring each of the windows with a linear function of the stored computed scalar patch statistics of the respective sub-set of patches; and providing for detecting objects in the image based on the window scores, wherein at least one of the encoding patch features, computing patch statistics, extracting windows, scoring the windows, and detection of objects is performed with a processor.

2. The method of claim 1, wherein the encoding of features comprises: computing a patch descriptor for each of a set of patches of an image; and encoding each of the patch descriptors with a non-linear mapping function.

3. The method of claim 2, wherein the patch encoding comprises computing a likelihood that the descriptor is emitted by a generative model.

4. The method of claim 3, wherein the patch encoding comprises a Fisher Vector.

5. The method of claim 1, wherein the computing of the patch statistics includes performing a non-linear operation on each of the encoded features.

6. The method of claim 5, wherein the performing of the non-linear operation comprises computing an $\ell_2$-normalization of each of the encoded features.

7. The method of claim 1, wherein the computed patch statistics include: a statistic which reflects a contribution of the patch to the score of the window for a given target class to be detected; and a statistic which normalizes the encoded patch features.

8. The method of claim 1, wherein the computed patch statistics include: a norm of the encoded patch features or a function thereof; and a weighted function of the encoded patch features in which weights are weights of a linear classifier trained to score window representations.

9. An object detection method, comprising: for each of a set of patches, encoding features of each of the patch with a non-linear mapping function; computing patch statistics on the encoded features for approximating a window-level non-linear operation with a set of patch-level operations; extracting windows from the image, each window comprising a sub-set of the set of patches; scoring each of the windows based on the computed patch statistics of the respective sub-set of patches; and providing for detecting objects in the image based on the window scores, wherein the patch statistics are of the form:

$$\hat{\psi}(x_k) = \left( w^T \varphi(x_k),\ \|\varphi(x_k)\|_2^2 \right) \tag{7}$$

wherein $w$ comprises a vector of weights of a classifier function for classifying a window representation with respect to a selected class; $T$ represents the transpose operator; $\varphi(x_k)$ represents an encoded patch descriptor which encodes the patch features with a non-linear mapping function; and $\|\varphi(x_k)\|_2^2$ represents the $\ell_2$-norm of the encoded patch descriptor, wherein at least one of the encoding patch features, computing patch statistics, extracting windows, scoring the windows, and detection of objects is performed with a processor.

10. The method of claim 9, wherein the scoring each of the windows based on the respective window representation comprises computing a score $\hat{s}(\chi)$ for the window representation as a function of:

$$\frac{\sum_{i=1}^{K} w^T \varphi(x_i)}{\sqrt{\sum_{i=1}^{K} \|\varphi(x_i)\|_2^2}}$$

where $K$ represents the set of patches in the window; and $x_i$ represents one of the $K$ patches in the window.

11. An object detection method, comprising: for each of a set of patches, encoding features of each of the patch with a non-linear mapping function; computing patch statistics on the encoded features for approximating a window-level non-linear operation with a set of patch-level operations; extracting windows from the image, each window comprising a sub-set of the set of patches; scoring each of the windows based on the computed patch statistics of the respective sub-set of patches, wherein the scoring of each of the windows employs integral images for pooling of the weighted encoded patch features, wherein when integral images are used, the scoring of each of the windows comprises four look-up operations on an integral image $H$:

$$\hat{s}(\chi) = \tilde{g}\left( H(x_0, y_0) + H(x_1, y_1) - H(x_0, y_1) - H(x_1, y_0) \right) + b, \tag{10}$$

where $\tilde{g}(u, v) = u/\sqrt{v}$, $(x_0, y_0)$ are the coordinates of the upper left corner of window $\chi$, and $(x_1, y_1)$ are the coordinates of the lower right corner of window $\chi$, and providing for detecting objects in the image based on the window scores, wherein at least one of the encoding patch features, computing patch statistics, extracting windows, scoring the windows, and detection of objects is performed with a processor.

12. The method of claim 11, wherein the method includes generating a data structure $H$ that, for any location $(x, y)$ in image $\mathcal{I}$, stores the cumulative sums of all the patch statistics $\hat{\psi}(x)$ for all the patches $x$ above and to the left of $(x, y)$:

$$H(x, y) = \sum_{x \in \mathcal{I}(x, y)} \hat{\psi}(x), \tag{9}$$

where $\mathcal{I}(x, y)$ is the restriction of $\mathcal{I}$ to the set of patches above and to the left of $(x, y)$.

13. The method of claim 1, wherein the scoring of each of the windows employs integral images and the scoring of each of the windows comprises four look-up operations on the integral image $H$:

$$\hat{s}(\chi) = \tilde{g}\left( H(x_0, y_0) + H(x_1, y_1) - H(x_0, y_1) - H(x_1, y_0) \right) + b, \tag{10}$$

where $\tilde{g}(u, v) = u/\sqrt{v}$, $(x_0, y_0)$ are the coordinates of the upper left corner of window $\chi$, and $(x_1, y_1)$ are the coordinates of the lower right corner of
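The integral-image scoring recited in claims 11 to 13 can be illustrated with a short NumPy sketch. It rests on several assumptions not stated in the claims: patches are laid out on a regular grid, the two scalar statistics per patch are stored in the last axis of an array, the integral image is zero-padded by one row and one column, and window corners are given in grid units with an exclusive lower right corner. The names `build_integral` and `score_window_integral` are hypothetical, and the statistics are random placeholders.

```python
import numpy as np

def build_integral(stats):
    """Cumulative sums of per-patch statistics.

    stats has shape (rows, cols, 2) and holds (w^T phi, ||phi||_2^2) per patch.
    The result has shape (rows + 1, cols + 1, 2); entry [y, x] is the sum over
    all patches with row < y and column < x.
    """
    rows, cols, n = stats.shape
    H = np.zeros((rows + 1, cols + 1, n))
    H[1:, 1:] = stats.cumsum(axis=0).cumsum(axis=1)
    return H

def score_window_integral(H, x0, y0, x1, y1, b=0.0):
    """Score the window with upper left corner (x0, y0) and lower right
    corner (x1, y1) using four look-ups on H, then apply g(u, v) = u / sqrt(v)."""
    u, v = H[y0, x0] + H[y1, x1] - H[y1, x0] - H[y0, x1]
    return u / np.sqrt(v) + b

rng = np.random.default_rng(0)
stats = rng.random((6, 8, 2)) + 0.1   # placeholder statistics on a 6 x 8 patch grid
H = build_integral(stats)

# Four look-ups versus an explicit sum over the same 3 x 4 block of patches.
fast = score_window_integral(H, x0=2, y0=1, x1=6, y1=4, b=0.1)
block = stats[1:4, 2:6].sum(axis=(0, 1))
slow = block[0] / np.sqrt(block[1]) + 0.1
print(np.isclose(fast, slow))
```

Once `H` is built, the cost of scoring a window no longer depends on how many patches it contains, which is the source of the speed-up when there are many more windows than patches.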
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Involving statistics of pixels or of feature values, e.g. histogram matching · CPC title
by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Physics · mapped topic