Depth aware enhancement for stereo video
US-2015254811-A1 · Sep 10, 2015 · US
US10820009B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10820009-B2 |
| Application number | US-201816104646-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 17, 2018 |
| Priority date | Jul 8, 2014 |
| Publication date | Oct 27, 2020 |
| Grant date | Oct 27, 2020 |
Official abstract text for this publication.
Frame sequences from multiple image sensors may be combined in order to form, for example, an interleaved frame sequence. Individual frames of the combined sequence may be configured by a combination (e.g., concatenation) of frames from one or more source sequences. The interleaved/concatenated frame sequence may be encoded using a motion estimation encoder. Output of the video encoder may be processed (e.g., parsed) in order to extract motion information present in the encoded video. The motion information may be utilized in order to determine the depth of a visual scene, such as by using binocular disparity between two or more images, by an adaptive controller in order to detect one or more objects salient to a given task. In one variant, depth information is utilized during control and operation of mobile robotic devices.
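The pipeline in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract obtains motion information by parsing the output of a motion estimation video encoder, whereas the sketch below substitutes a plain sum-of-absolute-differences block search. The frame representation (lists of luminance rows), all function names, and the pinhole relation depth = focal length × baseline / disparity are assumptions introduced here for illustration.

```python
from typing import List

Frame = List[List[int]]  # rows of luminance values (hypothetical representation)


def concatenate(left: Frame, right: Frame) -> Frame:
    """Place the left and right camera frames side by side in one combined frame."""
    if len(left) != len(right):
        raise ValueError("frames must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def interleave(left_seq: list, right_seq: list) -> list:
    """Alternate frames from two sources: L0, R0, L1, R1, ..."""
    out = []
    for l_frame, r_frame in zip(left_seq, right_seq):
        out.extend([l_frame, r_frame])
    return out


def block_sad(a: Frame, b: Frame, row: int, col_a: int, col_b: int, size: int) -> int:
    """Sum of absolute differences between a size x size block in a and one in b."""
    return sum(
        abs(a[row + dy][col_a + dx] - b[row + dy][col_b + dx])
        for dy in range(size)
        for dx in range(size)
    )


def block_disparity(left: Frame, right: Frame, row: int, col: int,
                    size: int, max_shift: int) -> int:
    """Leftward shift (in pixels) at which a left-frame block best matches the right frame."""
    best_shift, best_cost = 0, None
    for shift in range(max_shift + 1):
        if col - shift < 0:
            break  # the candidate block would fall outside the frame
        cost = block_sad(left, right, row, col, col - shift, size)
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Assumed pinhole stereo relation: depth = focal length * baseline / disparity."""
    if disparity_px <= 0:
        return float("inf")  # no measurable disparity: treat as very far away
    return focal_px * baseline_m / disparity_px
```

In this sketch, a larger disparity between the two camera views maps to a smaller depth, and the camera separation (baseline) scales the result, which matches the role the claims give to the spatial separation of the cameras.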
Opening claim text (preview).
What is claimed:

1. A method for motion detection and distance measurement of at least one target object, comprising: receiving a first image frame from a first imaging camera and a second image frame from a second imaging camera, the first image frame being captured contemporaneous to the second image frame; combining the first and second image frames to create a first combined frame; receiving a third image frame from the first imaging camera and a fourth image frame from the second imaging camera, the third image frame being captured contemporaneous to the fourth image frame and immediately subsequent to the first and second image frames; combining the third and fourth image frames to create a second combined frame; generating an interleaved sequence of concatenated frames to evaluate distance and motion detection; determining the distance measurement based on a pixel-wise disparity between image frames of the first or second combined frames; and determining the motion of at least one target object based on a pixel-wise disparity between the first and second combined frames.

2. The method of claim 1, further comprising: extracting luminance data of pixels of the image frames within the interleaved sequence of concatenated frames; and generating macroblocks comprising pixels of similar luminance within each of the image frames of the interleaved sequence of concatenated frames.

3. The method of claim 2, wherein the determining of the distance measurement is based on an at least one pixel disparity between macroblocks of the first and second image frames of a first interleaved sequence of concatenated frames and a spatial separation of first and second imaging cameras.

4. The method of claim 2, wherein the determining of motion of the at least one target object corresponds to assigning a motion vector to each of the macroblocks within the interleaved sequence of concatenated frames, the motion vectors being assigned based on an at least one pixel disparity between macroblocks of the first combined frame and the second combined frame.

5. The method of claim 4, further comprising: determining gestures of the at least one target object within the interleaved sequence of concatenated frames by performing at least one of: (i) identifying background pixels within the interleaved sequence of concatenated frames based on spatially coherent motion or differential motion; (ii) removing pixels corresponding to the identified background pixels; and (iii) determining the gesture based on a resulting motion vector field of macroblocks within the first interleaved sequence of concatenated frames, the motion vector field being formed using the image processor by at least one remaining macroblock with an assigned motion vectors upon removal of the background pixels.

6. The method of claim 5, wherein, the resulting motion vector field of macroblocks is associated with a gesture based on an output from an adaptive predictor apparatus, and the at least one target object comprises at least one of a human, inanimate object, portions of a human, or portions of an inanimate object.

7. The method of claim 1, wherein, the first and second imaging cameras are separated spatially by a nonzero distance.

8. A non-transitory computer readable medium comprising a plurality of instructions stored thereon, that when executed by at least one processor, configure the at least one processor to, receive a first image frame from a first imaging camera and a second image frame from a second imaging camera, the first image frame being captured contemporaneous to the second image frame; combine the first and second image frames to create a first combined frame; receive a third image frame from the first imaging camera and a fourth image frame from the second imaging camera, the third image frame being captured contemporaneous to the fourth and subsequent to the first and second image frames; combine the third and fourth image frames to create a second combined frame; generate an interleaved sequence of concatenated frames to evaluate distance and motion detection; determine distance measurement based on a pixel-wise disparity between image frames of the first or second combined frames; and determine motion of at least one target object based on the pixel-wise disparity between the first and second combined frames.

9. The non-transitory computer readable medium of claim 8, wherein the at least one processor is further configured to execute the computer readable instructions to, extract luminance data of pixels of the image frames within the interleaved sequence of concatenated frames; and generate macroblocks comprising pixels of similar luminance within each of the image frames of the interleaved sequence of concatenated frames.

10. The non-transitory computer readable medium of claim 9, wherein the distance measurement determination is based on an at least one pixel disparity between macroblocks of the first and second image frames of a first interleaved sequence of concatenated frames and a spatial separation of first and second imaging cameras.

11. The non-transitory computer readable medium of claim 9, wherein the motion of the at least one target object corresponds to assigning a motion vector to each of the macroblocks within the interleaved sequence of concatenated frames, the motion vectors being assigned based on an at least one pixel disparity between macroblocks of the first combined frame and the second combined frame.

12. The non-transitory computer readable medium of claim 11, wherein the at least one processor is further configured to execute the computer readable instructions to, determine gestures of the at least one target object within the interleaved sequence of concatenated frames by performing at least one of: (i) identifying background pixels within the interleaved sequence of concatenated frames based on spatially coherent motion or differential motion; (ii) removing pixels corresponding to the identified background pixels; and (iii) determining the gesture based on a resulting motion vector field of macroblocks within the first interleaved sequence of concatenated frames, the motion vector field being formed using the image processor by at least one remaining macroblock with an assigned motion vectors upon removal of the background pixels.

13. The non-transitory computer readable medium of claim 12, wherein, the resulting motion vector field of macroblocks is associated with a gesture based on an output from an adaptive predictor apparatus, and the at least one target object comprises at least one of a human, inanimate object, portions of a human, or portions of an inanimate object.

14. The non-transitory computer readable medium of claim 8, wherein the first and second imaging cameras are separated spatially by a nonzero distance.

15. A system for motion detection and distance measurement of at least one target object, comprising: a memory having computer readable instructions thereon; and at least one processor configured to execute the computer readable instructions to, receive a first image frame from a first imaging camera and a second image frame from a second imaging camera, the first image frame being captured contemporaneous to the second image frame; combine the first and second image frames to create a first combined frame; receive a third image frame from the first imaging camera and a fourth image frame from the second imaging camera, the third image frame being captured contemporaneous to the fourth and subsequent to the first and second image frames; genera
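The background-removal step recited in the gesture claims (identify background by spatially coherent motion, remove it, keep the remaining motion vector field) can be sketched as follows. The claims do not say how coherence is measured; treating macroblocks whose vector lies near the field's median as background is an assumption made here, and the names `remove_background` and `tolerance` are hypothetical.

```python
from statistics import median
from typing import Dict, Tuple

Vector = Tuple[int, int]  # (dx, dy) motion vector assigned to a macroblock
Block = Tuple[int, int]   # (block_row, block_col) macroblock index


def remove_background(field: Dict[Block, Vector], tolerance: int = 1) -> Dict[Block, Vector]:
    """Keep only macroblocks whose motion differs from the dominant (median) motion.

    Assumption: background macroblocks move coherently, so their vectors
    cluster around the per-axis median of the whole field.
    """
    if not field:
        return {}
    med_x = median(v[0] for v in field.values())
    med_y = median(v[1] for v in field.values())
    return {
        block: v
        for block, v in field.items()
        if abs(v[0] - med_x) > tolerance or abs(v[1] - med_y) > tolerance
    }
```

The vectors that survive this filter form the reduced motion vector field that a downstream gesture classifier would consume. The median heuristic only holds when the background covers most of the frame.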
Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113) · CPC title
Motion estimation or motion compensation · CPC title
from motion · CPC title
Stereoscopic video; Stereoscopic image sequence · CPC title
using a sequence of stereo image pairs · CPC title