Dynamic hand-gesture-based region of interest localization

US9354711B2 · US · B2

Patent metadata
Publication number: US-9354711-B2
Application number: US-201414553752-A
Country: US
Kind code: B2
Filing date: Nov 25, 2014
Priority date: Sep 30, 2014
Publication date: May 31, 2016
Grant date: May 31, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a dynamic hand gesture are disclosed. For example, the method captures the ego-centric video containing the dynamic hand gesture, analyzes a frame of the ego-centric video to detect pixels that correspond to a fingertip using a hand segmentation algorithm, analyzes temporally one or more frames of the ego-centric video to compute a path of the fingertip in the dynamic hand gesture, localizes the region of interest based on the path of the fingertip in the dynamic hand gesture and performs an action based on an object in the region of interest.
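As a rough illustration of the last two steps (fingertip path to region of interest), the sketch below takes one fingertip position per frame, assumed to be already drift-compensated, and turns the path into a rectangular region of interest. It is a minimal sketch under stated assumptions, not the patentee's implementation: the axis-aligned bounding box, the closure test modelled on claim 9 (the gesture ends when the fingertip "returns to approximately a same location"), and the 20-pixel tolerance are illustrative choices, and the names `gesture_closed` and `localize_roi` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) fingertip position in pixels


def gesture_closed(path: List[Point], tol: float = 20.0) -> bool:
    """Treat the gesture as complete when the fingertip is back within
    `tol` pixels of its starting point (hypothetical threshold)."""
    if len(path) < 3:  # a path that has barely started is not a loop yet
        return False
    (x0, y0), (x1, y1) = path[0], path[-1]
    return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 <= tol


def localize_roi(path: List[Point]) -> Tuple[float, float, float, float]:
    """Return the axis-aligned box (x_min, y_min, x_max, y_max) that
    encloses every point of the traced fingertip path."""
    if not path:
        raise ValueError("empty fingertip path")
    xs = [p[0] for p in path]
    ys = [p[1] for p in path]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A roughly rectangular loop traced around an object.
    trace = [(100, 50), (300, 55), (305, 200), (98, 210), (102, 58)]
    print(gesture_closed(trace))  # True: last point is ~8 px from the first
    print(localize_roi(trace))    # (98, 50, 305, 210)
```

A bounding box is the simplest choice; a real system might instead keep the traced polygon itself as the region of interest.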

First claim

Opening claim text (preview).

What is claimed is:

1. A method for localizing a region of interest in an ego-centric video using a dynamic hand gesture, comprising: capturing, by a processor, the ego-centric video containing the dynamic hand gesture; compensating, by the processor, for a drift of a head-mounted video device during the capturing of the ego-centric video; analyzing, by the processor, a frame of the ego-centric video to detect pixels that correspond to a fingertip using a hand segmentation algorithm; analyzing, by the processor, temporally one or more frames of the ego-centric video to compute a path of the fingertip in the dynamic hand gesture, wherein the analyzing temporally the one or more frames of the ego-centric video to compute the path of the fingertip comprises performing a localization on a frame-by-frame basis; localizing, by the processor, the region of interest based on the path of the fingertip in the dynamic hand gesture, wherein the region of interest is larger than the frame and the path is traced using a combination of the dynamic hand gesture and a movement of the head-mounted video device, wherein an object in the region of interest is stitched together to fit in the region of interest in one frame; and performing, by the processor, an action based on the object in the region of interest.

2. The method of claim 1, wherein the capturing comprises receiving a prompt to initiate the capturing of the ego-centric video, wherein the prompt comprises at least one of: an audio command, a tap or a swipe gesture.

3. The method of claim 1, wherein the analyzing the frame of the ego-centric video to detect pixels that correspond to the fingertip comprises: detecting, by the processor, the pixels that correspond to a hand region; generating, by the processor, a binary mask of pixels indicating hand pixels within the hand region; applying, by the processor, an image processing to the binary mask to reduce a probability of false positives and false negatives occurring in the binary mask; and identifying, by the processor, the fingertip as one or more of the pixels within the hand region that have a most extreme coordinate values along a dimension.

4. The method of claim 3, wherein the dimension comprises at least one of a horizontal dimension, a vertical dimension, a diagonal dimension, a row or a column.

5. The method of claim 1, wherein the analyzing temporally one or more frames of the ego-centric video to compute the path of the fingertip comprises performing a localization on a single frame and using a tracking algorithm to detect the fingertip in subsequent frames.

6. The method of claim 1, wherein a fingertip mark is displayed on the fingertip on a display of the head-mounted video device.

7. The method of claim 1, wherein a line is displayed that traces over the path of the fingertip in a display of the head-mounted video device.

8. The method of claim 1, further comprising: receiving, by the processor, a signal indicating that the dynamic hand gesture is completed.

9. The method of claim 1, further comprising: determining, by the processor, that the dynamic hand gesture is completed when the path of the fingertip returns to approximately a same location that the fingertip began.

10. The method of claim 1, wherein the compensating further comprises: tracking, by the processor, salient features around a selected location of the frame; and estimating, by the processor, a distortion that took place between the frame and subsequent frames of the ego-centric video based on the salient features to compensate for the drift of the head-mounted video device capturing the ego-centric video.

11. The method of claim 10, wherein the distortion is assumed to be translational motion.

12. The method of claim 1, wherein the compensating further comprises: computing, by the processor, a motion vector field indicating pixel-wise displacements that occurred between frames; and estimating, by the processor, a distortion that took place between the frame and subsequent frames of the ego-centric video to compensate for the drift of the head-mounted video device capturing the ego-centric video based on the motion vector field.

13. The method of claim 1, further comprising: displaying, by the processor, a shape around the region of interest in a display of the head-mounted video device; and receiving, by the processor, a confirmation input that the shape correctly surrounds the region of interest.

14. The method of claim 1, wherein the object comprises a text and the performing the action comprises: recognizing, by the processor, the text using an optical character recognition program.

15. The method of claim 14, further comprising: automatically populating, by the processor, one or more fields of a form using the text that is identified.

16. The method of claim 14, further comprising: translating, by the processor, the text in a first language to a second language.

17. The method of claim 1, wherein the object comprises a moving object and the performing the action comprises: tracking, by the processor, the moving object within the region of interest.

18. A non-transitory computer-readable medium storing a plurality of instructions, which when executed by a processor, cause the processor to perform operations for localizing a region of interest in an ego-centric video using a dynamic hand gesture, comprising: capturing the ego-centric video containing the dynamic hand gesture; compensating for a drift of a head-mounted video device during the capturing of the ego-centric video; analyzing a frame of the ego-centric video to detect pixels that correspond to a fingertip using a hand segmentation algorithm; analyzing temporally one or more frames of the ego-centric video to compute a path of the fingertip in the dynamic hand gesture, wherein the analyzing temporally the one or more frames of the ego-centric video to compute the path of the fingertip comprises performing a localization on a frame-by-frame basis; localizing the region of interest based on the path of the fingertip in the dynamic hand gesture, wherein the region of interest is larger than the one or more frames and the path is traced using a combination of the dynamic hand gesture and a movement of the head-mounted video device, wherein an object in the region of interest is stitched together to fit in the region of interest in one frame; and performing an action based on an object in the region of interest.

19. A method for localizing a region of interest in an ego-centric video using a dynamic hand gesture, comprising: capturing, by a processor, the ego-centric video containing the dynamic hand gesture; compensating, by the processor, for a drift of a head-mounted video device during the capturing of the ego-centric video; detecting, by the processor, a hand in a frame of the ego-centric video using a binary mask generated from a hand segmentation algorithm; identifying, by the processor, pixels that correspond to a fingertip of the hand based on one or more of the pixels of the hand that have the most extreme coordinate values; analyzing, by the processor, temporally one or more subsequent frames of the ego-centric video to compute a path of the fingertip in the dynamic hand gesture using a motion tracking algorithm until the path is completed on a frame-by-frame basis; tracing, by the processor, the path of the fingertip in a display of a head-mounted video device using a combination of the dynamic hand gesture and a movement of the he…
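Claims 3 and 4 describe fingertip detection in three steps: a binary hand mask, image processing to reduce false positives and false negatives in that mask, and picking the hand pixel with the most extreme coordinate along some dimension. The pure-Python sketch below assumes the hand segmentation step has already produced the mask. It uses 3x3 morphological opening and closing as the "image processing" (the claims do not name a specific operation) and takes the topmost pixel (smallest row index) as the fingertip, breaking ties by the leftmost column. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary hand mask: 1 = hand pixel, 0 = background


def _neighbourhood(mask: Mask, r: int, c: int) -> List[int]:
    """3x3 neighbourhood of (r, c); pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            for rr in (r - 1, r, r + 1)
            for cc in (c - 1, c, c + 1)]


def erode(mask: Mask) -> Mask:
    """Keep a pixel only if its whole 3x3 neighbourhood is hand."""
    return [[int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask: Mask) -> Mask:
    """Set a pixel if any pixel in its 3x3 neighbourhood is hand."""
    return [[int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def clean_mask(mask: Mask) -> Mask:
    """Opening (erode, dilate) removes isolated false-positive pixels;
    closing (dilate, erode) then fills small false-negative holes."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))


def find_fingertip(mask: Mask) -> Optional[Tuple[int, int]]:
    """(row, col) of the hand pixel with the smallest row index, i.e. the
    most extreme coordinate along the vertical dimension; None if no hand."""
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                return r, c
    return None


if __name__ == "__main__":
    rows = [
        "100000000",  # isolated false-positive pixel at (0, 0)
        "000000000",
        "000111000",  # finger, rows 2-5
        "000111000",
        "000111000",
        "000111000",
        "011111110",  # palm, rows 6-9
        "011111110",
        "011111110",
        "011111110",
    ]
    mask = [[int(ch) for ch in row] for row in rows]
    print(find_fingertip(mask))              # (0, 0): fooled by the noise pixel
    print(find_fingertip(clean_mask(mask)))  # (2, 3): top of the finger
```

Without the cleanup step, the stray pixel in the corner is reported as the fingertip; opening removes it before the extreme-coordinate search runs. The nested loops only keep the example dependency-free; a practical system would use an optimized image-processing library for the morphology.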

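Claims 10 and 11 compensate for drift of the head-mounted camera by tracking salient features and assuming the distortion between frames is a pure translation. A minimal sketch of that idea, assuming feature matching has already been done elsewhere: the per-axis median displacement of matched features serves as the translation estimate (the median is an illustrative choice that tolerates a few mismatched points, for example features lying on the moving hand), and a fingertip position is mapped back into the reference frame by subtracting that estimate. Because compensated coordinates can fall outside the original frame, this also suggests one way a traced path could outline a region larger than a single frame. Claim 12's dense motion-vector-field variant is not shown, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def estimate_translation(ref_pts: List[Point], cur_pts: List[Point]) -> Point:
    """Translational distortion between the reference frame and the current
    frame: per-axis median displacement of matched salient features."""
    if not ref_pts or len(ref_pts) != len(cur_pts):
        raise ValueError("need equal, non-empty lists of matched features")
    dx = median([c[0] - r[0] for r, c in zip(ref_pts, cur_pts)])
    dy = median([c[1] - r[1] for r, c in zip(ref_pts, cur_pts)])
    return dx, dy


def compensate(fingertip: Point, drift: Point) -> Point:
    """Map a fingertip seen in the current frame into reference-frame
    coordinates by undoing the estimated camera drift."""
    return fingertip[0] - drift[0], fingertip[1] - drift[1]


if __name__ == "__main__":
    ref = [(10, 10), (50, 20), (80, 60), (30, 90), (60, 40)]
    # Scene shifted by (+5, -3); the last feature moved with the hand (outlier).
    cur = [(15, 7), (55, 17), (85, 57), (35, 87), (90, 70)]
    drift = estimate_translation(ref, cur)
    print(drift)                         # (5, -3)
    print(compensate((120, 40), drift))  # (115, 43)
```

Applying `compensate` to every per-frame fingertip position yields a path in a single coordinate system, which is what a region-of-interest step such as a bounding box would then consume.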

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9354711B2 cover?
A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a dynamic hand gesture are disclosed. For example, the method captures the ego-centric video containing the dynamic hand gesture, analyzes a frame of the ego-centric video to detect pixels that correspond to a fingertip using a hand segmentation algorithm, analyzes temporally one or more f…
Who is the assignee on this patent?
Xerox Corp
What technology area does this patent fall under?
Primary CPC classification G06F3/017. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 31, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).