Device control method, apparatus, and system
US-2023108331-A1 · Apr 6, 2023 · US
| Field | Value |
|---|---|
| Publication number | US-12093465-B2 |
| Application number | US-202217950246-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 22, 2022 |
| Priority date | Mar 23, 2020 |
| Publication date | Sep 17, 2024 |
| Grant date | Sep 17, 2024 |
Official abstract text for this publication.
Methods and systems for gesture-based control of a device are described. An input frame is processed to determine a location of a distinguishing anatomical feature in the input frame. A virtual gesture-space is defined based on the location of the distinguishing anatomical feature, the virtual gesture-space being a defined space for detecting a gesture input. The input frame is processed in only the virtual gesture-space, to detect and track a hand. Using information generated from detecting and tracking the at least one hand, a gesture class is determined for the at least one hand. The device may be a smart television, a smart phone, a tablet, etc.
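The step of defining a virtual gesture-space from the location of a detected face, then restricting later processing to that region, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `Box` type, the `define_gesture_space` and `crop` helpers, and in particular the `width_scale` and `height_scale` factors, which are not values taken from the patent text. The face box stands in for the output of whatever face detector is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top


def define_gesture_space(face: Box, frame_w: int, frame_h: int,
                         width_scale: float = 5.0,
                         height_scale: float = 3.0) -> Box:
    """Return a rectangle centred on the face and clamped to the frame.

    width_scale and height_scale are illustrative assumptions only.
    """
    cx = (face.left + face.right) / 2
    cy = (face.top + face.bottom) / 2
    half_w = face.width * width_scale / 2
    half_h = face.height * height_scale / 2
    return Box(
        left=max(0, int(cx - half_w)),
        top=max(0, int(cy - half_h)),
        right=min(frame_w, int(cx + half_w)),
        bottom=min(frame_h, int(cy + half_h)),
    )


def crop(frame, space: Box):
    """Keep only the pixels inside the gesture-space (frame is a list of rows)."""
    return [row[space.left:space.right] for row in frame[space.top:space.bottom]]


# Hypothetical face detection result in a 640x480 frame.
face = Box(left=300, top=100, right=340, bottom=150)
space = define_gesture_space(face, frame_w=640, frame_h=480)

# Only this cropped region would be handed to hand detection and tracking.
frame = [[0] * 640 for _ in range(480)]
region = crop(frame, space)
```

In this sketch the hand detector would receive a 200 by 150 pixel region instead of the full 640 by 480 frame, which is the kind of reduction in processed area that restricting detection to the gesture-space is meant to provide.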
Opening claim text (preview).
The invention claimed is:
1. A method comprising: processing an input frame of a sequence of frames captured by a camera of a device to determine a location of at least one detected instance of a distinguishing anatomical feature in the input frame, the at least one detected instance of the distinguishing anatomical feature detected in the input frame being a non-hand anatomical feature; defining, for at least a selected one of the at least one detected instance of the distinguishing anatomical feature, a virtual gesture-space based on the location of the selected one instance of the distinguishing anatomical feature, the virtual gesture-space being a shape defined within the input frame for detecting a gesture input; processing only the virtual gesture-space that is the shape defined within each frame in the sequence of frames to detect and track at least one hand; predicting, using information generated from detecting and tracking the at least one hand, a gesture class associated with the at least one hand; and outputting the predicted gesture class associated with the at least one hand.
2. The method of claim 1, wherein the distinguishing anatomical feature is a human face.
3. The method of claim 1, wherein there is a plurality of detected instances of the distinguishing anatomical feature, one virtual gesture-space is defined for each respective detected instance, and each virtual gesture-space is processed to perform hand detection and tracking.
4. The method of claim 1, further comprising: after the virtual gesture-space has been defined, processing at least one subsequent input frame by performing hand detection and tracking in only the defined virtual gesture-space without further performing detection of the distinguishing anatomical feature in the at least one subsequent input frame.
5. The method of claim 1, further comprising: using information generated from detecting and tracking the at least one hand, redefining the virtual gesture-space based on a detected location of the at least one hand.
6. The method of claim 5, further comprising: after the virtual gesture-space has been redefined based on the detected location of the at least one hand, processing at least one subsequent input frame by performing hand detection and tracking only in the redefined virtual gesture-space without further performing detection of the distinguishing anatomical feature in the at least one subsequent input frame.
7. The method of claim 1, wherein the information generated from detecting and tracking the at least one hand includes a bounding box defining the at least one hand in the input frame, and wherein gesture classification is performed using the bounding box.
8. The method of claim 1, further comprising: defining one or more subspaces in the virtual gesture-space; wherein information generated from detecting and tracking the at least one hand includes information indicating the at least one hand is detected in one of the one or more subspaces; and wherein each subspace is associated with a respective mouse input.
9. The method of claim 1, wherein the virtual gesture-space is a 3D shape.
10. An apparatus comprising: a processing device coupled to a memory storing machine-executable instructions thereon, wherein the instructions, when executed by the processing device, cause the apparatus to: process an input frame of a sequence of frames to determine a location of at least one detected instance of a distinguishing anatomical feature in the input frame, the at least one detected instance of the distinguishing anatomical feature detected in the input frame being a non-hand anatomical feature; define, for at least a selected one of the at least one detected instance of the distinguishing anatomical feature, a virtual gesture-space based on the location of the selected one instance of the distinguishing anatomical feature, the virtual gesture-space being a shape defined within the input frame for detecting a gesture input; process only the virtual gesture-space that is the shape defined within each frame in the sequence of frames to detect and track at least one hand; predict, using information generated from detecting and tracking the at least one hand, a gesture class associated with the at least one hand; and output the predicted gesture class associated with the at least one hand.
11. The apparatus of claim 10, wherein the distinguishing anatomical feature is a human face.
12. The apparatus of claim 10, wherein there is a plurality of detected instances of the distinguishing anatomical feature, one virtual gesture-space is defined for each respective detected instance, and each virtual gesture-space is processed to perform hand detection and tracking.
13. The apparatus of claim 10, wherein the instructions further cause the apparatus to: after the virtual gesture-space has been defined, process at least one subsequent input frame by performing hand detection and tracking only in the defined virtual gesture-space without further performing detection of the distinguishing anatomical feature in the at least one subsequent input frame.
14. The apparatus of claim 10, wherein the instructions further cause the apparatus to: using information generated from detecting and tracking the at least one hand, redefine the virtual gesture-space based on a detected location of the at least one hand.
15. The apparatus of claim 14, wherein the instructions further cause the apparatus to: after the virtual gesture-space has been redefined based on the detected location of the at least one hand, process at least one subsequent input frame by performing hand detection and tracking only in the redefined virtual gesture-space without further performing detection of the distinguishing anatomical feature in the at least one subsequent input frame.
16. The apparatus of claim 10, wherein the information generated from detecting and tracking the at least one hand includes a bounding box defining the at least one hand in the input frame, and wherein gesture classification is performed using the bounding box.
17. The apparatus of claim 10, wherein the instructions further cause the apparatus to: define one or more subspaces in the virtual gesture-space; wherein information generated from detecting and tracking the at least one hand includes information indicating the at least one hand is detected in one of the one or more subspaces; and wherein each subspace is associated with a respective mouse input.
18. The apparatus of claim 10, wherein the apparatus is a gesture-controlled device, and wherein the determined gesture class is used to determine a command input to the gesture-controlled device.
19. The apparatus of claim 18, further comprising a camera for capturing the sequence of frames including the input frame, and the gesture-controlled device is one of: a television, a smartphone, a tablet, a vehicle-coupled device, an internet of things device, an artificial reality device, or a virtual reality device.
20. A non-transitory computer-readable medium having machine-executable instructions stored thereon, the instructions, when executed by a processing device of an apparatus, cause the apparatus to: process an input frame of a sequence of frames to determine a location of at least one detected instance of a distinguishing anatomical feature in the input frame, the at least one detected instance of the distinguishing anatomical feature detected in the input frame being a non-hand anatomical feature;
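Claims 8 and 17 associate each subspace of the virtual gesture-space with a respective mouse input. One possible reading is sketched below, assuming the gesture-space is split into equal vertical strips and the centre of the hand bounding box decides which strip the hand is in. The strip layout, the `MOUSE_INPUTS` names, and both helper functions are hypothetical and chosen only for illustration; the claims do not specify how subspaces are shaped or which input each one maps to.

```python
from typing import Optional, Tuple

# Hypothetical mapping: three equal vertical strips, left to right.
MOUSE_INPUTS = ("left_click", "scroll", "right_click")


def subspace_index(hand_cx: float, space_left: float, space_right: float,
                   n_subspaces: int) -> Optional[int]:
    """Return the index of the strip containing hand_cx, or None if outside."""
    if not (space_left <= hand_cx < space_right):
        return None
    strip_w = (space_right - space_left) / n_subspaces
    # min() guards against floating-point rounding at the right edge.
    return min(int((hand_cx - space_left) / strip_w), n_subspaces - 1)


def mouse_input_for_hand(hand_box: Tuple[float, float, float, float],
                         space_left: float,
                         space_right: float) -> Optional[str]:
    """Map a hand bounding box (left, top, right, bottom) to a mouse input."""
    hand_cx = (hand_box[0] + hand_box[2]) / 2
    idx = subspace_index(hand_cx, space_left, space_right, len(MOUSE_INPUTS))
    return None if idx is None else MOUSE_INPUTS[idx]


# Gesture-space spanning x = 220..420; hand boxes are made-up tracker outputs.
left_hand = mouse_input_for_hand((230, 60, 270, 110), 220, 420)
middle_hand = mouse_input_for_hand((300, 60, 340, 110), 220, 420)
outside_hand = mouse_input_for_hand((0, 0, 40, 40), 220, 420)
```

A hand whose centre falls outside the gesture-space yields no input in this sketch, which matches the idea that only the gesture-space is processed for hands.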
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
Combinations of networks · CPC title
Activation functions · CPC title