US12282633B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12282633-B2 |
| Application number | US-202318242040-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 5, 2023 |
| Priority date | Sep 5, 2022 |
| Publication date | Apr 22, 2025 |
| Grant date | Apr 22, 2025 |
Official abstract text for this publication.
Disclosed is a method and system for predicting a touch interaction position on a large display based on a binocular camera. The method includes: separately acquiring arm movement video frames of a user and facial and eye movement video frames of the user by a binocular camera; extracting a video clip of each tapping action from the arm movement video frames and the facial and eye movement video frames and obtaining a key frame by screening; marking the key frame of each tapping action with coordinates to indicate coordinates of a finger in a display screen; inputting the marked key frame to an efficient convolutional network for online video understanding (ECO)-Lite neural network for training to obtain a predictive network model; and inputting a video frame of a current operation to be predicted to the predictive network model and outputting a touch interaction position predicted for the current operation.
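The abstract describes pairing screened key frames with marked finger coordinates and training a model that outputs a predicted touch position. A minimal sketch of that data shape and of a coordinate error follows. The names `TapSample`, `mse_loss`, and `mean_pixel_error` are hypothetical, and mean squared error is an assumption: the publication only says that a loss value is calculated against the marked coordinates.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) on the display, assumed to be in pixels


@dataclass
class TapSample:
    """One labeled tapping action (hypothetical container, not from the patent)."""
    arm_key_frames: List[str]   # identifiers of key frames from the arm-movement video
    face_key_frames: List[str]  # identifiers of key frames from the face/eye video
    target: Point               # marked finger coordinates on the display


def _check(predicted: List[Point], marked: List[Point]) -> None:
    if not predicted or len(predicted) != len(marked):
        raise ValueError("predicted and marked must be non-empty and equal in length")


def mse_loss(predicted: List[Point], marked: List[Point]) -> float:
    """Mean squared error over both coordinates (assumed loss, see lead-in)."""
    _check(predicted, marked)
    total = sum((px - mx) ** 2 + (py - my) ** 2
                for (px, py), (mx, my) in zip(predicted, marked))
    return total / (2 * len(predicted))


def mean_pixel_error(predicted: List[Point], marked: List[Point]) -> float:
    """Average Euclidean distance between predicted and marked positions."""
    _check(predicted, marked)
    return sum(hypot(px - mx, py - my)
               for (px, py), (mx, my) in zip(predicted, marked)) / len(predicted)
```

A squared-error loss suits training because it is smooth, while the mean pixel distance is easier to interpret when judging how far a predicted touch lands from the marked one.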
Opening claim text (preview).
What is claimed is:
1. A method for predicting a touch interaction position on a large display based on a binocular camera, comprising the following steps: S1, separately acquiring arm movement video frames of a user and facial and eye movement video frames of the user by a binocular camera; S2, extracting a video clip of each tapping action from the arm movement video frames and the facial and eye movement video frames and obtaining a key frame by screening; S3, marking the key frame of each tapping action with coordinates to indicate coordinates of a finger in a display screen; S4, inputting the marked key frame to an efficient convolutional network for online video understanding (ECO)-Lite neural network for training to obtain a predictive network model; and S5, inputting a video frame of a current operation to be predicted to the predictive network model and outputting a touch interaction position predicted for the current operation.
2. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 1, wherein in step S1, a camera is disposed right above a middle of a display and configured to acquire the facial and eye movement video frames of the user; and a network camera is disposed on a side of the display to acquire the arm movement video frames of the user.
3. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 1, wherein in step S2, when extracting the key frame of each tapping action, 1000 ms before completion of each tapping event is split as a tapping action, and video clips of a plurality of tapping actions are obtainable by splitting; and for each video clip, an image frame with no movement is removed from 1000 ms video frames, and the key frame of each tapping action is obtained by extraction from remaining video frames at an interval of 50 ms.
4. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 3, wherein a condition for determining the image frame with no movement is as follows: redundant information of adjacent image frames is greater than 90%.
5. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 3, wherein step S4 comprises the following steps: S41, taking a key frame extracted from the arm movement video frames and a key frame extracted from the facial and eye movement video frames as model inputs; S42, performing convolutional processing using a convolution pool part, extracting two-dimensional (2D) image features by a 2D network, and arranging the extracted 2D image features in an order of video frames; S43, taking the arranged 2D image features and an arrangement relationship as inputs to a three-dimensional (3D) convolution for end-to-end fusion to acquire movement features; and S44, merging a movement motion feature and facial and eye movement features after the 3D convolution, followed by inputting to a fully connected layer for result prediction and comparison with the marked coordinates, and calculating a loss value for parameter adjustment to obtain the predictive network model.
6. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 5, wherein in step S42, the 2D network is batch normalization (BN)-Inception.
7. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 5, wherein in step S43, the 3D convolution is 3D-Resnet18.
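The key-frame screening of claims 3 and 4 (a 1000 ms window before each tap, removal of frames with no movement, then extraction at 50 ms intervals) can be sketched roughly as below. This is an illustration under stated assumptions, not the patented implementation: `Frame` and every function name are hypothetical, "redundant information" is approximated as the fraction of unchanged pixels between adjacent frames (the claims do not define how it is measured), and the first frame of a clip is always kept because it has no predecessor inside the clip.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    t_ms: int              # capture timestamp in milliseconds
    pixels: Sequence[int]  # flattened grayscale values (hypothetical representation)


def redundancy(a: Frame, b: Frame) -> float:
    """Fraction of pixel positions with identical values (assumed metric)."""
    same = sum(1 for p, q in zip(a.pixels, b.pixels) if p == q)
    return same / len(a.pixels)


def tap_clip(frames: List[Frame], tap_ms: int, window_ms: int = 1000) -> List[Frame]:
    """Frames from the window_ms leading up to the completed tap (claim 3)."""
    return [f for f in frames if tap_ms - window_ms <= f.t_ms <= tap_ms]


def drop_static(clip: List[Frame], threshold: float = 0.90) -> List[Frame]:
    """Drop frames whose redundancy with the adjacent earlier frame exceeds 90% (claim 4)."""
    kept = clip[:1]
    for prev, cur in zip(clip, clip[1:]):
        if redundancy(prev, cur) <= threshold:
            kept.append(cur)
    return kept


def sample_every(frames: List[Frame], interval_ms: int = 50) -> List[Frame]:
    """Keep a frame once at least interval_ms has passed since the last kept frame."""
    keys: List[Frame] = []
    for f in frames:
        if not keys or f.t_ms - keys[-1].t_ms >= interval_ms:
            keys.append(f)
    return keys


def key_frames(frames: List[Frame], tap_ms: int) -> List[Frame]:
    """Window, screen out static frames, then subsample at 50 ms."""
    return sample_every(drop_static(tap_clip(frames, tap_ms)))
```

In practice the same screening would run on both the arm-movement stream and the facial and eye movement stream, so that each tapping action yields two aligned sets of key frames.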
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
using neural networks · CPC title
Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113) · CPC title
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title