Method and system for predicting touch interaction position on large display based on binocular camera

US12282633B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12282633-B2
Application number  US-202318242040-A
Country             US
Kind code           B2
Filing date         Sep 5, 2023
Priority date       Sep 5, 2022
Publication date    Apr 22, 2025
Grant date          Apr 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a method and system for predicting a touch interaction position on a large display based on a binocular camera. The method includes: separately acquiring arm movement video frames of a user and facial and eye movement video frames of the user by a binocular camera; extracting a video clip of each tapping action from the arm movement video frames and the facial and eye movement video frames and obtaining a key frame by screening; marking the key frame of each tapping action with coordinates to indicate coordinates of a finger in a display screen; inputting the marked key frame to an efficient convolutional network for online video understanding (ECO)-Lite neural network for training to obtain a predictive network model; and inputting a video frame of a current operation to be predicted to the predictive network model and outputting a touch interaction position predicted for the current operation.
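The marking step in the abstract pairs each tapping action's key frames with the finger's coordinates on the display. The sketch below shows one way such a labelled sample could be represented. Everything in it is an assumption made for illustration, not taken from the patent: the names `Display`, `TapSample`, `normalise_touch` and `to_pixels`, the frame file names, the 3840x2160 resolution, and the choice to normalise pixel coordinates to the range [0, 1].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Display:
    """Pixel resolution of the large display (hypothetical helper type)."""
    width_px: int
    height_px: int


@dataclass
class TapSample:
    """One labelled tapping action: key frames from both views plus the target."""
    arm_key_frames: List[str]      # identifiers of arm-view key frames (assumed: file names)
    face_key_frames: List[str]     # identifiers of face/eye-view key frames
    target: Tuple[float, float]    # marked finger position, normalised to [0, 1]


def normalise_touch(x_px: int, y_px: int, display: Display) -> Tuple[float, float]:
    """Convert a marked pixel position into resolution-independent coordinates."""
    if not (0 <= x_px < display.width_px and 0 <= y_px < display.height_px):
        raise ValueError("touch point lies outside the display")
    return (x_px / display.width_px, y_px / display.height_px)


def to_pixels(pred: Tuple[float, float], display: Display) -> Tuple[int, int]:
    """Map a predicted normalised position back to a pixel, clamped to the screen."""
    x = min(max(pred[0], 0.0), 1.0)
    y = min(max(pred[1], 0.0), 1.0)
    return (min(int(x * display.width_px), display.width_px - 1),
            min(int(y * display.height_px), display.height_px - 1))


# Example: a tap marked at pixel (1920, 540) on an assumed 3840x2160 display.
display = Display(3840, 2160)
sample = TapSample(
    arm_key_frames=["arm_000.png"],
    face_key_frames=["face_000.png"],
    target=normalise_touch(1920, 540, display),
)
```

Normalising the target is a design choice of this sketch: it keeps the regression target independent of the display resolution, and `to_pixels` converts a prediction back to screen pixels.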

First claim

Opening claim text (preview).

What is claimed is:

1. A method for predicting a touch interaction position on a large display based on a binocular camera, comprising the following steps: S1, separately acquiring arm movement video frames of a user and facial and eye movement video frames of the user by a binocular camera; S2, extracting a video clip of each tapping action from the arm movement video frames and the facial and eye movement video frames and obtaining a key frame by screening; S3, marking the key frame of each tapping action with coordinates to indicate coordinates of a finger in a display screen; S4, inputting the marked key frame to an efficient convolutional network for online video understanding (ECO)-Lite neural network for training to obtain a predictive network model; and S5, inputting a video frame of a current operation to be predicted to the predictive network model and outputting a touch interaction position predicted for the current operation.

2. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 1, wherein in step S1, a camera is disposed right above a middle of a display and configured to acquire the facial and eye movement video frames of the user; and a network camera is disposed on a side of the display to acquire the arm movement video frames of the user.

3. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 1, wherein in step S2, when extracting the key frame of each tapping action, 1000 ms before completion of each tapping event is split as a tapping action, and video clips of a plurality of tapping actions are obtainable by splitting; and for each video clip, an image frame with no movement is removed from 1000 ms video frames, and the key frame of each tapping action is obtained by extraction from remaining video frames at an interval of 50 ms.

4. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 3, wherein a condition for determining the image frame with no movement is as follows: redundant information of adjacent image frames is greater than 90%.

5. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 3, wherein step S4 comprises the following steps: S41, taking a key frame extracted from the arm movement video frames and a key frame extracted from the facial and eye movement video frames as model inputs; S42, performing convolutional processing using a convolution pool part, extracting two-dimensional (2D) image features by a 2D network, and arranging the extracted 2D image features in an order of video frames; S43, taking the arranged 2D image features and an arrangement relationship as inputs to a three-dimensional (3D) convolution for end-to-end fusion to acquire movement features; and S44, merging a movement motion feature and facial and eye movement features after the 3D convolution, followed by inputting to a fully connected layer for result prediction and comparison with the marked coordinates, and calculating a loss value for parameter adjustment to obtain the predictive network model.

6. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 5, wherein in step S42, the 2D network is batch normalization (BN)-Inception.

7. The method for predicting a touch interaction position on a large display based on a binocular camera according to claim 5, wherein in step S43, the 3D convolution is 3D-Resnet18.
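The key-frame screening in claims 3 and 4 can be illustrated with a minimal sketch: take the 1000 ms before the tap completes, drop frames that show no movement, then sample the remainder at 50 ms intervals. The three constants come from the claims. The rest is assumed for illustration: the `Frame` type, the function names, the flattened grayscale pixel layout, the rule that the first frame of a clip is always kept, and above all the redundancy measure, which the claims do not define and which is taken here to be the fraction of pixel positions that match the preceding frame.

```python
from dataclasses import dataclass
from typing import List, Sequence

WINDOW_MS = 1000        # claim 3: the 1000 ms before a tap completes
SAMPLE_MS = 50          # claim 3: key-frame sampling interval
REDUNDANCY_MAX = 0.90   # claim 4: above this, a frame counts as "no movement"


@dataclass
class Frame:
    t_ms: int               # capture timestamp in milliseconds
    pixels: Sequence[int]   # flattened grayscale values (assumed layout)


def redundancy(prev: Frame, cur: Frame, tol: int = 0) -> float:
    """Assumed metric: share of pixel positions that differ by at most `tol`."""
    if len(prev.pixels) != len(cur.pixels) or not cur.pixels:
        raise ValueError("frames must be non-empty and the same size")
    same = sum(1 for p, q in zip(prev.pixels, cur.pixels) if abs(p - q) <= tol)
    return same / len(cur.pixels)


def extract_key_frames(frames: List[Frame], tap_done_ms: int, tol: int = 0) -> List[Frame]:
    """Return key frames for one tapping action that completes at `tap_done_ms`."""
    # 1. Cut the clip: frames within the 1000 ms leading up to the tap.
    clip = [f for f in sorted(frames, key=lambda f: f.t_ms)
            if tap_done_ms - WINDOW_MS <= f.t_ms <= tap_done_ms]

    # 2. Remove frames with no movement relative to the adjacent (preceding) frame.
    #    The first frame has no predecessor in the clip and is kept (assumption).
    moving = clip[:1]
    for prev, cur in zip(clip, clip[1:]):
        if redundancy(prev, cur, tol) <= REDUNDANCY_MAX:
            moving.append(cur)

    # 3. Sample the remaining frames so that kept frames are at least 50 ms apart.
    keys: List[Frame] = []
    next_t = None
    for f in moving:
        if next_t is None or f.t_ms >= next_t:
            keys.append(f)
            next_t = f.t_ms + SAMPLE_MS
    return keys
```

With a 40 fps source (one frame every 25 ms), a clip in which the arm moves for the first half and then rests would yield at most one key frame per 50 ms of the moving portion. A real system would likely replace the exact-match pixel comparison with a more robust similarity measure.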

Assignees

Univ Hangzhou Dianzi

Inventors

Classifications

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

  • using neural networks

  • Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113)

  • Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12282633B2 cover?
Disclosed is a method and system for predicting a touch interaction position on a large display based on a binocular camera. The method includes: separately acquiring arm movement video frames of a user and facial and eye movement video frames of the user by a binocular camera; extracting a video clip of each tapping action from the arm movement video frames and the facial and eye movement vide…
Who is the assignee on this patent?
Univ Hangzhou Dianzi
What technology area does this patent fall under?
Primary CPC classification G06F3/0425. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 22, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).