What technology area does this patent fall under?

Primary CPC classification G06V40/23. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (publications cited in our corpus, or publications sharing the same primary CPC).

Skeleton-based action recognition using bi-directional spatial-temporal transformer

US11854305B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11854305-B2
Application number	US-202117315319-A
Country	US
Kind code	B2
Filing date	May 9, 2021
Priority date	May 9, 2021
Publication date	Dec 26, 2023
Grant date	Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection: read this to see what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spatially masked frame by masking the original coordinates of the skeletal joint. Provide the specific frame, the spatially masked frame, and at least one more frame to a coordinate prediction head of the BDSTT. Obtain, from the coordinate prediction head, a prediction of coordinates for the skeletal joint. Adjust parameters of the BDSTT until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
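The training objective in the abstract (hide one joint, predict it from the other joints and from the same joint in other frames, and adjust parameters until the mean-squared error converges) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Frames are assumed to be lists of (x, y, z) tuples, the mask token is assumed to be all zeros, and the transformer with its coordinate prediction head is replaced by a hypothetical two-parameter linear stand-in that mixes the joint's mean position in the other frames (temporal context) with the mean of the other joints in the masked frame (spatial context). "Converges" is interpreted here as the step-to-step change in MSE falling below a tolerance.

```python
# Hypothetical sketch of the masked-joint coordinate prediction objective.
# NOT the BDSTT: a two-parameter linear stand-in replaces the transformer head.

Point = tuple[float, float, float]
Frame = list[Point]

MASK: Point = (0.0, 0.0, 0.0)  # assumed mask token; the patent does not specify one


def spatially_mask(frame: Frame, joint: int) -> Frame:
    """Copy of `frame` with the original coordinates of `joint` hidden."""
    masked = list(frame)
    masked[joint] = MASK
    return masked


def mean_point(points: list[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def features(frames: list[Frame], t: int, joint: int) -> tuple[Point, Point]:
    """Context the stand-in head sees when predicting `joint` in frame `t`."""
    masked = spatially_mask(frames[t], joint)
    # Temporal context: state of the same joint in the other frames.
    temporal = mean_point([f[joint] for i, f in enumerate(frames) if i != t])
    # Spatial context: the other joints of the masked frame
    # (the masked slot itself carries no information, so it is skipped).
    spatial = mean_point([p for j, p in enumerate(masked) if j != joint])
    return temporal, spatial


def mse_loss(a: float, b: float, frames: list[Frame], joint: int) -> float:
    """Mean-squared error between predicted and original joint coordinates."""
    total = 0.0
    for t, frame in enumerate(frames):
        temporal, spatial = features(frames, t, joint)
        pred = [a * tc + b * sc for tc, sc in zip(temporal, spatial)]
        total += sum((p - y) ** 2 for p, y in zip(pred, frame[joint])) / 3
    return total / len(frames)


def train(frames, joint, lr=0.05, tol=1e-12, max_steps=10_000):
    """Gradient descent on (a, b) until the change in MSE drops below `tol`."""
    a = b = 0.0
    loss = mse_loss(a, b, frames, joint)
    n = len(frames)
    for _ in range(max_steps):
        grad_a = grad_b = 0.0
        for t, frame in enumerate(frames):
            temporal, spatial = features(frames, t, joint)
            for tc, sc, y in zip(temporal, spatial, frame[joint]):
                err = a * tc + b * sc - y
                grad_a += 2 * err * tc / (3 * n)
                grad_b += 2 * err * sc / (3 * n)
        a, b = a - lr * grad_a, b - lr * grad_b
        new_loss = mse_loss(a, b, frames, joint)
        converged = abs(loss - new_loss) < tol  # flat loss treated as "converged"
        loss = new_loss
        if converged:
            break
    return a, b, loss


# Usage: joint 0 is static, so its position in the other frames predicts it exactly.
STATIC: Point = (1.0, 2.0, 0.5)
A = [(-3.0, 1.0, 0.0), (-1.0, 1.0, 0.0)]
B = [(0.5, -1.0, 2.0), (-0.5, 0.0, 2.0)]
frames = [[STATIC, *A], [STATIC, *B], [STATIC, *A], [STATIC, *B]]
a, b, loss = train(frames, joint=0)
```

On this toy data the stand-in learns to rely on temporal context (a close to 1, b close to 0). A real BDSTT would instead learn much richer spatial and temporal relationships through attention.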

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: instantiating a bi-directional spatial-temporal transformer neural network; and training the bi-directional spatial-temporal transformer neural network to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames by: obtaining a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints; producing a spatially masked frame from the specific frame by masking the original coordinates of the skeletal joint; providing the specific frame, the spatially masked frame, and at least one more of the plurality of frames to a coordinate prediction head of the bi-directional spatial-temporal transformer network; obtaining, from the coordinate prediction head, a prediction of coordinates for the skeletal joint in the spatially masked frame; and adjusting parameters of the bi-directional spatial-temporal transformer neural network until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.

2. The method of claim 1, further comprising: training the bi-directional spatial-temporal transformer neural network to predict a correct time order of sequential coordinates of the skeletal joint by: producing a plurality of time-shuffled frames by time-shuffling the plurality of frames; providing the plurality of time-shuffled frames to a temporal classification head along with the plurality of frames; obtaining from the temporal classification head a prediction of correct time order for the plurality of time-shuffled frames; and adjusting parameters of the bi-directional spatial-temporal transformer neural network until a cross-entropy loss, between the prediction of correct time order and the plurality of frames, converges.

3. The method of claim 2, further comprising: detecting a skeletal joint motion sequence by applying the trained bi-directional spatial-temporal transformer neural network to a sequence of frames; and transmitting a control signal to at least one of an electromechanical device, an electrooptical device, and an electronic device, in response to the detected skeletal joint motion sequence.

4. The method of claim 1, further comprising: training the bi-directional spatial-temporal transformer neural network to predict a correct spatial arrangement of coordinates of a plurality of skeletal joints by: producing a plurality of space-shuffled frames by spatially rearranging the plurality of joints in one or more of the frames; providing the plurality of space-shuffled frames to a spatial classification head along with the plurality of frames; obtaining from the spatial classification head a prediction of correct spatial arrangement for the plurality of joints in the plurality of space-shuffled frames; and adjusting parameters of the bi-directional spatial-temporal transformer neural network until a cross-entropy loss, between the prediction of correct spatial arrangement and the plurality of frames, converges.

5. The method of claim 4, further comprising: detecting a skeletal joint motion sequence by applying the trained bi-directional spatial-temporal transformer neural network to a sequence of frames; and transmitting a control signal to at least one of an electromechanical device, an electrooptical device, and an electronic device, in response to the detected skeletal joint motion sequence.

6. The method of claim 1, further comprising: training the bi-directional spatial-temporal transformer neural network to predict a correct semantic coding of a plurality of skeletal joints by: producing a semantically masked frame from the specific frame by masking at least a part of a matrix of one-hot vectors corresponding to the plurality of joints in the specific frame; providing the semantically masked frame and the specific frame to a semantic prediction head of the bi-directional spatial-temporal transformer network; obtaining from the semantic prediction head a predicted matrix of one-hot vectors for the semantically masked frame; and adjusting parameters of the bi-directional spatial-temporal transformer network until a cross-entropy classification loss, between the predicted matrix of one-hot vectors and the matrix of one-hot vectors corresponding to the plurality of joints, converges.

7. The method of claim 6, further comprising: detecting a skeletal joint motion sequence by applying the trained bi-directional spatial-temporal transformer neural network to a sequence of frames; and transmitting a control signal to at least one of an electromechanical device, an electrooptical device, and an electronic device, in response to the detected skeletal joint motion sequence.

8. A computer program product comprising one or more non-transitory computer readable storage media that embody computer executable instructions, which when executed by a computer cause the computer to perform a method comprising: instantiating a bi-directional spatial-temporal transformer neural network; and training the bi-directional spatial-temporal transformer neural network to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames by: obtaining a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints; producing a spatially masked frame from the specific frame by masking the original coordinates of the skeletal joint; providing the specific frame, the spatially masked frame, and at least one more of the plurality of frames to a coordinate prediction head of the bi-directional spatial-temporal transformer network; obtaining from the coordinate prediction head a prediction of coordinates for the skeletal joint in the spatially masked frame; and adjusting parameters of the bi-directional spatial-temporal transformer neural network until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.

9. The computer program product of claim 8, wherein the method further comprises: training the bi-directional spatial-temporal transformer neural network to predict a correct time order of sequential coordinates of the skeletal joint by: producing a plurality of time-shuffled frames by time-shuffling the plurality of frames; providing the plurality of time-shuffled frames to a temporal classification head along with the plurality of frames; obtaining from the temporal classification head a prediction of correct time order for the plurality of time-shuffled frames; and adjusting parameters of the bi-directional spatial-temporal transformer neural network until a cross-entropy loss, between the prediction of correct time order and the plurality of frames, converges.

10. The computer program product of claim 9, wherein the method further comprises: detecting a skeletal joint motion sequence by applying the trained bi-directional spatial-temporal transformer neural network to a sequence of frames; and transmitting a control signal to at least one of an electromechanical device, an electrooptical device, and an electronic device, in response to the detected skeletal joint motion sequence.

11. The computer program product of claim 8, wherein the method further comprises: training the bi-directional spatial-temporal transformer neural network to predict a correct spatial arrangement of coordinates of …
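Claims 2, 4 and 6 add three further self-supervised objectives: recovering time order, recovering spatial arrangement, and recovering masked one-hot joint codes, each trained with a cross-entropy loss. The sketch below shows one plausible way to build their inputs and compute the loss; it is an assumption-laden illustration, not the claimed implementation. It treats "predicting the correct order" as classifying which permutation from a fixed candidate set was applied (a common pretext-task design that the claims do not specify), takes the logits a classification head would emit as given, and represents the semantic coding as a joints-by-joints one-hot matrix whose masked rows are zeroed. All helper names are hypothetical and the heads themselves are omitted.

```python
import itertools
import math
import random

Point = tuple[float, float, float]
Frame = list[Point]


def time_shuffle(frames: list[Frame], perm: tuple[int, ...]) -> list[Frame]:
    """Claim 2 style input: output position i holds original frame perm[i]."""
    return [frames[i] for i in perm]


def space_shuffle(frame: Frame, perm: tuple[int, ...]) -> Frame:
    """Claim 4 style input: joint slot j holds original joint perm[j]."""
    return [frame[j] for j in perm]


def make_example(items, perms, rng: random.Random, shuffle):
    """Apply a randomly chosen candidate permutation; its index is the class label."""
    label = rng.randrange(len(perms))
    return shuffle(items, perms[label]), label


def cross_entropy(logits: list[float], label: int) -> float:
    """-log softmax(logits)[label], using log-sum-exp for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def one_hot_matrix(num_joints: int) -> list[list[float]]:
    """Claim 6 style target: row j is the one-hot code identifying joint j."""
    return [[1.0 if c == j else 0.0 for c in range(num_joints)]
            for j in range(num_joints)]


def semantically_mask(matrix: list[list[float]], rows: set[int]) -> list[list[float]]:
    """Zero the selected rows; a semantic prediction head would have to recover them."""
    return [[0.0] * len(r) if i in rows else list(r) for i, r in enumerate(matrix)]


# Usage: two joints moving over three frames.
frames = [[(float(t), 0.0, 0.0), (0.0, float(t), 0.0)] for t in range(3)]
rng = random.Random(0)
time_perms = list(itertools.permutations(range(len(frames))))  # 3! = 6 candidate orders
shuffled, label = make_example(frames, time_perms, rng, time_shuffle)
# A temporal classification head would emit one logit per candidate order;
# with uninformative (all-equal) logits the loss is log(6).
loss = cross_entropy([0.0] * len(time_perms), label)
```

Enumerating all permutations only scales to short clips or few joints; practical pretext tasks usually sample a small fixed subset of permutations as the class set.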

Assignees

IBM

Inventors

Classifications

G06V40/23 (primary): Recognition of whole body movements, e.g. for sport training
G06F3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (blind teaching G09B21/00)
G06F18/2133: based on naturality criteria, e.g. with non-negative factorisation or negative correlation
G06T7/246: using feature-based methods, e.g. the tracking of corners or segments
G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Patent family

Related publications grouped by family.

Patent family ID: 84029401

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11854305B2 cover?: A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spa…
Who is the assignee on this patent?: IBM

Related patents

Action classification using deep embedded clustering

Image coding method, action recognition method, and action recognition apparatus

SELECTIVE DIMMING OF AMBIENT LIGHTING IN VIRTUAL, AUGMENTED, AND MIXED REALITY (xR) APPLICATIONS

Systems and methods for prototyping a virtual model

Directional impression analysis using deep learning

Methods and systems for creating virtual and augmented reality

Apparatus and method for poomsae recognition and dan promotion test of taekwondo based on skeleton of human body using depth camera
