Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification: G06V40/20 (movements or behaviour, e.g. gesture recognition). Mapped technology areas include Physics (CPC section G).

When was this patent published?

Published Aug 31, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Action classification based on manipulated object movement

US11106949B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11106949-B2
Application number	US-201916362530-A
Country	US
Kind code	B2
Filing date	Mar 22, 2019
Priority date	Mar 22, 2019
Publication date	Aug 31, 2021
Grant date	Aug 31, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

1. Title: What the patent document calls the invention.
2. Abstract: A short plain-language summary of the technical disclosure.
3. Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
4. Key dates: Filing, priority, publication, and grant dates set the timeline.
5. First independent claim: The legal scope of protection. Read this for what is actually claimed.
6. CPC / IPC classifications: Technology tags used to group this patent with similar filings.
7. Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing device, including a processor configured to receive a first video including a plurality of frames. For each frame, the processor may determine that a target region of the frame includes a target object. The processor may determine a surrounding region within which the target region is located. The surrounding region may be smaller than the frame. The processor may identify one or more features located in the surrounding region. From the one or more features, the processor may generate one or more manipulated object identifiers. For each of a plurality of pairs of frames, the processor may determine a respective manipulated object movement between a first manipulated object identifier of the first frame and a second manipulated object identifier of the second frame. The processor may classify at least one action performed in the first video based on the plurality of manipulated object movements.
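The geometric steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how the surrounding region is sized or how movement is measured, so the fixed pixel `margin`, the `Box` type, and the use of center-point displacement between consecutive frames are all assumptions made here for illustration (claim 12 describes the movement as an optical flow, which this sketch does not compute).

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical axis-aligned box in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def surrounding_region(target: Box, frame_w: int, frame_h: int, margin: int) -> Box:
    """Grow the target box by `margin` pixels per side, clipped to the frame.

    `margin` is an assumed tuning parameter, not something the abstract defines.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    region = Box(
        max(0, target.left - margin),
        max(0, target.top - margin),
        min(frame_w, target.right + margin),
        min(frame_h, target.bottom + margin),
    )
    # The surrounding region is meant to be smaller than the full frame.
    if region == Box(0, 0, frame_w, frame_h):
        raise ValueError("surrounding region covers the whole frame")
    return region


def object_movements(centers: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Displacement (dx, dy) of an object's center between consecutive frames."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(centers, centers[1:])]


# A 60x60 hand box in a 640x480 frame, expanded by 40 pixels per side.
hand = Box(300, 200, 360, 260)
print(surrounding_region(hand, 640, 480, margin=40))

# Centers of one manipulated object in three consecutive frames.
print(object_movements([(10.0, 20.0), (13.0, 20.0), (17.0, 18.0)]))
```

In a fuller pipeline, the list of per-pair movements would be the input to whatever classifier labels the action; that stage is left out here because the abstract does not specify it.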

First claim

Opening claim text (preview).

The invention claimed is:
1. A computing device comprising: a processor configured to: receive a first video including a plurality of frames; for each frame of the plurality of frames: determine that a target region of the frame includes a target object; determine a surrounding region within which the target region is located, wherein the surrounding region is smaller than the frame and the target region is smaller than the surrounding region; extract one or more features located in the surrounding region; and from the one or more features, generate one or more manipulated object identifiers; for each of a plurality of pairs of frames of the first video respectively including a first frame and a second frame, determine a respective manipulated object movement between a first manipulated object identifier of the first frame and a second manipulated object identifier of the second frame; and classify at least one action performed in the first video based on the manipulated object movements.
2. The computing device of claim 1, wherein the target object is a hand.
3. The computing device of claim 2, wherein the one or more manipulated object identifiers respectively identify one or more manipulated objects manipulated by the hand.
4. The computing device of claim 3, wherein the processor is configured to classify the at least one action at least in part by inputting the manipulated object movements into a grasp classifier, wherein the grasp classifier is configured to output a grasp label indicating a grasp type with which the hand grasps the one or more manipulated objects.
5. The computing device of claim 4, wherein the grasp classifier is a recurrent neural network.
6. The computing device of claim 2, wherein the processor is configured to determine that the target region of the frame includes a hand at least in part by inputting the frame into a hand detector selected from the group consisting of a recurrent neural network, a three-dimensional convolutional neural network, and a temporal convolutional neural network.
7. The computing device of claim 1, wherein the processor is further configured to: classify a plurality of actions performed in the first video; and segment the first video into a plurality of activity phases, wherein the plurality of activity phases are defined by one or more respective actions of the plurality of actions performed during that activity phase.
8. The computing device of claim 7, wherein the processor is further configured to: generate a plurality of action labels respectively corresponding to the plurality of actions; and output a first video annotation including each action label of the plurality of action labels, wherein the action label of each action is matched to a respective activity phase in which that action is performed.
9. The computing device of claim 7, wherein the processor is further configured to: receive a second video; classify a second video action performed in the second video; determine that the second video action matches an action of the plurality of actions identified in the first video; and output a second video annotation in response to the determination that the second video action matches the action.
10. The computing device of claim 9, wherein the second video annotation includes a subsequent phase action label associated with a subsequent activity phase following a second video activity phase associated with the second video action.
11. The computing device of claim 1, wherein the processor is configured to generate the one or more manipulated object identifiers at least in part by inputting the one or more features into a manipulated object classifier selected from the group consisting of a recurrent neural network, a three-dimensional convolutional neural network, and a temporal convolutional neural network.
12. The computing device of claim 1, wherein each manipulated object movement is an optical flow.
13. A method for use with a computing device, the method comprising: receiving a first video including a plurality of frames; for each frame of the plurality of frames: determining that a target region of the frame includes a target object; determining a surrounding region within which the target region is located, wherein the surrounding region is smaller than the frame and the target region is smaller than the surrounding region; extracting one or more features located in the surrounding region; and from the one or more features, generating one or more manipulated object identifiers; for each of a plurality of pairs of frames of the first video respectively including a first frame and a second frame, determining a respective manipulated object movement between a first manipulated object identifier of the first frame and a second manipulated object identifier of the second frame; and classifying at least one action performed in the first video based on the manipulated object movements.
14. The method of claim 13, wherein the target object is a hand.
15. The method of claim 14, wherein the one or more manipulated object identifiers respectively identify one or more manipulated objects manipulated by the hand.
16. The method of claim 15, wherein classifying the at least one action includes inputting the manipulated object movements into a grasp classifier, wherein the grasp classifier is configured to output a grasp label indicating a grasp type with which the hand grasps the one or more manipulated objects.
17. The method of claim 13, further comprising: classifying a plurality of actions performed in the first video; and segmenting the first video into a plurality of activity phases, wherein the plurality of activity phases are defined by one or more respective actions of the plurality of actions performed during that activity phase.
18. The method of claim 17, further comprising: generating a plurality of action labels respectively corresponding to the plurality of actions; and outputting a first video annotation including each action label of the plurality of action labels, wherein the action label of each action is matched to a respective activity phase in which that action is performed.
19. The method of claim 17, further comprising: receiving a second video; classifying a second video action performed in the second video; determining that the second video action matches an action of the plurality of actions identified in the first video; and outputting a second video annotation in response to the determination that the second video action matches the action.
20. A computing device comprising: a processor configured to: receive a first video including a plurality of frames; for each frame of the plurality of frames: determine that a first target region of the frame includes a first hand and a second target region of the frame includes a second hand; determine a first surrounding region within which the first target region is located and a second surrounding region within which the second target region is located, wherein the first surrounding region and the second surrounding region are each smaller than the frame; identify one or more first surrounding region features located in the first surrounding region; identify one or more second surrounding region features located in the second surrounding region; and from the one or more first surrounding region features and/or the one or more second surrounding region features, generate one or more manipulated object identifiers th…
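The phase segmentation and second-video annotation described in the claims on activity phases can be sketched roughly as follows. This is only an illustration under stated assumptions: it presumes a per-frame action label is already available from some classifier, treats each run of identical labels as one activity phase (the claims allow a phase to be defined by more than one action), and the names `Phase`, `segment_phases`, and `next_phase_label` are hypothetical.

```python
from itertools import groupby
from typing import List, NamedTuple, Optional


class Phase(NamedTuple):
    """Hypothetical activity phase covering frames start..end inclusive."""
    label: str
    start: int
    end: int


def segment_phases(frame_labels: List[str]) -> List[Phase]:
    """Group consecutive frames that share an action label into phases."""
    phases = []
    index = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        phases.append(Phase(label, index, index + length - 1))
        index += length
    return phases


def next_phase_label(phases: List[Phase], observed: str) -> Optional[str]:
    """Label of the phase after the first phase matching `observed`, if any."""
    for current, following in zip(phases, phases[1:]):
        if current.label == observed:
            return following.label
    return None


# Assumed per-frame labels for a first (reference) video.
labels = ["pick up"] * 3 + ["pour"] * 2 + ["stir"] * 4
phases = segment_phases(labels)
for phase in phases:
    print(phase)

# An action recognized in a second video is matched against the reference
# phases to suggest the label of the phase that follows it.
print(next_phase_label(phases, "pour"))
```

Matching by exact label equality is the simplest possible choice; a real system would need some tolerance for noisy per-frame labels, which this sketch does not attempt.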

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

CPC codes and their CPC titles.

G06V40/20 (primary): Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16)
G06F18/2411: based on the proximity to a decision surface, e.g. support vector machines
G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06N3/0442: characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06N3/09: Supervised learning

Patent family

Related publications grouped by family.

Patent family ID: 69771247

Related patents

Methods for real-time skill assessment of multi-step tasks performed by hand movements using a video camera

Information processing apparatus, display apparatus, information processing method, and program

Information processing apparatus recognizing certain object in captured image, and method for controlling the same

Image processing apparatus, image processing method, and program

Method and apparatus for tracking object
