Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date May 9, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Neural network for skeletons from input images

US11645506B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11645506-B2
Application number	US-202217822080-A
Country	US
Kind code	B2
Filing date	Aug 24, 2022
Priority date	Feb 24, 2019
Publication date	May 9, 2023
Grant date	May 9, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing system is provided. The computing system includes a processor configured to execute a convolutional neural network that has been trained, the convolutional neural network including a backbone network that is a concatenated pyramid network, a plurality of first head neural networks, and a plurality of second head neural networks. At the backbone network, the processor is configured to receive an input image as input and output feature maps extracted from the input image. The processor is configured to: process the feature maps using each of the first head neural networks to output corresponding keypoint heatmaps; process the feature maps using each of the second head neural networks to output corresponding part affinity field heatmaps; link the keypoints into one or more instances of virtual skeletons using the part affinity fields; and output the instances of the virtual skeletons.
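To make the abstract's "keypoint heatmaps" step concrete, here is a minimal sketch of one common way to turn a single heatmap into candidate keypoints: keep every pixel that is above a threshold and strictly greater than its eight neighbors. The abstract does not say how keypoints are read out of the heatmaps, so the function name `find_peaks`, the threshold value, and the toy grid are all assumptions made for illustration, not the patented method.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # per-pixel probability for ONE keypoint type


def find_peaks(heatmap: Heatmap, threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for each strict local maximum >= threshold.

    Hypothetical helper: the patent text on this page does not define
    how keypoint locations are extracted from a heatmap.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Collect the (up to 8) neighbors, clipped at the image border.
            neighbors = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                peaks.append((r, c, value))
    return peaks


# Toy 4x4 heatmap with two candidate keypoints of the same type,
# e.g. the same joint on two different bodies.
toy_heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.2, 0.1],
]
peaks = find_peaks(toy_heatmap)
print(peaks)
```

In a multi-person image, one heatmap per keypoint type can contain several peaks, which is why the abstract needs a separate linking step to decide which keypoints belong to the same skeleton.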

First claim

Opening claim text (preview).

The invention claimed is:

1. A computing system, comprising: a processor and associated memory, the processor being configured to execute one or more programs stored in the memory to: receive an input image as input and extract feature maps from the input image; process the feature maps to thereby output corresponding keypoint heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of keypoints; process the feature maps to thereby output corresponding part affinity field heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of part affinity fields; link the plurality of keypoints into one or more instances of virtual skeletons using the plurality of part affinity fields; and output the one or more instances of the virtual skeletons.

2. The system of claim 1, wherein processing the feature maps to thereby output corresponding keypoint heatmaps and processing the feature maps to thereby output corresponding part affinity field heatmaps are executed in parallel.

3. The system of claim 1, further comprising the processor being configured to: execute a skeleton grouping network including a plurality of fully-connected layers, the skeleton grouping network configured to, from the keypoint heatmaps and part affinity field heatmaps, group segments of the one or more instances of the virtual skeletons.

4. The system of claim 1, wherein the input image includes overlapping bodies, the processor being further configured to receive, as input, a pair of skeletons and output a confidence score indicating a probability that each skeleton in the pair of skeletons belongs to a same body in the input image.

5. The system of claim 1, further comprising the processor being configured to: process the feature maps to thereby output corresponding instance segmentation maps for each segment indicating the probability that each pixel in the input image belongs to a corresponding one of the plurality of segments; using the instance segmentation maps, determine instance segmentation for parts of at least one body; and output the one or more instances of the virtual skeletons with instance segmentation for parts of the at least one body.

6. The system of claim 1, wherein: the processor is configured to receive the input image using a backbone network including a concatenated pyramid network; and the concatenated pyramid network includes a residual neural network including a plurality of intermediate layers that are configured as convolutional neural network layers, the plurality of intermediate layers connected on a downstream side to a concatenation layer and a plurality of convolutional layers, in this order.

7. The system of claim 1, wherein the feature maps are processed to thereby output corresponding keypoint heatmaps using a fully convolutional neural network including a plurality of convolutional layers.

8. The system of claim 1, wherein the input image is from real-time input received from a visible light camera, a depth camera, or an infrared camera, and the processor is configured so that the outputting of the one or more instances of virtual skeletons from the input image received in real time is output in real time.

9. The system of claim 1, wherein the input image includes one or more of visible light image data, depth data, and active brightness data.

10. The system of claim 1, wherein linking the keypoints is performed by a greedy algorithm by fitting keypoint locations and part affinity field locations to form each instance of the one or more instances of the virtual skeletons, and linking the keypoints is repeated to maximize a total fitting score for each instance of the one or more instances of the virtual skeletons.

11. The system of claim 1, wherein the processor is further configured to execute a convolutional neural network that has been trained for a single stage, wherein the convolutional neural network has been trained using a training data set including human body part localization and association data and a keypoint dataset.

12. A computing method for use with a computing device including a processor, comprising: receiving an input image as input and extracting feature maps from the input image; processing the feature maps to thereby output corresponding keypoint heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of keypoints; processing the feature maps to thereby output corresponding part affinity field heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of part affinity fields; linking the plurality of keypoints into one or more instances of virtual skeletons using the plurality of part affinity fields; and outputting the one or more instances of the virtual skeletons.

13. The computing method of claim 12, wherein processing the feature maps to thereby output corresponding keypoint heatmaps and processing the feature maps to thereby output corresponding part affinity field heatmaps are executed in parallel.

14. The computing method of claim 12, further comprising: executing a skeleton grouping network including a plurality of fully-connected layers, the skeleton grouping network configured to, from the keypoint heatmaps and part affinity field heatmaps, group segments of the one or more instances of the virtual skeletons, wherein the input image includes overlapping bodies, the processor being configured to receive, as input, a pair of skeletons and output a confidence score indicating a probability that each skeleton in the pair of skeletons belongs to a same body in the input image.

15. The computing method of claim 12, further comprising: processing the feature maps to thereby output corresponding instance segmentation maps for each segment indicating the probability that each pixel in the input image belongs to a corresponding one of the plurality of segments; using the instance segmentation maps, determining instance segmentation for parts of at least one body; and outputting the one or more instances of the virtual skeletons with instance segmentation for parts of the at least one body.

16. The computing method of claim 12, wherein: the input image is received using a backbone network including a concatenated pyramid network; and the concatenated pyramid network includes a residual neural network including a plurality of intermediate layers that are configured as convolutional neural network layers, the plurality of intermediate layers connected on a downstream side to a concatenation layer and a plurality of convolutional layers, in this order.

17. The computing method of claim 12, wherein the feature maps are processed to thereby output corresponding keypoint heatmaps using a fully convolutional neural network including a plurality of convolutional layers.

18. The computing method of claim 12, wherein the input image is from real-time input received from a visible light camera, a depth camera, or an infrared camera, and the processor is configured so that the outputting of the one or more instances of virtual skeletons from the input image received in real time is output in real time.

19. The computing method of claim 12, further comprising executing a convolutional neural network that has been trained for a single stage, wherein the convolutional neural network has been trained using a training data set including human body part localizat
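Claim 10 describes linking keypoints with a greedy algorithm that fits keypoint locations to part affinity field locations and maximizes a fitting score. The sketch below shows one way such a step could look for a single limb type, treating the part affinity field heatmap as a per-pixel probability (as claims 1 and 12 describe it). The fitting score used here (the mean heatmap value sampled along the straight segment between two keypoints), the `min_score` cutoff, and the names `segment_score` and `greedy_link` are assumptions for illustration; the claim text on this page does not define them.

```python
from typing import List, Tuple

Point = Tuple[int, int]      # (row, col) of a detected keypoint
Heatmap = List[List[float]]  # per-pixel probability for ONE part affinity field


def segment_score(paf: Heatmap, a: Point, b: Point, samples: int = 5) -> float:
    """Mean PAF value at `samples` evenly spaced points on segment a-b.

    Assumed fitting score; requires samples >= 2.
    """
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        r = round(a[0] + t * (b[0] - a[0]))
        c = round(a[1] + t * (b[1] - a[1]))
        total += paf[r][c]
    return total / samples


def greedy_link(paf: Heatmap, starts: List[Point], ends: List[Point],
                min_score: float = 0.5) -> List[Tuple[int, int]]:
    """Greedily pair start and end keypoints, best-scoring pairs first.

    Returns (start_index, end_index) pairs; each keypoint is used at most once.
    """
    candidates = [
        (segment_score(paf, a, b), i, j)
        for i, a in enumerate(starts)
        for j, b in enumerate(ends)
    ]
    candidates.sort(reverse=True)  # highest fitting score first
    used_starts, used_ends, links = set(), set(), []
    for score, i, j in candidates:
        if score < min_score:
            break  # all remaining candidates score lower
        if i in used_starts or j in used_ends:
            continue
        used_starts.add(i)
        used_ends.add(j)
        links.append((i, j))
    return links


# Toy 5x5 PAF heatmap: two horizontal limbs, one on row 0 and one on row 4.
toy_paf = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]
starts = [(0, 0), (4, 0)]  # e.g. two detected joints of one type
ends = [(4, 4), (0, 4)]    # e.g. two detected joints of the connected type
links = greedy_link(toy_paf, starts, ends)
print(sorted(links))
```

In this toy case the diagonal pairings cross empty pixels and score 0.4, so the greedy pass keeps only the two horizontal pairings, each scoring 1.0. A full skeleton would repeat this for every limb type and then merge links that share a keypoint.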

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06T2207/10028
Range image; Depth image; 3D point clouds · CPC title
G06V40/23
Recognition of whole body movements, e.g. for sport training · CPC title
G06N3/08 · Primary
Learning methods · CPC title

Patent family

Related publications grouped by family.

View patent family 72141239

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11645506B2 cover?: A computing system is provided. The computing system includes a processor configured to execute a convolutional neural network that has been trained, the convolutional neural network including a backbone network that is a concatenated pyramid network, a plurality of first head neural networks, and a plurality of second head neural networks. At the backbone network, the processor is configured t…