Neural network for skeletons from input images

US11645506B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11645506-B2
Application number: US-202217822080-A
Country: US
Kind code: B2
Filing date: Aug 24, 2022
Priority date: Feb 24, 2019
Publication date: May 9, 2023
Grant date: May 9, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing system is provided. The computing system includes a processor configured to execute a convolutional neural network that has been trained, the convolutional neural network including a backbone network that is a concatenated pyramid network, a plurality of first head neural networks, and a plurality of second head neural networks. At the backbone network, the processor is configured to receive an input image as input and output feature maps extracted from the input image. The processor is configured to: process the feature maps using each of the first head neural networks to output corresponding keypoint heatmaps; process the feature maps using each of the second head neural networks to output corresponding part affinity field heatmaps; link the keypoints into one or more instances of virtual skeletons using the part affinity fields; and output the instances of the virtual skeletons.
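To make the "keypoint heatmaps" step concrete, here is a minimal sketch of how candidate keypoints might be read out of one heatmap. The abstract does not say how peaks are extracted, so the local-maximum rule, the `find_peaks` name, and the 0.5 threshold are all assumptions for illustration, not the patented method.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # heatmap[row][col] = per-pixel keypoint probability


def find_peaks(heatmap: Heatmap, threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for strict local maxima at or above `threshold`.

    Hypothetical helper: a pixel counts as a keypoint candidate when it is
    strictly greater than all of its (up to 8) neighbours. Flat plateaus
    therefore yield no candidate in this simplified version.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Toy heatmap for a single keypoint type, with two bodies in view.
toy_heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.2],
    [0.0, 0.1, 0.0, 0.3, 0.8],
    [0.0, 0.0, 0.0, 0.2, 0.3],
]
peaks = find_peaks(toy_heatmap)
print(peaks)  # [(1, 1, 0.9), (2, 4, 0.8)]
```

In a full pipeline one such heatmap would exist per keypoint type, and the candidates found here would be the inputs to the linking step that uses the part affinity fields.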

First claim

Opening claim text (preview).

The invention claimed is:
1. A computing system, comprising: a processor and associated memory, the processor being configured to execute one or more programs stored in the memory to: receive an input image as input and extract feature maps from the input image; process the feature maps to thereby output corresponding keypoint heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of keypoints; process the feature maps to thereby output corresponding part affinity field heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of part affinity fields; link the plurality of keypoints into one or more instances of virtual skeletons using the plurality of part affinity fields; and output the one or more instances of the virtual skeletons.
2. The system of claim 1, wherein processing the feature maps to thereby output corresponding keypoint heatmaps and processing the feature maps to thereby output corresponding part affinity field heatmaps are executed in parallel.
3. The system of claim 1, further comprising the processor being configured to: execute a skeleton grouping network including a plurality of fully-connected layers, the skeleton grouping network configured to, from the keypoint heatmaps and part affinity field heatmaps, group segments of the one or more instances of the virtual skeletons.
4. The system of claim 1, wherein the input image includes overlapping bodies, the processor being further configured to receive, as input, a pair of skeletons and output a confidence score indicating a probability that each skeleton in the pair of skeletons belongs to a same body in the input image.
5. The system of claim 1, further comprising the processor being configured to: process the feature maps to thereby output corresponding instance segmentation maps for each segment indicating the probability that each pixel in the input image belongs to a corresponding one of the plurality of segments; using the instance segmentation maps, determine instance segmentation for parts of at least one body; and output the one or more instances of the virtual skeletons with instance segmentation for parts of the at least one body.
6. The system of claim 1, wherein: the processor is configured to receive the input image using a backbone network including a concatenated pyramid network; and the concatenated pyramid network includes a residual neural network including a plurality of intermediate layers that are configured as convolutional neural network layers, the plurality of intermediate layers connected on a downstream side to a concatenation layer and a plurality of convolutional layers, in this order.
7. The system of claim 1, wherein the feature maps are processed to thereby output corresponding keypoint heatmaps using a fully convolutional neural network including a plurality of convolutional layers.
8. The system of claim 1, wherein the input image is from real-time input received from a visible light camera, a depth camera, or an infrared camera, and the processor is configured so that the outputting of the one or more instances of virtual skeletons from the input image received in real time is output in real time.
9. The system of claim 1, wherein the input image includes one or more of visible light image data, depth data, and active brightness data.
10. The system of claim 1, wherein linking the keypoints is performed by a greedy algorithm by fitting keypoint locations and part affinity field locations to form each instance of the one or more instances of the virtual skeletons, and linking the keypoints is repeated to maximize a total fitting score for each instance of the one or more instances of the virtual skeletons.
11. The system of claim 1, wherein the processor is further configured to execute a convolutional neural network that has been trained for a single stage, wherein the convolutional neural network has been trained using a training data set including human body part localization and association data and a keypoint dataset.
12. A computing method for use with a computing device including a processor, comprising: receiving an input image as input and extracting feature maps from the input image; processing the feature maps to thereby output corresponding keypoint heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of keypoints; processing the feature maps to thereby output corresponding part affinity field heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of part affinity fields; linking the plurality of keypoints into one or more instances of virtual skeletons using the plurality of part affinity fields; and outputting the one or more instances of the virtual skeletons.
13. The computing method of claim 12, wherein processing the feature maps to thereby output corresponding keypoint heatmaps and processing the feature maps to thereby output corresponding part affinity field heatmaps are executed in parallel.
14. The computing method of claim 12, further comprising: executing a skeleton grouping network including a plurality of fully-connected layers, the skeleton grouping network configured to, from the keypoint heatmaps and part affinity field heatmaps, group segments of the one or more instances of the virtual skeletons, wherein the input image includes overlapping bodies, the processor being configured to receive, as input, a pair of skeletons and output a confidence score indicating a probability that each skeleton in the pair of skeletons belongs to a same body in the input image.
15. The computing method of claim 12, further comprising: processing the feature maps to thereby output corresponding instance segmentation maps for each segment indicating the probability that each pixel in the input image belongs to a corresponding one of the plurality of segments; using the instance segmentation maps, determining instance segmentation for parts of at least one body; and outputting the one or more instances of the virtual skeletons with instance segmentation for parts of the at least one body.
16. The computing method of claim 12, wherein: the input image is received using a backbone network including a concatenated pyramid network; and the concatenated pyramid network includes a residual neural network including a plurality of intermediate layers that are configured as convolutional neural network layers, the plurality of intermediate layers connected on a downstream side to a concatenation layer and a plurality of convolutional layers, in this order.
17. The computing method of claim 12, wherein the feature maps are processed to thereby output corresponding keypoint heatmaps using a fully convolutional neural network including a plurality of convolutional layers.
18. The computing method of claim 12, wherein the input image is from real-time input received from a visible light camera, a depth camera, or an infrared camera, and the processor is configured so that the outputting of the one or more instances of virtual skeletons from the input image received in real time is output in real time.
19. The computing method of claim 12, further comprising executing a convolutional neural network that has been trained for a single stage, wherein the convolutional neural network has been trained using a training data set including human body part localizat
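The linking step in claims 1 and 10 can be illustrated with a small sketch. It assumes, following the claim wording, that a part affinity field is a per-pixel probability map, and it scores a candidate limb by averaging that map along the straight segment between two keypoints. The names `paf_score` and `greedy_link`, the sampling count, and the `min_score` cutoff are hypothetical, and the sketch makes a single greedy pass rather than the repeated fitting that claim 10 describes.

```python
from typing import List, Tuple

Point = Tuple[int, int]      # (row, col) of a keypoint candidate
Heatmap = List[List[float]]  # paf[row][col] = probability the pixel lies on this limb type


def paf_score(paf: Heatmap, a: Point, b: Point, samples: int = 5) -> float:
    """Mean PAF value at `samples` evenly spaced points on the segment a -> b."""
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        r = round(a[0] + t * (b[0] - a[0]))
        c = round(a[1] + t * (b[1] - a[1]))
        total += paf[r][c]
    return total / samples


def greedy_link(paf: Heatmap, starts: List[Point], ends: List[Point],
                min_score: float = 0.3) -> List[Tuple[int, int]]:
    """Pair start and end keypoints greedily, best-scoring pair first.

    Each keypoint is used at most once; pairs scoring below `min_score`
    are never linked. Returns (start_index, end_index) tuples.
    """
    candidates = [
        (paf_score(paf, a, b), i, j)
        for i, a in enumerate(starts)
        for j, b in enumerate(ends)
    ]
    candidates.sort(key=lambda item: -item[0])  # highest score first (stable sort)
    used_starts, used_ends, links = set(), set(), []
    for score, i, j in candidates:
        if score < min_score:
            break
        if i in used_starts or j in used_ends:
            continue
        used_starts.add(i)
        used_ends.add(j)
        links.append((i, j))
    return links


# Toy 5x5 PAF for one limb type: two vertical limbs, in columns 0 and 4.
toy_paf = [[1.0, 0.0, 0.0, 0.0, 1.0] for _ in range(5)]
shoulders = [(0, 0), (0, 4)]  # hypothetical start keypoints
elbows = [(4, 0), (4, 4)]     # hypothetical end keypoints
links = greedy_link(toy_paf, shoulders, elbows)
print(links)  # [(0, 0), (1, 1)]
```

The diagonal pairings score only 0.4 because most of their sampled pixels fall off the limbs, so the greedy pass takes the two vertical links first and the crossed pairings are rejected as already used.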

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Range image; Depth image; 3D point clouds · CPC title

  • Recognition of whole body movements, e.g. for sport training · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11645506B2 cover?
A computing system is provided. The computing system includes a processor configured to execute a convolutional neural network that has been trained, the convolutional neural network including a backbone network that is a concatenated pyramid network, a plurality of first head neural networks, and a plurality of second head neural networks. At the backbone network, the processor is configured t…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date May 9, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).