US11645506B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11645506-B2 |
| Application number | US-202217822080-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 24, 2022 |
| Priority date | Feb 24, 2019 |
| Publication date | May 9, 2023 |
| Grant date | May 9, 2023 |
Official abstract text for this publication.
A computing system is provided. The computing system includes a processor configured to execute a convolutional neural network that has been trained, the convolutional neural network including a backbone network that is a concatenated pyramid network, a plurality of first head neural networks, and a plurality of second head neural networks. At the backbone network, the processor is configured to receive an input image as input and output feature maps extracted from the input image. The processor is configured to: process the feature maps using each of the first head neural networks to output corresponding keypoint heatmaps; process the feature maps using each of the second head neural networks to output corresponding part affinity field heatmaps; link the keypoints into one or more instances of virtual skeletons using the part affinity fields; and output the instances of the virtual skeletons.
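The abstract says the first head networks output keypoint heatmaps but does not say how discrete keypoints are read out of them. One common approach, shown here as a minimal sketch and not as the patent's method, is to take local maxima above a threshold. The function name `find_peaks`, the 4-neighbour comparison, and the `threshold` default are all assumptions made for illustration.

```python
def find_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for each strict local maximum >= threshold.

    `heatmap` is a 2D list of per-pixel keypoint probabilities.
    Hypothetical helper: the patent text does not specify this step.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Compare against the in-bounds up/down/left/right neighbours.
            neighbours = [
                heatmap[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks
```

Each keypoint type (for example, one heatmap per joint) would be passed through such a step separately, giving the candidate locations that are later linked into skeletons.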
Opening claim text (preview).
The invention claimed is:
1. A computing system, comprising: a processor and associated memory, the processor being configured to execute one or more programs stored in the memory to: receive an input image as input and extract feature maps from the input image; process the feature maps to thereby output corresponding keypoint heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of keypoints; process the feature maps to thereby output corresponding part affinity field heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of part affinity fields; link the plurality of keypoints into one or more instances of virtual skeletons using the plurality of part affinity fields; and output the one or more instances of the virtual skeletons.
2. The system of claim 1, wherein processing the feature maps to thereby output corresponding keypoint heatmaps and processing the feature maps to thereby output corresponding part affinity field heatmaps are executed in parallel.
3. The system of claim 1, further comprising the processor being configured to: execute a skeleton grouping network including a plurality of fully-connected layers, the skeleton grouping network configured to, from the keypoint heatmaps and part affinity field heatmaps, group segments of the one or more instances of the virtual skeletons.
4. The system of claim 1, wherein the input image includes overlapping bodies, the processor being further configured to receive, as input, a pair of skeletons and output a confidence score indicating a probability that each skeleton in the pair of skeletons belongs to a same body in the input image.
5. The system of claim 1, further comprising the processor being configured to: process the feature maps to thereby output corresponding instance segmentation maps for each segment indicating the probability that each pixel in the input image belongs to a corresponding one of the plurality of segments; using the instance segmentation maps, determine instance segmentation for parts of at least one body; and output the one or more instances of the virtual skeletons with instance segmentation for parts of the at least one body.
6. The system of claim 1, wherein: the processor is configured to receive the input image using a backbone network including a concatenated pyramid network; and the concatenated pyramid network includes a residual neural network including a plurality of intermediate layers that are configured as convolutional neural network layers, the plurality of intermediate layers connected on a downstream side to a concatenation layer and a plurality of convolutional layers, in this order.
7. The system of claim 1, wherein the feature maps are processed to thereby output corresponding keypoint heatmaps using a fully convolutional neural network including a plurality of convolutional layers.
8. The system of claim 1, wherein the input image is from real-time input received from a visible light camera, a depth camera, or an infrared camera, and the processor is configured so that the outputting of the one or more instances of virtual skeletons from the input image received in real time is output in real time.
9. The system of claim 1, wherein the input image includes one or more of visible light image data, depth data, and active brightness data.
10. The system of claim 1, wherein linking the keypoints is performed by a greedy algorithm by fitting keypoint locations and part affinity field locations to form each instance of the one or more instances of the virtual skeletons, and linking the keypoints is repeated to maximize a total fitting score for each instance of the one or more instances of the virtual skeletons.
11. The system of claim 1, wherein the processor is further configured to execute a convolutional neural network that has been trained for a single stage, wherein the convolutional neural network has been trained using a training data set including human body part localization and association data and a keypoint dataset.
12. A computing method for use with a computing device including a processor, comprising: receiving an input image as input and extracting feature maps from the input image; processing the feature maps to thereby output corresponding keypoint heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of keypoints; processing the feature maps to thereby output corresponding part affinity field heatmaps indicating a probability that each pixel in the input image belongs to a corresponding one of a plurality of part affinity fields; linking the plurality of keypoints into one or more instances of virtual skeletons using the plurality of part affinity fields; and outputting the one or more instances of the virtual skeletons.
13. The computing method of claim 12, wherein processing the feature maps to thereby output corresponding keypoint heatmaps and processing the feature maps to thereby output corresponding part affinity field heatmaps are executed in parallel.
14. The computing method of claim 12, further comprising: executing a skeleton grouping network including a plurality of fully-connected layers, the skeleton grouping network configured to, from the keypoint heatmaps and part affinity field heatmaps, group segments of the one or more instances of the virtual skeletons, wherein the input image includes overlapping bodies, the processor being configured to receive, as input, a pair of skeletons and output a confidence score indicating a probability that each skeleton in the pair of skeletons belongs to a same body in the input image.
15. The computing method of claim 12, further comprising: processing the feature maps to thereby output corresponding instance segmentation maps for each segment indicating the probability that each pixel in the input image belongs to a corresponding one of the plurality of segments; using the instance segmentation maps, determining instance segmentation for parts of at least one body; and outputting the one or more instances of the virtual skeletons with instance segmentation for parts of the at least one body.
16. The computing method of claim 12, wherein: the input image is received using a backbone network including a concatenated pyramid network; and the concatenated pyramid network includes a residual neural network including a plurality of intermediate layers that are configured as convolutional neural network layers, the plurality of intermediate layers connected on a downstream side to a concatenation layer and a plurality of convolutional layers, in this order.
17. The computing method of claim 12, wherein the feature maps are processed to thereby output corresponding keypoint heatmaps using a fully convolutional neural network including a plurality of convolutional layers.
18. The computing method of claim 12, wherein the input image is from real-time input received from a visible light camera, a depth camera, or an infrared camera, and the processor is configured so that the outputting of the one or more instances of virtual skeletons from the input image received in real time is output in real time.
19. The computing method of claim 12, further comprising executing a convolutional neural network that has been trained for a single stage, wherein the convolutional neural network has been trained using a training data set including human body part localizat
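Claim 10 describes linking keypoints with a greedy algorithm that fits keypoint locations to part affinity field locations. The claims do not give the scoring rule, so the following is only an illustrative sketch under stated assumptions: a candidate link is scored as the mean part affinity probability sampled along the straight segment between two keypoints, and links are accepted from highest score down, with each keypoint used at most once. The names `paf_score` and `greedy_link`, and the parameters `samples` and `min_score`, are hypothetical. The repetition that claim 10 mentions for maximizing a total fitting score is not modeled; this is a single greedy pass for one limb type.

```python
def paf_score(paf, a, b, samples=10):
    """Mean part-affinity probability along the segment from a to b.

    `paf` is a 2D list of per-pixel probabilities; `a` and `b` are
    (row, col) keypoint locations. Assumed scoring rule, not from the patent.
    """
    (r0, c0), (r1, c1) = a, b
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        total += paf[r][c]
    return total / samples


def greedy_link(starts, ends, paf, min_score=0.3):
    """Greedily pair start keypoints with end keypoints for one limb type.

    Returns a list of (start_index, end_index, score) tuples.
    """
    candidates = [
        (paf_score(paf, a, b), i, j)
        for i, a in enumerate(starts)
        for j, b in enumerate(ends)
    ]
    candidates.sort(reverse=True)  # best-scoring candidate links first

    used_starts, used_ends, links = set(), set(), []
    for score, i, j in candidates:
        if score < min_score:
            break  # all remaining candidates score lower still
        if i in used_starts or j in used_ends:
            continue  # each keypoint joins at most one link of this type
        used_starts.add(i)
        used_ends.add(j)
        links.append((i, j, score))
    return links
```

With two people side by side, the links running along each person's own limb score higher than the crossing links, so the greedy pass pairs each start keypoint with the end keypoint on the same body. Chaining such links across limb types would then yield one virtual skeleton per body.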
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Range image; Depth image; 3D point clouds · CPC title
Recognition of whole body movements, e.g. for sport training · CPC title
Learning methods · CPC title