Occlusion-aware prediction of human behavior
US-12094252-B2 · Sep 17, 2024 · US
US11915523B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11915523-B2 |
| Application number | US-202217815361-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 27, 2022 |
| Priority date | Dec 9, 2019 |
| Publication date | Feb 27, 2024 |
| Grant date | Feb 27, 2024 |
Abstract
A method includes receiving, from a camera disposed on a robotic device, a two-dimensional (2D) image of a body of an actor and determining, for each respective keypoint of a first subset of a plurality of keypoints, 2D coordinates of the respective keypoint within the 2D image. The plurality of keypoints represent body locations. Each respective keypoint of the first subset is visible in the 2D image. The method also includes determining a second subset of the plurality of keypoints. Each respective keypoint of the second subset is not visible in the 2D image. The method further includes determining, by way of a machine learning model, an extent of engagement of the actor with the robotic device based on (i) the 2D coordinates of keypoints of the first subset and (ii) for each respective keypoint of the second subset, an indicator that the respective keypoint is not visible.
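The abstract does not fix how the visible coordinates and the "not visible" indicators are packed into the model input. Below is a minimal sketch under stated assumptions. It uses one fixed keypoint ordering and normalizes coordinates by image size. For occluded keypoints it writes a sentinel of -1, combining the two encodings named in claims 4 and 5 (a binary visibility variable, and coordinates set to a predetermined value). The keypoint names, `encode_keypoints`, and `OCCLUDED_COORD` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keypoint ordering; the patent only says the keypoints cover
# predetermined body locations (limbs, torso, head).
KEYPOINT_NAMES: List[str] = [
    "nose", "left_eye", "right_eye", "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hip", "right_hip",
]

# Assumed sentinel written into the coordinates of keypoints that are not visible.
OCCLUDED_COORD = -1.0


def encode_keypoints(
    detections: Dict[str, Optional[Tuple[float, float]]],
    image_width: int,
    image_height: int,
) -> List[float]:
    """Flatten 2D keypoint detections into a fixed-length feature vector.

    Each keypoint contributes (x, y, visible). Visible keypoints carry
    coordinates normalized to [0, 1] and visible=1.0. Keypoints absent from
    `detections` (or mapped to None) carry the sentinel coordinates and
    visible=0.0, so the model can tell "occluded" apart from "at the origin".
    """
    features: List[float] = []
    for name in KEYPOINT_NAMES:
        xy = detections.get(name)
        if xy is None:
            features.extend([OCCLUDED_COORD, OCCLUDED_COORD, 0.0])
        else:
            x, y = xy
            features.extend([x / image_width, y / image_height, 1.0])
    return features


if __name__ == "__main__":
    # An actor whose lower body is out of frame: only two keypoints detected.
    vec = encode_keypoints({"nose": (320.0, 120.0), "left_shoulder": (280.0, 200.0)}, 640, 480)
    print(len(vec), vec[:6])  # prints: 33 [0.5, 0.25, 1.0, -1.0, -1.0, 0.0]
```

Because the vector length never changes, any fixed-input classifier can consume it. The per-keypoint flag lets the model learn that, for example, a hidden face says something different about engagement than a hidden hip.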
Claims (preview)
What is claimed is:
1. A computer-implemented method comprising: receiving, from a camera disposed on a robotic device, an image representing at least part of a body of an actor; determining, for each respective keypoint of a first keypoint subset of a plurality of keypoints, coordinates of the respective keypoint within the image, wherein the plurality of keypoints represent a corresponding plurality of predetermined body locations, and wherein each respective keypoint of the first keypoint subset is visible in the image; determining a second keypoint subset of the plurality of keypoints, wherein each respective keypoint of the second keypoint subset is not visible in the image; and determining, by way of a machine learning model, an extent of engagement of the actor with the robotic device, wherein the machine learning model is configured to determine the extent of engagement based on (i) the coordinates of each respective keypoint of the first keypoint subset and (ii) for each respective keypoint of the second keypoint subset, an indication that the respective keypoint is not visible in the image, wherein the machine learning model has been trained using a plurality of training images of a plurality of actors, wherein each respective training image of the plurality of training images is associated with a label indicating a corresponding extent of engagement, wherein each respective image has been captured by the camera or a second camera disposed on a second robotic device, and wherein the second camera approximates a perspective of the camera by being positioned on the second robotic device within a threshold height relative to a height of the camera on the robotic device.
2. The computer-implemented method of claim 1, wherein the image is a two-dimensional (2D) image.
3. The computer-implemented method of claim 1, wherein the machine learning model is configured to determine the extent of engagement further based on, for each respective keypoint of the first keypoint subset, an indicator that the respective keypoint is visible in the image.
4. The computer-implemented method of claim 3, wherein the indicator that the respective keypoint of the second keypoint subset is not visible in the image comprises a binary variable set to a first value, and wherein the indicator that the respective keypoint of the first keypoint subset is visible in the image comprises the binary variable set to a second value.
5. The computer-implemented method of claim 1, wherein the indicator that the respective keypoint of the second keypoint subset is not visible in the image comprises coordinates of the respective keypoint set to a predetermined value.
6. The computer-implemented method of claim 1, wherein the second camera further approximates the perspective of the camera by being positioned on the second robotic device within a threshold angular displacement relative to an angular position of the camera on the robotic device.
7. The computer-implemented method of claim 1, further comprising: based on the extent of engagement of the actor with the robotic device, determining one or more operations to perform by the robotic device to interact with the actor; and executing the one or more operations.
8. The computer-implemented method of claim 1, further comprising: based on the extent of engagement of the actor with the robotic device, determining at least one of (i) a start point of an interaction of the actor with the robotic device or (ii) an end point of the interaction.
9. The computer-implemented method of claim 1, wherein the image also represents a second body of a second actor, and wherein the method further comprises: determining, by way of the machine learning model, a second extent of engagement of the second actor with the robotic device; comparing the extent of engagement of the second actor with the robotic device to the extent of engagement of the actor with the robotic device; and determining, based on results of the comparing, a direction in which to orient the robotic device.
10. The computer-implemented method of claim 1, wherein the extent of engagement of the actor with the robotic device is selected from a group comprising an engaged state, a borderline state, and a disengaged state.
11. The computer-implemented method of claim 10, further comprising: determining, based on the extent of engagement of the actor with the robotic device, a transition from a current state to a next state, wherein, when the current state is the borderline state, a probability of transition to the next state is conditioned on a prior state such that (i) when the prior state is the engaged state, the next state is biased towards being the disengaged state and (ii) when the prior state is the disengaged state, the next state is biased towards being the engaged state.
12. The computer-implemented method of claim 10, further comprising: when the extent of engagement of the actor with the robotic device comprises the borderline state, causing the robotic device to perform an operation indicating an intent to interact with the actor.
13. The computer-implemented method of claim 1, wherein determining the coordinates of the respective keypoint within the image comprises: detecting the respective keypoint within the image based on a preceding image of the actor, an adjustment of a pose of the camera between the preceding image and the image, and the image.
14. The computer-implemented method of claim 1, wherein the machine learning model is configured to determine the extent of engagement further based on one or more of: (i) an utterance by the actor detected by a microphone on the robotic device, (ii) a visual indication within the image that the actor is speaking, or (iii) a direction in which a gaze of the actor is pointed.
15. The computer-implemented method of claim 1, further comprising: receiving, from the camera, a second image representing at least part of the body of the actor; determining, for each respective keypoint of a third keypoint subset of the plurality of keypoints, coordinates of the respective keypoint within the second image, wherein each respective keypoint of the third keypoint subset is visible in the second image; and determining a fourth keypoint subset of the plurality of keypoints, wherein each respective keypoint of the fourth keypoint subset is not visible in the second image, and wherein the machine learning model is configured to determine the extent of engagement further based on (i) the coordinates of each respective keypoint of the third keypoint subset, and (ii) for each respective keypoint of the fourth keypoint subset, an indicator that the respective keypoint is not visible in the second image.
16. The computer-implemented method of claim 1, wherein the corresponding plurality of predetermined body locations comprise one or more locations on limbs, one or more locations on a torso, and one or more locations on a head.
17. A robotic device comprising: a camera; a processor; and a non-transitory computer readable medium having stored thereon instructions that, when executed by the processor, cause the robotic device to perform operations comprising: receiving, from the camera, an image representing at least part of a body of an actor; determining, for each respective keypoint of a first keypoint subset of a plurality of keypoints, coordinates of the respective keypoint within the image, wherein the plurality of keypoints represent a corresponding plurality of predetermined body locations, and wherein each respect…
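Claims 9 through 12 describe how the engagement output may be used downstream. The sketch below is one hypothetical reading, not the patentee's implementation. It assumes the model emits a probability for each of the three states. It applies illustrative transition priors only when the current state is borderline (claim 11). It treats the borderline state as the cue to signal intent (claim 12). When several actors are visible, it faces the one with the highest engaged probability (claim 9). `EngagementTracker`, `BORDERLINE_PRIORS`, `actor_to_face`, and all prior values are invented placeholders.

```python
from enum import Enum
from typing import Dict, Optional, Sequence


class Engagement(Enum):
    ENGAGED = "engaged"
    BORDERLINE = "borderline"
    DISENGAGED = "disengaged"


# Hypothetical priors P(next | current=BORDERLINE, prior state). Following claim 11,
# a borderline actor who was engaged is biased toward disengaging, and one who
# was disengaged is biased toward engaging. The numbers are illustrative only.
BORDERLINE_PRIORS: Dict[Engagement, Dict[Engagement, float]] = {
    Engagement.ENGAGED: {
        Engagement.ENGAGED: 0.2, Engagement.BORDERLINE: 0.3, Engagement.DISENGAGED: 0.5,
    },
    Engagement.DISENGAGED: {
        Engagement.ENGAGED: 0.5, Engagement.BORDERLINE: 0.3, Engagement.DISENGAGED: 0.2,
    },
}
UNIFORM: Dict[Engagement, float] = {s: 1.0 / 3.0 for s in Engagement}


class EngagementTracker:
    """Turns per-frame model probabilities into an engaged/borderline/disengaged state."""

    def __init__(self, initial: Engagement = Engagement.DISENGAGED) -> None:
        self.current = initial
        self.prior: Optional[Engagement] = None  # state held before `current`

    def update(self, model_probs: Dict[Engagement, float]) -> Engagement:
        # Only the borderline state uses history-dependent priors.
        priors = UNIFORM
        if self.current is Engagement.BORDERLINE and self.prior in BORDERLINE_PRIORS:
            priors = BORDERLINE_PRIORS[self.prior]
        # Weight the model's per-state probability by the transition prior.
        scores = {s: model_probs.get(s, 0.0) * priors[s] for s in Engagement}
        nxt = max(scores, key=scores.get)
        if nxt is not self.current:
            self.prior, self.current = self.current, nxt
        return self.current

    def should_signal_intent(self) -> bool:
        # Claim 12 sketch: in the borderline state the robot may show it intends to interact.
        return self.current is Engagement.BORDERLINE


def actor_to_face(engaged_probs: Sequence[float]) -> int:
    """Claim 9 sketch: index of the actor the model rates most likely to be engaged."""
    return max(range(len(engaged_probs)), key=lambda i: engaged_probs[i])


if __name__ == "__main__":
    leaning = {Engagement.ENGAGED: 0.3, Engagement.BORDERLINE: 0.5, Engagement.DISENGAGED: 0.2}
    ambiguous = {Engagement.ENGAGED: 0.4, Engagement.BORDERLINE: 0.25, Engagement.DISENGAGED: 0.35}
    for start in (Engagement.ENGAGED, Engagement.DISENGAGED):
        tracker = EngagementTracker(start)
        tracker.update(leaning)  # both trackers move into BORDERLINE here
        print(start.value, "->", tracker.update(ambiguous).value)
    # prints:
    # engaged -> disengaged
    # disengaged -> engaged
```

Given the same ambiguous frame, the two trackers resolve to opposite states depending on where the actor came from. This is the asymmetry that claim 11 describes. A probabilistic filter over a full transition matrix would be a natural extension. The argmax rule here is only the simplest choice that shows the effect.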
Recognition of whole body movements, e.g. for sport training · CPC title
learning, adaptive, model based, rule based expert control · CPC title
Vision controlled systems · CPC title
the criterion being a learning criterion · CPC title
Extraction of image or video features · CPC title