Engagement detection and attention estimation for human-robot interaction

US11915523B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11915523-B2
Application number: US-202217815361-A
Country: US
Kind code: B2
Filing date: Jul 27, 2022
Priority date: Dec 9, 2019
Publication date: Feb 27, 2024
Grant date: Feb 27, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes receiving, from a camera disposed on a robotic device, a two-dimensional (2D) image of a body of an actor and determining, for each respective keypoint of a first subset of a plurality of keypoints, 2D coordinates of the respective keypoint within the 2D image. The plurality of keypoints represent body locations. Each respective keypoint of the first subset is visible in the 2D image. The method also includes determining a second subset of the plurality of keypoints. Each respective keypoint of the second subset is not visible in the 2D image. The method further includes determining, by way of a machine learning model, an extent of engagement of the actor with the robotic device based on (i) the 2D coordinates of keypoints of the first subset and (ii) for each respective keypoint of the second subset, an indicator that the respective keypoint is not visible.
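
The input encoding described in the abstract can be illustrated with a short sketch. It is not taken from the patent: the keypoint names, the placeholder coordinates, and the function `encode_keypoints` are hypothetical, and the patent does not specify a feature layout. The sketch only shows one way to give a model both the 2D coordinates of visible keypoints and an explicit "not visible" indicator for the rest.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keypoint names; the abstract only says keypoints represent body locations.
KEYPOINTS = ["nose", "left_shoulder", "right_shoulder", "left_hip", "right_hip"]

# Assumed placeholder coordinates for keypoints that are not visible in the image.
MISSING_XY = (-1.0, -1.0)


def encode_keypoints(
    detections: Dict[str, Optional[Tuple[float, float]]]
) -> List[float]:
    """Flatten detections into [x, y, visible] triples in KEYPOINTS order.

    A keypoint that is absent from `detections` or mapped to None is treated
    as not visible: it gets the placeholder coordinates and a 0.0 flag.
    """
    features: List[float] = []
    for name in KEYPOINTS:
        xy = detections.get(name)
        if xy is None:
            features.extend([MISSING_XY[0], MISSING_XY[1], 0.0])
        else:
            features.extend([float(xy[0]), float(xy[1]), 1.0])
    return features


# Only the nose and left shoulder were detected in this made-up frame.
example = encode_keypoints({"nose": (0.5, 0.2), "left_shoulder": (0.4, 0.35)})
```

The resulting fixed-length vector has the same size no matter how many keypoints are occluded, which is what lets a single model consume partially visible bodies.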

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, from a camera disposed on a robotic device, an image representing at least part of a body of an actor; determining, for each respective keypoint of a first keypoint subset of a plurality of keypoints, coordinates of the respective keypoint within the second image, wherein the plurality of keypoints represent a corresponding plurality of predetermined body locations, and wherein each respective keypoint of the first keypoint subset is visible in the image; determining a second keypoint subset of the plurality of keypoints, wherein each respective keypoint of the second keypoint subset is not visible in the image; and determining, by way of a machine learning model, an extent of engagement of the actor with the robotic device, wherein the machine learning model is configured to determine the extent of engagement based on (i) the coordinates of each respective keypoint of the first keypoint subset and (ii) for each respective keypoint of the second keypoint subset, an indication that the respective keypoint is not visible in the image, wherein the machine learning model has been trained using a plurality of training images of a plurality of actors, wherein each respective training image of the plurality of training images is associated with a label indicating a corresponding extent of engagement, wherein each respective image has been captured by the camera or a second camera disposed on a second robotic device, and wherein the second camera approximates a perspective of the camera by being positioned on the second robotic device within a threshold height relative to a height of the camera on the robotic device.

2. The computer-implemented method of claim 1, wherein the first image is a two-dimensional (2D) image.

3. The computer-implemented method of claim 1, wherein the machine learning model is configured to determine the extent of engagement further based on, for each respective keypoint of the first keypoint subset, an indicator that the respective keypoint is visible in the image.

4. The computer-implemented method of claim 3, wherein the indicator that the respective keypoint of the second keypoint subset is not visible in the image comprises a binary variable set to a first value, and wherein the indicator that the respective keypoint of the first keypoint subset is visible in the image comprises the binary variable set to a second value.

5. The computer-implemented method of claim 1, wherein the indicator that the respective keypoint of the second keypoint subset is not visible in the image comprises coordinates of the respective keypoint set to a predetermined value.

6. The computer-implemented method of claim 1, wherein the second camera further approximates the perspective of the camera by being positioned on the second robotic device within a threshold angular displacement relative to an angular position of the camera on the robotic device.

7. The computer-implemented method of claim 1, further comprising: based on the extent of engagement of the actor with the robotic device, determining one or more operations to perform by the robotic device to interact with the actor; and executing the one or more operations.

8. The computer-implemented method of claim 1, further comprising: based on the extent of engagement of the actor with the robotic device, determine at least one of (i) a start point of an interaction of the actor with the robotic device or (i) an end point of the interaction.

9. The computer-implemented method of claim 1, wherein the image also represents a second body of a second actor, and wherein the method further comprises: determining, by way of the machine learning model, a second extent of engagement of the second actor with the robotic device; comparing the extent of engagement of the second actor with the robotic device to the extent of engagement of the actor with the robotic device; and determining, based on results of the comparing, a direction in which to orient the robotic device.

10. The computer-implemented method of claim 1, wherein the extent of engagement of the actor with the robotic device is selected from a group comprising an engaged state, a borderline state, and a disengaged state.

11. The computer-implemented method of claim 10, further comprising: determining, based on the extent of engagement of the actor with the robotic device, a transition from a current state to a next state, wherein, when the current state is the borderline state, a probability of transition to the next state is conditioned on a prior state such that (i) when the prior state is the engaged state, the next state is biased towards being the disengaged state and (ii) when the prior state is the disengaged state, the next state is biased towards being the engaged state.

12. The computer-implemented method of claim 10, further comprising: when the extent of engagement of the actor with the robotic device comprises the borderline state, causing the robotic device to perform an operation indicating an intent to interact with the actor.

13. The computer-implemented method of claim 1, wherein determining the coordinates of the respective keypoint within the image comprises: detecting the respective keypoint within the image based on a preceding image of the actor, an adjustment of a pose of the camera between the preceding image and the image, and the image.

14. The computer-implemented method of claim 1, wherein the machine learning model is configured to determine the extent of engagement further based on one or more of: (i) an utterance by the actor detected by a microphone on the robotic device, (ii) a visual indication within the image that the actor is speaking, or (iii) a direction in which a gaze of the actor is pointed.

15. The computer-implemented method of claim 1, further comprising: receiving, from the camera, a second image representing at least part of the body of the actor; determining, for each respective keypoint of a third keypoint subset of the plurality of keypoints, coordinates of the respective keypoint within the second image, wherein each respective keypoint of the third keypoint subset is visible in the second image; and determining a fourth keypoint subset of the plurality of keypoints, wherein each respective keypoint of the fourth keypoint subset is not visible in the second image, and wherein the machine learning model is configured to determine the extent of engagement further based on (i) the coordinates of each respective keypoint of the third keypoint subset, and (ii) for each respective keypoint of the fourth keypoint subset, an indicator that the respective keypoint is not visible in the second image.

16. The computer-implemented method of claim 1, wherein the corresponding plurality of predetermined body locations comprise one or more locations on limbs, one or more locations on a torso, and one or more locations on a head.

17. A robotic device comprising: a camera; a processor; and a non-transitory computer readable medium having stored thereon instructions that, when executed by the processor, cause the robotic device to perform operations comprising: receiving, from the camera, an image representing at least part of a body of an actor; determining, for each respective keypoint of a first keypoint subset of a plurality of keypoints, coordinates of the respective keypoint within the second image, wherein the plurality of keypoints represent a corresponding plurality of predetermined body locations, and wherein each respect…
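
Claims 10 and 11 describe three engagement states and a transition out of the borderline state that depends on the prior state. The sketch below is one hypothetical, deterministic reading of that idea and is not the patent's implementation: the score in [0, 1], the thresholds `low` and `high`, the `bias` value, and the rule that a borderline state always resolves to engaged or disengaged are all assumptions made for illustration.

```python
ENGAGED = "engaged"
BORDERLINE = "borderline"
DISENGAGED = "disengaged"


def next_state(
    prior: str,
    current: str,
    score: float,
    bias: float = 0.2,
    low: float = 0.35,
    high: float = 0.65,
) -> str:
    """Pick the next engagement state from an assumed model score in [0, 1].

    Outside the borderline state, the score is mapped to a state with two
    fixed thresholds. In the borderline state, the cut-off is shifted by
    `bias` depending on the prior state, mirroring claim 11:
      - prior engaged    -> higher cut-off, biased towards disengaged
      - prior disengaged -> lower cut-off, biased towards engaged
    """
    if current != BORDERLINE:
        if score >= high:
            return ENGAGED
        if score <= low:
            return DISENGAGED
        return BORDERLINE

    cutoff = 0.5
    if prior == ENGAGED:
        cutoff += bias
    elif prior == DISENGAGED:
        cutoff -= bias
    return ENGAGED if score >= cutoff else DISENGAGED


# The actor was engaged, drifted to borderline, and the score is middling.
after_engaged = next_state(ENGAGED, BORDERLINE, 0.6)
# The actor was disengaged, moved to borderline, and the score is middling.
after_disengaged = next_state(DISENGAGED, BORDERLINE, 0.4)
```

Under these assumptions the same middling score resolves differently depending on where the actor came from, so the borderline state acts as a directional hand-off rather than a resting point.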

Assignees

Google LLC

Inventors

Classifications

  • G06V40/23 · Primary

    Recognition of whole body movements, e.g. for sport training · CPC title

  • learning, adaptive, model based, rule based expert control · CPC title

  • Vision controlled systems · CPC title

  • the criterion being a learning criterion · CPC title

  • Extraction of image or video features · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11915523B2 cover?
A method includes receiving, from a camera disposed on a robotic device, a two-dimensional (2D) image of a body of an actor and determining, for each respective keypoint of a first subset of a plurality of keypoints, 2D coordinates of the respective keypoint within the 2D image. The plurality of keypoints represent body locations. Each respective keypoint of the first subset is visible in the 2…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06V40/23. Mapped technology areas include Physics.
When was this patent published?
Published Feb 27, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).