Interaction detection model for identifying human-object interactions in image content

US11106902B2 · US · B2

Patent metadata
Publication number: US-11106902-B2
Application number: US-201815920027-A
Country: US
Kind code: B2
Filing date: Mar 13, 2018
Priority date: Mar 13, 2018
Publication date: Aug 31, 2021
Grant date: Aug 31, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Certain embodiments detect human-object interactions in image content. For example, human-object interaction metadata is applied to an input image, thereby identifying contact between a part of a depicted human and a part of a depicted object. Applying the human-object interaction metadata involves computing a joint-location heat map by applying a pose estimation subnet to the input image and a contact-point heat map by applying an object contact subnet to the input image. The human-object interaction metadata is generated by applying an interaction-detection subnet to the joint-location heat map and the contact-point heat map. The interaction-detection subnet is trained to identify an interaction based on joint-object contact pairs, where a joint-object contact pair includes a relationship between a human joint location and a contact point. An image search system or other computing system is provided with access to the input image having the human-object interaction metadata.
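
The data flow in the abstract can be illustrated with a minimal sketch. It does not reproduce the patent's trained subnets: synthetic Gaussian heat maps stand in for the outputs of the pose estimation subnet and the object contact subnet, and the names `gaussian_heat_map` and `peak_location`, the 32x32 image size, and the example coordinates are all hypothetical.

```python
import numpy as np

def gaussian_heat_map(shape, center, sigma=2.0):
    """Synthetic stand-in for a subnet output: a 2-D Gaussian peaked at `center` (row, col)."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def peak_location(heat_map):
    """Return the (row, col) of the strongest response in a heat map."""
    idx = np.unravel_index(np.argmax(heat_map), heat_map.shape)
    return tuple(int(i) for i in idx)

# Assumed example: a hand joint near (10, 12) and an object contact point near (11, 13).
joint_map = gaussian_heat_map((32, 32), (10, 12))    # stands in for the pose estimation subnet
contact_map = gaussian_heat_map((32, 32), (11, 13))  # stands in for the object contact subnet

# One possible way to combine the two maps: stack them as channels of a single array.
combined = np.stack([joint_map, contact_map], axis=0)

# A joint-object contact pair, here simply the two peak locations.
pair = (peak_location(combined[0]), peak_location(combined[1]))
print(combined.shape, pair)
```

In the patent, the combined heat maps are passed to a trained interaction-detection subnet; the sketch stops at the point where the joint-object contact pair becomes available.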

First claim

Opening claim text (preview).

The invention claimed is:

1. A method that includes one or more processing devices performing operations comprising: accessing, from a memory device, an input image; transforming the input image by applying human-object interaction metadata to the input image that identifies a part of a human depicted in the input image being in contact with a part of an object depicted in the input image, wherein applying the human-object interaction metadata comprises: providing the input image to an interaction detection network having a pose estimation subnet, an object contact subnet, and an interaction-detection subnet, computing a joint-location heat map for the input image using the pose estimation subnet, the joint-location heat map identifying one or more human joint locations in the input image, computing a contact-point heat map for the input image using the object contact subnet, the contact-point heat map identifying one or more contact points on the object depicted in the input image, combining the joint-location heat map and the contact-point heat map into a combined heat map image, wherein the combined heat map image includes a joint-object contact pair, and wherein the joint-object contact pair comprises a relationship between a human joint location and a contact point, and generating the human-object interaction metadata for the combined heat map image using the interaction-detection subnet, wherein the interaction-detection subnet is trained to identify an interaction based on the joint-object contact pair; and providing access to the input image having the human-object interaction metadata.

2. The method of claim 1, the operations further comprising training the interaction-detection subnet, wherein training the interaction-detection subnet comprises: receiving, by the interaction-detection subnet, a plurality of training inputs, each training input comprising a training image and training human-object interaction metadata; computing, by the interaction-detection subnet, a calculated human-object interaction metadata from the plurality of training inputs; detecting, by a training computing system, a difference between the calculated human-object interaction metadata and the training human-object interaction metadata; and updating, by the training computing system, one or more parameters used to generate human-object interaction metadata, wherein updating the one or more parameters decreases a subsequent difference between the training human-object interaction metadata and subsequent human-interaction metadata calculated with the updated one or more parameters.

3. The method of claim 2, the operations further comprising: receiving, through an input device, feedback from a user device indicating whether the human-object interaction metadata is accurate; and updating, by the training computing system, one or more parameters used to generate human-object interaction metadata based on the feedback.

4. The method of claim 1, the operations further comprising servicing an image query using an image search system, wherein servicing the image query comprises: receiving, through an input device, the image query comprising one or more search terms; determining, by the image search system, a match score by comparing the human-object interaction metadata to the one or more search terms from the image query; and transmitting the input image to the image search system based on the match score exceeding a threshold.

5. The method of claim 1, wherein generating the human-object interaction metadata further comprises determining that the joint-object contact pair identifies a location of the human joint location and a location of the contact point to be sufficiently similar.

6. The method of claim 5, wherein determining that the joint-object contact pair to be sufficiently similar comprises determining that a dot product of the human joint location and the location of the contact point exceeds a threshold dot product.

7. The method of claim 1, wherein the transforming the input image further comprises: providing the input image to an image evaluation network; and determining, by the image evaluation network, an estimated human-object interaction, wherein generating the human-object interaction metadata further comprises combining the estimated human-object interaction with the identification of the interaction by the interaction-detection subnet.

8. A system comprising: a processing device; and a non-transitory computer-readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer-readable medium and thereby perform operations comprising: transforming an input image by applying human-object interaction metadata to the input image that identifies a part of a human depicted in the input image being in contact with a part of an object depicted in the input image, wherein applying the human-object interaction metadata comprises: providing the input image to an interaction detection network having a pose estimation subnet, an object contact subnet, and an interaction-detection subnet, computing a joint-location heat map for the input image using the pose estimation subnet, the joint-location heat map identifying one or more human joint locations in the input image, computing a contact-point heat map for the input image using the object contact subnet, the contact-point heat map identifying one or more contact points on the object depicted in the input image, wherein the one or more contact points represent a subset of the object that the human can interact with, generating the human-object interaction metadata by: (i) applying the interaction-detection subnet to the joint-location heat map and the contact-point heat map, wherein the interaction-detection subnet is trained to identify an interaction based on joint-object contact pairs, and wherein a joint-object contact pair comprises a relationship between a human joint location and a contact point, and (ii) determining that the joint-object contact pair identifies a location of the human joint location and a location of the contact point to be sufficiently similar; and providing a computing system with access to the input image having the human-object interaction metadata.

9. The system of claim 8, further comprising a training computing system configured for training the interaction-detection subnet, wherein training the interaction-detection subnet comprises: receiving, by the interaction-detection subnet, a plurality of training inputs, each training input comprising a training image and training human-object interaction metadata; computing, by the interaction-detection subnet, a calculated human-object interaction metadata from the plurality of training inputs; detecting a difference between the calculated human-object interaction metadata and the training human-object interaction metadata; and updating one or more parameters used to generate human-object interaction metadata, wherein updating the one or more parameters decreases a subsequent difference between the training human-object interaction metadata and subsequent human-interaction metadata calculated with the updated one or more parameters.

10. The system of claim 9, the operations further comprising: receiving, via an input device, feedback from a user device indicating whether the human-object interaction metadata is accurate; and wherein the training computing system is further configured for updating one or more parameters used to generate human-object interaction metadata based on the feedback.

11. The system of claim 8, further com
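
Two of the tests named in the claims, the dot-product similarity check (claims 5 and 6) and the search match score (claim 4), can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not say how locations are encoded or how the match score is computed, so the sketch assumes locations are represented as heat maps, normalizes them before taking the dot product, and scores a query as the fraction of search terms found in the metadata. The function names, thresholds, and example terms are hypothetical.

```python
import numpy as np

def heat_map(shape, center, sigma=2.0):
    """Synthetic heat map: a 2-D Gaussian peaked at `center` (row, col)."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def contact_score(joint_map, contact_map):
    """Dot product of the two heat maps after L2 normalization (assumed encoding)."""
    a = joint_map.ravel() / np.linalg.norm(joint_map)
    b = contact_map.ravel() / np.linalg.norm(contact_map)
    return float(a @ b)

def match_score(metadata_terms, search_terms):
    """Fraction of search terms present in the interaction metadata (assumed scoring rule)."""
    meta = {t.lower() for t in metadata_terms}
    query = [t.lower() for t in search_terms]
    if not query:
        return 0.0
    return sum(t in meta for t in query) / len(query)

CONTACT_THRESHOLD = 0.5  # hypothetical threshold dot product
MATCH_THRESHOLD = 0.5    # hypothetical match-score threshold

hand = heat_map((32, 32), (10, 12))
near = contact_score(hand, heat_map((32, 32), (11, 13)))  # contact point next to the hand
far = contact_score(hand, heat_map((32, 32), (25, 28)))   # contact point far from the hand
print(near > CONTACT_THRESHOLD, far > CONTACT_THRESHOLD)

metadata = ["hand", "holding", "cup"]  # hypothetical interaction metadata
print(match_score(metadata, ["holding", "cup"]) > MATCH_THRESHOLD)
print(match_score(metadata, ["riding", "bicycle"]) > MATCH_THRESHOLD)
```

Normalizing before the dot product keeps the threshold independent of heat-map scale; a raw dot product with a differently chosen threshold would fit the claim language equally well.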

Assignees

Inventors

Classifications

  • Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands · CPC title

  • Static body considered as a whole, e.g. static pedestrian or occupant recognition · CPC title

  • Combinations of networks · CPC title

  • G06V10/82 (Primary)

    using neural networks · CPC title

  • Supervised learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11106902B2 cover?
Certain embodiments detect human-object interactions in image content. For example, human-object interaction metadata is applied to an input image, thereby identifying contact between a part of a depicted human and a part of a depicted object. Applying the human-object interaction metadata involves computing a joint-location heat map by applying a pose estimation subnet to the input image and a…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 31, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).