Automated Customization of Display Component Data for Search Results
US-2017097967-A1 · Apr 6, 2017 · US
US11106902B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11106902-B2 |
| Application number | US-201815920027-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 13, 2018 |
| Priority date | Mar 13, 2018 |
| Publication date | Aug 31, 2021 |
| Grant date | Aug 31, 2021 |
Official abstract text for this publication.
Certain embodiments detect human-object interactions in image content. For example, human-object interaction metadata is applied to an input image, thereby identifying contact between a part of a depicted human and a part of a depicted object. Applying the human-object interaction metadata involves computing a joint-location heat map by applying a pose estimation subnet to the input image and a contact-point heat map by applying an object contact subnet to the input image. The human-object interaction metadata is generated by applying an interaction-detection subnet to the joint-location heat map and the contact-point heat map. The interaction-detection subnet is trained to identify an interaction based on joint-object contact pairs, where a joint-object contact pair includes a relationship between a human joint location and a contact point. An image search system or other computing system is provided with access to the input image having the human-object interaction metadata.
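The data flow in the abstract (two heat maps, combined, then scanned for joint-object contact pairs) can be illustrated with a minimal sketch. It is not the patented implementation: the heat maps are toy one-hot grids, `one_hot`, `peak`, `combine`, and `contact_pairs` are hypothetical helpers, and the trained interaction-detection subnet is replaced by a plain distance rule (`max_dist`, an assumed parameter) purely to show what a joint-object contact pair is.

```python
from typing import Dict, List, Tuple

HeatMap = List[List[float]]  # rows of per-pixel confidence values in [0, 1]


def one_hot(row: int, col: int, size: int = 4) -> HeatMap:
    """Toy heat map with a single confident pixel (illustration only)."""
    return [[1.0 if (r, c) == (row, col) else 0.0 for c in range(size)]
            for r in range(size)]


def peak(heat_map: HeatMap) -> Tuple[int, int]:
    """Return the (row, col) of the highest-confidence pixel."""
    best, best_val = (0, 0), float("-inf")
    for r, row in enumerate(heat_map):
        for c, val in enumerate(row):
            if val > best_val:
                best, best_val = (r, c), val
    return best


def combine(joint_maps: Dict[str, HeatMap],
            contact_maps: Dict[str, HeatMap]) -> Dict[str, HeatMap]:
    """Stack joint and contact channels into one multi-channel combined heat map."""
    combined = {f"joint:{name}": m for name, m in joint_maps.items()}
    combined.update({f"contact:{name}": m for name, m in contact_maps.items()})
    return combined


def contact_pairs(combined: Dict[str, HeatMap],
                  max_dist: float = 1.5) -> List[Tuple[str, str]]:
    """Hypothetical rule standing in for the trained interaction-detection subnet:
    emit a (joint, contact point) pair when their peaks lie within max_dist pixels."""
    joints = {k[len("joint:"):]: peak(m)
              for k, m in combined.items() if k.startswith("joint:")}
    contacts = {k[len("contact:"):]: peak(m)
                for k, m in combined.items() if k.startswith("contact:")}
    pairs = []
    for j_name, (jr, jc) in joints.items():
        for c_name, (cr, cc) in contacts.items():
            if ((jr - cr) ** 2 + (jc - cc) ** 2) ** 0.5 <= max_dist:
                pairs.append((j_name, c_name))
    return pairs


if __name__ == "__main__":
    # Stand-ins for the pose estimation subnet and object contact subnet outputs.
    joint_maps = {"right_wrist": one_hot(1, 1), "left_ankle": one_hot(3, 0)}
    contact_maps = {"cup_handle": one_hot(1, 2)}
    combined = combine(joint_maps, contact_maps)
    print(contact_pairs(combined))
```

With these toy maps only the wrist lies within 1.5 pixels of the handle, so the sketch prints `[('right_wrist', 'cup_handle')]`. In the described system that decision is learned from training data rather than hard-coded.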
Opening claim text (preview).
The invention claimed is:
1. A method that includes one or more processing devices performing operations comprising: accessing, from a memory device, an input image; transforming the input image by applying human-object interaction metadata to the input image that identifies a part of a human depicted in the input image being in contact with a part of an object depicted in the input image, wherein applying the human-object interaction metadata comprises: providing the input image to an interaction detection network having a pose estimation subnet, an object contact subnet, and an interaction-detection subnet, computing a joint-location heat map for the input image using the pose estimation subnet, the joint-location heat map identifying one or more human joint locations in the input image, computing a contact-point heat map for the input image using the object contact subnet, the contact-point heat map identifying one or more contact points on the object depicted in the input image, combining the joint-location heat map and the contact-point heat map into a combined heat map image, wherein the combined heat map image includes a joint-object contact pair, and wherein the joint-object contact pair comprises a relationship between a human joint location and a contact point, and generating the human-object interaction metadata for the combined heat map image using the interaction-detection subnet, wherein the interaction-detection subnet is trained to identify an interaction based on the joint-object contact pair; and providing access to the input image having the human-object interaction metadata.
2. The method of claim 1, the operations further comprising training the interaction-detection subnet, wherein training the interaction-detection subnet comprises: receiving, by the interaction-detection subnet, a plurality of training inputs, each training input comprising a training image and training human-object interaction metadata; computing, by the interaction-detection subnet, a calculated human-object interaction metadata from the plurality of training inputs; detecting, by a training computing system, a difference between the calculated human-object interaction metadata and the training human-object interaction metadata; and updating, by the training computing system, one or more parameters used to generate human-object interaction metadata, wherein updating the one or more parameters decreases a subsequent difference between the training human-object interaction metadata and subsequent human-interaction metadata calculated with the updated one or more parameters.
3. The method of claim 2, the operations further comprising: receiving, through an input device, feedback from a user device indicating whether the human-object interaction metadata is accurate; and updating, by the training computing system, one or more parameters used to generate human-object interaction metadata based on the feedback.
4. The method of claim 1, the operations further comprising servicing an image query using an image search system, wherein servicing the image query comprises: receiving, through an input device, the image query comprising one or more search terms; determining, by the image search system, a match score by comparing the human-object interaction metadata to the one or more search terms from the image query; and transmitting the input image to the image search system based on the match score exceeding a threshold.
5. The method of claim 1, wherein generating the human-object interaction metadata further comprises determining that the joint-object contact pair identifies a location of the human joint location and a location of the contact point to be sufficiently similar.
6. The method of claim 5, wherein determining that the joint-object contact pair to be sufficiently similar comprises determining that a dot product of the human joint location and the location of the contact point exceeds a threshold dot product.
7. The method of claim 1, wherein the transforming the input image further comprises: providing the input image to an image evaluation network; and determining, by the image evaluation network, an estimated human-object interaction, wherein generating the human-object interaction metadata further comprises combining the estimated human-object interaction with the identification of the interaction by the interaction-detection subnet.
8. A system comprising: a processing device; and a non-transitory computer-readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer-readable medium and thereby perform operations comprising: transforming an input image by applying human-object interaction metadata to the input image that identifies a part of a human depicted in the input image being in contact with a part of an object depicted in the input image, wherein applying the human-object interaction metadata comprises: providing the input image to an interaction detection network having a pose estimation subnet, an object contact subnet, and an interaction-detection subnet, computing a joint-location heat map for the input image using the pose estimation subnet, the joint-location heat map identifying one or more human joint locations in the input image, computing a contact-point heat map for the input image using the object contact subnet, the contact-point heat map identifying one or more contact points on the object depicted in the input image, wherein the one or more contact points represent a subset of the object that the human can interact with, generating the human-object interaction metadata by: (i) applying the interaction-detection subnet to the joint-location heat map and the contact-point heat map, wherein the interaction-detection subnet is trained to identify an interaction based on joint-object contact pairs, and wherein a joint-object contact pair comprises a relationship between a human joint location and a contact point, and (ii) determining that the joint-object contact pair identifies a location of the human joint location and a location of the contact point to be sufficiently similar; and providing a computing system with access to the input image having the human-object interaction metadata.
9. The system of claim 8, further comprising a training computing system configured for training the interaction-detection subnet, wherein training the interaction-detection subnet comprises: receiving, by the interaction-detection subnet, a plurality of training inputs, each training input comprising a training image and training human-object interaction metadata; computing, by the interaction-detection subnet, a calculated human-object interaction metadata from the plurality of training inputs; detecting a difference between the calculated human-object interaction metadata and the training human-object interaction metadata; and updating one or more parameters used to generate human-object interaction metadata, wherein updating the one or more parameters decreases a subsequent difference between the training human-object interaction metadata and subsequent human-interaction metadata calculated with the updated one or more parameters.
10. The system of claim 9, the operations further comprising: receiving, via an input device, feedback from a user device indicating whether the human-object interaction metadata is accurate; and wherein the training computing system is further configured for updating one or more parameters used to generate human-object interaction metadata based on the feedback.
11. The system of claim 8, further com
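Claims 5 and 6 test whether a joint location and a contact point are "sufficiently similar" via a dot product exceeding a threshold, and claim 4 gates search results on a match score exceeding a threshold. The claims do not say how locations are encoded or how the match score is computed, so the following is only a minimal sketch under stated assumptions: each location is represented as a same-sized heat map flattened into a vector, the match score is a hypothetical tag-overlap fraction, and both thresholds (`0.5`) are illustrative values, not figures from the patent.

```python
from typing import Iterable, List, Set

HeatMap = List[List[float]]  # assumed encoding of a location as a confidence grid


def heat_map_dot(a: HeatMap, b: HeatMap) -> float:
    """Dot product of the two maps flattened to vectors; large only where both peak together."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def sufficiently_similar(joint_map: HeatMap, contact_map: HeatMap,
                         threshold: float = 0.5) -> bool:
    """Claim 6 style check: the dot product must exceed a threshold (value assumed)."""
    return heat_map_dot(joint_map, contact_map) > threshold


def match_score(metadata_tags: Set[str], search_terms: Iterable[str]) -> float:
    """Hypothetical score: fraction of query terms found among the image's interaction tags."""
    terms = [t.lower() for t in search_terms]
    if not terms:
        return 0.0
    tags = {t.lower() for t in metadata_tags}
    return sum(t in tags for t in terms) / len(terms)


def passes_query(metadata_tags: Set[str], search_terms: Iterable[str],
                 threshold: float = 0.5) -> bool:
    """Claim 4 style gate: return the image only if the match score exceeds the threshold."""
    return match_score(metadata_tags, search_terms) > threshold


if __name__ == "__main__":
    wrist = [[0.0, 0.0], [0.0, 0.8]]
    handle_near = [[0.0, 0.0], [0.1, 0.9]]
    handle_far = [[0.9, 0.1], [0.0, 0.0]]
    print(sufficiently_similar(wrist, handle_near))  # dot product about 0.72
    print(sufficiently_similar(wrist, handle_far))   # dot product 0.0
    print(passes_query({"hand", "cup", "holding"}, ["person", "holding", "cup"]))
```

Treating locations as heat maps makes the dot product behave as an overlap measure: it is zero when the two maps peak in different places and grows as their confident regions coincide. A raw coordinate dot product would not have that property, which is why the sketch adopts the heat-map reading; the patent text itself leaves this open.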
Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands · CPC title
Static body considered as a whole, e.g. static pedestrian or occupant recognition · CPC title
Combinations of networks · CPC title
using neural networks · CPC title
Supervised learning · CPC title