3D visual proxemics: recognizing human interactions in 3D from a single image

US9268994B2 · US · B2

Patent metadata

Publication number: US-9268994-B2
Application number: US-201313967521-A
Country: US
Kind code: B2
Filing date: Aug 15, 2013
Priority date: Mar 15, 2013
Publication date: Feb 23, 2016
Grant date: Feb 23, 2016

Abstract

A unified framework detects and classifies people interactions in unconstrained user generated images. Previous approaches directly map people/face locations in two-dimensional image space into features for classification. Among other things, the disclosed framework estimates a camera viewpoint and people positions in 3D space and then extracts spatial configuration features from explicit three-dimensional people positions.
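The abstract's central step is to lift 2D face detections into 3D before computing spatial configuration features. A minimal Python sketch of that step follows. It is an illustration under stated assumptions, not the patented method. It assumes a calibrated pinhole camera with a known focal length in pixels, an average real face width of 0.15 m, and a 1.2 m linking distance for grouping people; all three values are chosen here for illustration. The names `FaceBox`, `backproject_faces`, `distance_cue`, and `count_subgroups` are hypothetical. The last two loosely follow the distance cue and shape layer cue named in claim 6.

```python
import math
from dataclasses import dataclass
from itertools import combinations

# Illustrative assumptions, not values taken from the patent.
ASSUMED_FACE_WIDTH_M = 0.15   # average real-world face width, metres
ASSUMED_LINK_M = 1.2          # max gap between neighbours in one subgroup, metres


@dataclass
class FaceBox:
    u: float      # horizontal centre of the detected face, pixels
    v: float      # vertical centre of the detected face, pixels (grows downward)
    width: float  # detected face width, pixels


def backproject_faces(faces, focal_px, cx, cy, face_width_m=ASSUMED_FACE_WIDTH_M):
    """Lift 2D face detections to 3D camera coordinates with a pinhole model.

    A face of real width W spanning w pixels lies at depth Z = f * W / w
    (similar triangles); X and Y follow from the offset to the principal point.
    """
    points = []
    for face in faces:
        z = focal_px * face_width_m / face.width
        x = (face.u - cx) * z / focal_px
        y = (face.v - cy) * z / focal_px
        points.append((x, y, z))
    return points


def distance_cue(points):
    """Mean and maximum pairwise 3D distance between people, in metres."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    if not dists:
        return 0.0, 0.0
    return sum(dists) / len(dists), max(dists)


def count_subgroups(points, link_m=ASSUMED_LINK_M):
    """Single-linkage grouping with union-find: two people share a subgroup
    when a chain of neighbours, each closer than link_m, connects them."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < link_m:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Example: three equally sized faces; the third stands apart from the pair.
faces = [FaceBox(400, 400, 50), FaceBox(600, 400, 50), FaceBox(1100, 400, 50)]
points = backproject_faces(faces, focal_px=1000, cx=500, cy=400)
print(distance_cue(points))
print(count_subgroups(points))  # → 2
```

Estimating depth from face size alone misplaces any face whose real width differs from the assumed value, such as a child's face. The camera-model sketch later in this page is meant to flag faces of that kind.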

Claims

The invention claimed is:

1. A method for recognizing a human interaction depicted in a two-dimensional image, the method comprising, algorithmically: detecting a plurality of human face locations of people depicted in the image; determining a three-dimensional spatial arrangement of the people depicted in the image based on the detected human face locations; performing a proxemics-based analysis of the three-dimensional spatial arrangement of the people depicted in the image, wherein the proxemics-based analysis identifies cues in the three-dimensional spatial arrangement that are relevant to human interactions; and classifying the image as depicting a type of human interaction using visual proxemes, wherein the visual proxemes comprise a set of prototypical patterns that represent commonly occurring people interactions; wherein the image is created by a camera positioned at a camera viewpoint relative to a reference plane, and the method comprises estimating the camera viewpoint and using the estimated camera viewpoint to classify the image.

2. The method of claim 1, comprising detecting, in the image, a person standing in front of another person by applying a proxemics-based visibility constraint.

3. The method of claim 1, comprising detecting, in the image, a child and an adult by applying a proxemics-based localized pose constraint.

4. The method of claim 1, comprising classifying the image as depicting a group interaction, a family photo, a group photo, a couple with an audience, a crowd scene, or a speaker and an audience.

5. The method of claim 1, comprising detecting a plurality of feature cues in the image, wherein each of the feature cues relates to a proxemics-based attribute.

6. The method of claim 5, wherein the plurality of feature cues comprises a shape cue that indicates a shape of the spatial arrangement of the detected face locations, a shot composition cue that indicates a visual distribution of the people depicted in the image, a distance cue that measures distances between the detected face locations in the image, a camera pose cue that estimates the height of the camera used to capture the image in relation to the people depicted in the image relative to a ground plane, and a shape layer cue that indicates whether the people depicted in the image are arranged in a single group or in separate subgroups.

7. The method of claim 1, comprising creating a collection of classified images by repeating the detecting, determining, performing, and classifying for a plurality of two-dimensional images and arranging the classified images in a collection according to human interaction type.

8. The method of claim 7, comprising searching the collection using search criteria including a human interaction type.

9. The method of claim 7, comprising retrieving an image from the collection based on a human interaction type.

10. A method for recognizing a human interaction depicted in a two-dimensional image, the method comprising, algorithmically: detecting a plurality of human face locations of people depicted in the image; determining a three-dimensional spatial arrangement of the people depicted in the image based on the detected human face locations; performing a proxemics-based analysis of the three-dimensional spatial arrangement of the people depicted in the image, wherein the proxemics-based analysis identifies cues in the three-dimensional spatial arrangement that are relevant to human interactions; classifying the image as depicting a type of human interaction using visual proxemes, wherein the visual proxemes comprise a set of prototypical patterns that represent commonly occurring people interactions; and classifying a camera viewpoint as a high-angle viewpoint, an eye-level viewpoint, or a low-angle viewpoint.

11. A method for recognizing a human interaction depicted in a two-dimensional image, the method comprising, algorithmically: detecting a plurality of human face locations of people depicted in the image; determining a three-dimensional spatial arrangement of the people depicted in the image based on the detected human face locations; performing a proxemics-based analysis of the three-dimensional spatial arrangement of the people depicted in the image, wherein the proxemics-based analysis identifies cues in the three-dimensional spatial arrangement that are relevant to human interactions; classifying the image as depicting a type of human interaction using visual proxemes, wherein the visual proxemes comprise a set of prototypical patterns that represent commonly occurring people interactions; and analyzing the plurality of detected human face locations using a linear camera model, identifying a face location that does not fit the linear camera model as an outlier, identifying a face location that fits the linear camera model as an inlier, determining the position of the outlier in relation to the inlier, and classifying the image as depicting a type of human interaction based on the position of the outlier in relation to the inlier.

12. The method of claim 11, comprising analyzing the position of the outlier in relation to the inlier using one or more visual proxemics-based constraints.

13. A method for recognizing a human interaction depicted in a two-dimensional image, the method comprising, algorithmically: detecting a plurality of human face locations of people depicted in the image; determining a three-dimensional spatial arrangement of the people depicted in the image based on the detected human face locations; performing a proxemics-based analysis of the three-dimensional spatial arrangement of the people depicted in the image, wherein the proxemics-based analysis identifies cues in the three-dimensional spatial arrangement that are relevant to human interactions; classifying the image as depicting a type of human interaction using visual proxemes, wherein the visual proxemes comprise a set of prototypical patterns that represent commonly occurring people interactions; and alternating between estimating a camera parameter of the camera used to create the image and applying proxemics-based constraints to the three-dimensional spatial arrangement of the human face locations detected in the image to identify the type of human interaction depicted by the image.
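Claims 10, 11, and 13 name a linear camera model, inlier and outlier faces, viewpoint classification, and alternating estimation, but they do not spell out how these work. One simple reading can be sketched in Python under stated assumptions, and it is not the patent's procedure. The assumptions are that faces lie roughly on one horizontal plane at head height, camera roll is negligible, and tilt is small. A face at depth Z appears v - v0 = f h / Z rows below the horizon row v0, where h is the camera height above the face plane. The face also spans w = f W / Z pixels, where W is its real width. Together these give a line, v = v0 + (h / W) w.

The sketch fits that line by least squares. It then alternates between refitting and rejecting faces whose residual exceeds a pixel threshold, and reports whether each outlier sits above or below the face plane. Finally it labels the viewpoint from the sign of h. The 0.15 m face width, the 20-pixel threshold, and the 0.3 m eye-level tolerance are illustrative assumptions, and all function and class names are hypothetical.

```python
from dataclasses import dataclass

ASSUMED_FACE_WIDTH_M = 0.15  # illustrative average face width (assumption)


@dataclass
class CameraFit:
    horizon_row: float       # v0: image row of the horizon, pixels
    slope: float             # h / W: rows per pixel of face width
    camera_height_m: float   # h: camera height above the face plane, metres
    inliers: list            # indices of faces that fit the linear model
    outliers: dict           # index -> "above" or "below" the face plane


def fit_line(ws, vs):
    """Ordinary least squares for v = a + b * w; returns (a, b)."""
    n = len(ws)
    mean_w, mean_v = sum(ws) / n, sum(vs) / n
    sxx = sum((w - mean_w) ** 2 for w in ws)
    if sxx == 0:
        raise ValueError("need faces of different sizes to fit the model")
    sxy = sum((w - mean_w) * (v - mean_v) for w, v in zip(ws, vs))
    b = sxy / sxx
    return mean_v - b * mean_w, b


def fit_camera_model(faces, max_residual_px=20.0,
                     face_width_m=ASSUMED_FACE_WIDTH_M, max_iter=10):
    """faces: list of (face_width_px, face_row_px) pairs.

    Alternates between fitting the line to the current inliers and
    re-selecting inliers from all faces by residual, until the set is stable.
    """
    if len(faces) < 2:
        raise ValueError("need at least two faces")

    def fit(idx):
        return fit_line([faces[i][0] for i in idx], [faces[i][1] for i in idx])

    inliers = list(range(len(faces)))
    for _ in range(max_iter):
        a, b = fit(inliers)
        candidate = [i for i, (w, v) in enumerate(faces)
                     if abs(v - (a + b * w)) <= max_residual_px]
        if candidate == inliers or len(candidate) < 2:
            break
        inliers = candidate
    a, b = fit(inliers)  # final fit on the settled inlier set

    outliers = {}
    for i, (w, v) in enumerate(faces):
        if i not in inliers:
            # Image rows grow downward, so a positive residual means the face
            # sits lower than the face plane predicts (e.g. a child's face).
            outliers[i] = "below" if v > a + b * w else "above"
    return CameraFit(a, b, b * face_width_m, inliers, outliers)


def classify_viewpoint(camera_height_m, eye_level_tol_m=0.3):
    """Map camera height above the faces to the three viewpoints of claim 10."""
    if camera_height_m > eye_level_tol_m:
        return "high-angle"
    if camera_height_m < -eye_level_tol_m:
        return "low-angle"
    return "eye-level"
```

As a worked example, take four faces on the line v = 300 + 10w and one face 60 pixels lower than the line predicts. The fit places the horizon at row 300 and puts the camera about 1.5 m above the faces, a high-angle viewpoint. The fifth face is flagged as an outlier below the face plane.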
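Claims 7 to 9 amount to indexing classified images by interaction type and querying that index. A minimal Python sketch follows, assuming a pluggable classifier. `ProxemicsCollection` and its `classify` callable are hypothetical stand-ins for the full detect, determine, analyze, and classify pipeline. The allowed labels are the interaction types listed in claim 4.

```python
from collections import defaultdict

# Interaction types listed in claim 4.
INTERACTION_TYPES = frozenset({
    "group interaction", "family photo", "group photo",
    "couple with an audience", "crowd scene", "speaker and an audience",
})


class ProxemicsCollection:
    """Images arranged by interaction type (claim 7), searchable by type (claims 8 and 9)."""

    def __init__(self, classify):
        # classify: any callable mapping an image to one interaction type
        # (hypothetical stand-in for the patent's classification pipeline).
        self._classify = classify
        self._by_type = defaultdict(list)

    def add(self, image_id, image):
        """Classify an image and file its id under the resulting type."""
        label = self._classify(image)
        if label not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {label!r}")
        self._by_type[label].append(image_id)
        return label

    def search(self, interaction_type):
        """Return the ids of all images classified as interaction_type."""
        return list(self._by_type.get(interaction_type, []))
```

Keeping the classifier injectable separates indexing from recognition, so the same collection works with any model that emits one of the claim 4 labels.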

Assignees

Stanford Res Inst Int (SRI International)

Classifications

  • Physics · mapped topic

  • G06V40/165 (primary)

    using facial parts and geometric relationships · CPC title

  • in albums, collections or shared content, e.g. social network photos or video · CPC title

Frequently asked questions

What does patent US9268994B2 cover?
It covers a unified framework for detecting and classifying people interactions in unconstrained, user-generated images. Rather than mapping 2D face locations directly into classification features, the framework estimates the camera viewpoint and people's positions in 3D space, then extracts spatial configuration features from those explicit 3D positions.
Who is the assignee on this patent?
Stanford Res Inst Int
What technology area does this patent fall under?
The primary CPC classification is G06K9/00248, which has since been superseded by G06V40/165 (listed as primary above). Mapped technology areas include Physics.
When was this patent published?
It was published on Feb 23, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).