What technology area does this patent fall under?

Primary CPC classification G06K9/00248. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 23, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

3D visual proxemics: recognizing human interactions in 3D from a single image

US9268994B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9268994-B2
Application number	US-201313967521-A
Country	US
Kind code	B2
Filing date	Aug 15, 2013
Priority date	Mar 15, 2013
Publication date	Feb 23, 2016
Grant date	Feb 23, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A unified framework detects and classifies people interactions in unconstrained user generated images. Previous approaches directly map people/face locations in two-dimensional image space into features for classification. Among other things, the disclosed framework estimates a camera viewpoint and people positions in 3D space and then extracts spatial configuration features from explicit three-dimensional people positions.
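As a rough illustration of the idea in the abstract (place people in 3D first, then compute spatial features), the sketch below back-projects detected faces with a standard pinhole camera model and measures pairwise 3D distances. It is not taken from the patent: the constants `ASSUMED_FACE_HEIGHT_M` and `ASSUMED_FOCAL_PX`, the function names, and the use of face size as a depth cue are all assumptions made for this example.

```python
import math
from itertools import combinations

# Hypothetical constants, not from the patent.
ASSUMED_FACE_HEIGHT_M = 0.2   # assumed real-world face height in metres
ASSUMED_FOCAL_PX = 1000.0     # assumed focal length in pixels


def face_to_3d(x, y, h, cx, cy, f=ASSUMED_FOCAL_PX, face_m=ASSUMED_FACE_HEIGHT_M):
    """Back-project a face at pixel (x, y) with pixel height h to camera coordinates.

    Pinhole model: depth z = f * real_height / pixel_height, and the
    lateral offsets scale with z / f. (cx, cy) is the principal point.
    """
    z = f * face_m / h
    return ((x - cx) * z / f, (y - cy) * z / f, z)


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of 3D points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


# Two faces of equal pixel height, 500 px apart horizontally.
faces = [(500, 300, 100), (1000, 300, 100)]
points = [face_to_3d(x, y, h, cx=500, cy=300) for x, y, h in faces]
distances = pairwise_distances(points)
```

Distances computed this way are in metres rather than pixels, so two people far from the camera are not judged closer together merely because they appear small in the image.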

First claim

Opening claim text (preview).

The invention claimed is:
1. A method for recognizing a human interaction depicted in a two-dimensional image, the method comprising, algorithmically: detecting a plurality of human face locations of people depicted in the image; determining a three-dimensional spatial arrangement of the people depicted in the image based on the detected human face locations; performing a proxemics-based analysis of the three-dimensional spatial arrangement of the people depicted in the image, wherein the proxemics-based analysis identifies cues in the three-dimensional spatial arrangement that are relevant to human interactions; and classifying the image as depicting a type of human interaction using visual proxemes, wherein the visual proxemes comprise a set of prototypical patterns that represent commonly occurring people interactions; wherein the image is created by a camera positioned at a camera viewpoint relative to a reference plane, and the method comprises estimating the camera viewpoint and using the estimated camera viewpoint to classify the image.
2. The method of claim 1, comprising detecting, in the image, a person standing in front of another person by applying a proxemics-based visibility constraint.
3. The method of claim 1, comprising detecting, in the image, a child and an adult by applying a proxemics-based localized pose constraint.
4. The method of claim 1, comprising classifying the image as depicting a group interaction, a family photo, a group photo, a couple with an audience, a crowd scene, or a speaker and an audience.
5. The method of claim 1, comprising detecting a plurality of feature cues in the image, wherein each of the feature cues relates to a proxemics-based attribute.
6. The method of claim 5, wherein the plurality of feature cues comprises a shape cue that indicates a shape of the spatial arrangement of the detected face locations, a shot composition cue that indicates a visual distribution of the people depicted in the image, a distance cue that measures distances between the detected face locations in the image, a camera pose cue that estimates the height of the camera used to capture the image in relation to the people depicted in the image relative to a ground plane, and a shape layer cue that indicates whether the people depicted in the image are arranged in a single group or in separate subgroups.
7. The method of claim 1, comprising creating a collection of classified images by repeating the detecting, determining, performing, and classifying for a plurality of two-dimensional images and arranging the classified images in a collection according to human interaction type.
8. The method of claim 7, comprising searching the collection using search criteria including a human interaction type.
9. The method of claim 7, comprising retrieving an image from the collection based on a human interaction type.
10. A method for recognizing a human interaction depicted in a two-dimensional image, the method comprising, algorithmically: detecting a plurality of human face locations of people depicted in the image; determining a three-dimensional spatial arrangement of the people depicted in the image based on the detected human face locations; performing a proxemics-based analysis of the three-dimensional spatial arrangement of the people depicted in the image, wherein the proxemics-based analysis identifies cues in the three-dimensional spatial arrangement that are relevant to human interactions; classifying the image as depicting a type of human interaction using visual proxemes, wherein the visual proxemes comprise a set of prototypical patterns that represent commonly occurring people interactions; and classifying a camera viewpoint as a high-angle viewpoint, an eye-level viewpoint, or a low-angle viewpoint.
11. A method for recognizing a human interaction depicted in a two-dimensional image, the method comprising, algorithmically: detecting a plurality of human face locations of people depicted in the image; determining a three-dimensional spatial arrangement of the people depicted in the image based on the detected human face locations; performing a proxemics-based analysis of the three-dimensional spatial arrangement of the people depicted in the image, wherein the proxemics-based analysis identifies cues in the three-dimensional spatial arrangement that are relevant to human interactions; classifying the image as depicting a type of human interaction using visual proxemes, wherein the visual proxemes comprise a set of prototypical patterns that represent commonly occurring people interactions; and analyzing the plurality of detected human face locations using a linear camera model, identifying a face location that does not fit the linear camera model as an outlier, identifying a face location that fits the linear camera model as an inlier, determining the position of the outlier in relation to the inlier, and classifying the image as depicting a type of human interaction based on the position of the outlier in relation to the inlier.
12. The method of claim 11, comprising analyzing the position of the outlier in relation to the inlier using one or more visual proxemics-based constraints.
13. A method for recognizing a human interaction depicted in a two-dimensional image, the method comprising, algorithmically: detecting a plurality of human face locations of people depicted in the image; determining a three-dimensional spatial arrangement of the people depicted in the image based on the detected human face locations; performing a proxemics-based analysis of the three-dimensional spatial arrangement of the people depicted in the image, wherein the proxemics-based analysis identifies cues in the three-dimensional spatial arrangement that are relevant to human interactions; classifying the image as depicting a type of human interaction using visual proxemes, wherein the visual proxemes comprise a set of prototypical patterns that represent commonly occurring people interactions; and alternating between estimating a camera parameter of the camera used to create the image and applying proxemics-based constraints to the three-dimensional spatial arrangement of the human face locations detected in the image to identify the type of human interaction depicted by the image.
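The inlier/outlier step recited in claim 11 can be pictured with a small sketch. The claim text shown here does not spell out the linear camera model, so the code assumes one plausible form: face height in pixels varies linearly with the face's vertical image position, as it would for people of similar height standing on a common ground plane. The function name `split_inliers_outliers`, the tolerance `tol`, and the exhaustive pair-based fitting are all hypothetical choices for illustration, not the patented procedure.

```python
from itertools import combinations


def split_inliers_outliers(faces, tol=0.15):
    """Split faces into those that fit a line h = slope * y + intercept and those that do not.

    faces: list of (y_center, face_height) tuples in pixels.
    tol:   allowed relative error between observed and predicted height
           (hypothetical threshold).
    Returns (inlier_indices, outlier_indices).
    """
    best = []
    # Try the line through every pair of faces and keep the one that
    # explains the most faces (a small exhaustive consensus search).
    for i, j in combinations(range(len(faces)), 2):
        (y1, h1), (y2, h2) = faces[i], faces[j]
        if y1 == y2:
            continue  # vertical position identical: no line can be defined
        slope = (h2 - h1) / (y2 - y1)
        intercept = h1 - slope * y1
        inliers = [
            k for k, (y, h) in enumerate(faces)
            if abs(h - (slope * y + intercept)) <= tol * h
        ]
        if len(inliers) > len(best):
            best = inliers
    outliers = [k for k in range(len(faces)) if k not in best]
    return best, outliers


# Four faces follow h = 0.2 * y; the fifth is much smaller than the
# model predicts at its image position.
faces = [(100, 20), (200, 40), (300, 60), (400, 80), (250, 25)]
inliers, outliers = split_inliers_outliers(faces)
```

In this toy input the fifth face is half the height the fitted line predicts at its position. Under the assumptions above, that kind of deviation is the sort of signal the dependent claims associate with proxemics-based constraints, such as a child next to an adult.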

Assignees

Stanford Res Inst Int

Inventors

Classifications

G06K9/00677
Physics · mapped topic
G06K9/00248 · Primary
Physics · mapped topic
G06K9/00221
Physics · mapped topic
G06V40/165 · Primary
using facial parts and geometric relationships · CPC title
G06V20/30
in albums, collections or shared content, e.g. social network photos or video · CPC title

Patent family

Related publications grouped by family.

View patent family 51527219

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9268994B2 cover?: A unified framework detects and classifies people interactions in unconstrained user generated images. Previous approaches directly map people/face locations in two-dimensional image space into features for classification. Among other things, the disclosed framework estimates a camera viewpoint and people positions in 3D space and then extracts spatial configuration features from explicit three…
Who is the assignee on this patent?: Stanford Res Inst Int