Object detection and representation in images

US10452954B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10452954-B2
Application number  US-201715704746-A
Country             US
Kind code           B2
Filing date         Sep 14, 2017
Priority date       Sep 14, 2017
Publication date    Oct 22, 2019
Grant date          Oct 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection and representation in images. In one aspect, a method includes detecting occurrences of objects of a particular type in images captured within a first duration of time, and iteratively training an image embedding function to produce as output representations of features of the input images depicting occurrences of objects of the particular type, where similar representations of features are generated for images that depict the same instance of an object of a particular type captured within a specified duration of time, and dissimilar representations of features are generated for images that depict different instances of objects of the particular type.
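The pairing idea in the abstract can be illustrated with a minimal Python sketch. It assumes each cropped detection already carries an identity label and a capture timestamp. The `SubImage` fields, the `select_pairs` helper, and the 60-second window are hypothetical names and values chosen for illustration; they are not taken from the patent.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SubImage:
    """A cropped detection. All fields are hypothetical, for illustration only."""
    image_id: str       # source image the crop was extracted from
    instance_id: str    # assumed identity label for the depicted object
    captured_at: float  # capture time of the source image, in seconds


def select_pairs(sub_images, max_gap_seconds):
    """Split candidate pairs into 'similar' and 'dissimilar' training pairs.

    Similar: same instance, different source images, captured less than
    max_gap_seconds apart. Dissimilar: different instances.
    """
    similar, dissimilar = [], []
    for a, b in combinations(sub_images, 2):
        if a.instance_id == b.instance_id:
            gap = abs(a.captured_at - b.captured_at)
            if a.image_id != b.image_id and gap < max_gap_seconds:
                similar.append((a, b))
        else:
            dissimilar.append((a, b))
    return similar, dissimilar


crops = [
    SubImage("img1", "person_a", 0.0),
    SubImage("img2", "person_a", 30.0),
    SubImage("img3", "person_a", 5000.0),  # same person, but far later in time
    SubImage("img2", "person_b", 30.0),
]
similar, dissimilar = select_pairs(crops, max_gap_seconds=60.0)
print(len(similar), len(dissimilar))
```

In this sketch, the crop captured much later is never used as a "similar" example, even though it carries the same label. This mirrors the abstract's restriction to images captured within a specified duration of time.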

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method performed by data-processing apparatus, the method comprising: detecting occurrences of objects of a particular type in each image in a training set of images, wherein the images in the training set of images are images that have been captured within a first duration of time; extracting sub-images from the images in the training set of images, wherein each sub-image depicts one occurrence of a detected object; iteratively training an image embedding function, wherein the image embedding function comprises a set of parameter weights that operate on an input image to produce as output a representation of features of the input image, each iteration of the training comprising: selecting a plurality of image pairs of a first type and a plurality of image pairs of a second type from the extracted sub-images, each image pair being a combination of a first sub-image and a second sub-image, wherein: each image pair of the first type comprises a first sub-image and a second sub-image that depict a same instance of an object of the particular type; each image pair of the second type comprises a first sub-image and a second sub-image that depict different instances of objects of the particular type; and for each image pair of the first type: the first sub-image of the image pair of the first type was extracted from a corresponding first image and the second sub-image of the image pair of the first type was extracted from a corresponding second image, wherein a duration of time that elapsed between when the first image and the second image corresponding to the image pair of the first type were captured is shorter than a second duration of time, wherein the second duration of time is shorter than the first duration of time; and the duration of time that elapsed between when the first image and the second image corresponding to the image pair of the first type were captured is different than the respective duration of time that elapsed between when the first image and the second image corresponding to each other selected image pair of the first type were captured; providing each selected image pair as input to the image embedding function and generating corresponding outputs; determining a performance measure of the image embedding function; adjusting the parameter weights of the image embedding function based on the performance measure; and performing another iteration of the training until a cessation event occurs.

2. The computer-implemented method of claim 1, wherein selecting image pairs of the first type and of the second type comprises selecting image triplets, each image triplet being a combination of a first sub-image, a second sub-image, and a third sub-image, wherein: the image pair comprising the first sub-image and second sub-image is an image pair of the first type; and the image pair comprising the first sub-image and the third sub-image is an image pair of the second type.

3. The computer-implemented method of claim 2, wherein providing an image triplet as input to the image embedding function and generating corresponding outputs comprises generating, by the image embedding function, a first representation of the features of the first image in the image triplet, a second representation of the features of the second image in the image triplet, and a third representation of the features of the third image in the image triplet.

4. The computer-implemented method of claim 3, wherein determining the performance measure of the image embedding includes, for each selected image triplet: determining, based on the first representation of features and the second representation of features, a first similarity measure that measures a similarity of the first representation of features to the second representation of features; and determining, based on the first representation of features and the third representation of features, a second similarity measure that measures a similarity of the first representation of features to the third representation of features.

5. The computer-implemented method of claim 4, wherein: the image embedding function generates mappings of input images in Euclidean space as output feature representations; and for each selected image triplet: determining the first similarity measure comprises determining a first Euclidean distance between the first representation of the features and the second representation of features; and determining the second similarity measure comprises determining a second Euclidean distance between the first representation of the features and the third representation of features.

6. The computer implemented method of claim 5, wherein determining a performance measure of the image embedding function is based on the first Euclidean distance and the second Euclidean distance for each selected image triplet.

7. The computer-implemented method of claim 6, wherein determining the performance measure based on the first Euclidean distance and the second Euclidean distance for each selected image triplet comprises determining a hinge loss based on a difference of the first Euclidean distance and the second Euclidean distance for each selected image triplet.

8. The computer-implemented method of claim 7, wherein determining the performance measure comprises summing the hinge losses for the selected image triplets.

9. The computer-implemented method of claim 1, wherein the image embedding function comprises a convolutional neural network.

10. The computer-implemented method of claim 1, wherein the objects are full human bodies.

11. The computer-implemented method of claim 1, wherein the feature representations generated by the trained neural network are assigned to groups using a clustering algorithm.

12. The computer-implemented method of claim 1, wherein: each extracted sub-image is annotated with key points; the output of the image embedding function comprises the feature representation of the input image and predicted key point annotations of the input image; and determining the performance measure of the image embedding function includes determining a similarity between the predicted key point annotations and the key point annotations.

13. The computer-implemented method of claim 1, wherein extracting sub-images from the images in the training set of images further comprises: annotating each sub-image with key points; selecting a particular sub-image as a reference image; and transforming each sub-image to align its key points with the key points of the reference sub-image.

14. The computer-implemented method of claim 1, wherein: each extracted sub-image is annotated with key points; the input of the image embedding function comprises an input image and annotated key points of the input image; and providing each selected image pair as input to the image embedding function further comprises providing the annotated key points of each sub-image in each selected image pair as inputs to the image embedding function.

15. The computer-implemented method of claim 1, wherein for an image pair of the first type, the first sub-image and the second sub-image are selected based the images from which they are extracted being captured during a particular event.

16. The computer-implemented method of claim 1, wherein for an image pair of the first type, the first sub-image and the second sub-image are selected based on the images from which they are extracted being captured within the second duration of time.

17. A system comprising: one or more comput
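Claims 2 through 8 describe a performance measure built from image triplets: two Euclidean distances per triplet, a hinge loss on their difference, and a sum over the selected triplets. The sketch below is one possible reading of that measure in plain Python. The function names and the `margin` parameter are assumptions made for illustration; the claim text shown here does not specify a margin value or whether distances are squared.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_hinge_loss(first, second, third, margin=0.2):
    """Hinge loss on the difference of the two distances in one triplet.

    first/second are embeddings of the same instance, first/third of
    different instances. The margin is an assumed hyperparameter.
    """
    d_same = euclidean(first, second)  # first Euclidean distance
    d_diff = euclidean(first, third)   # second Euclidean distance
    return max(0.0, d_same - d_diff + margin)


def performance_measure(triplets, margin=0.2):
    """Sum of the hinge losses over all selected triplets."""
    return sum(triplet_hinge_loss(a, p, n, margin) for a, p, n in triplets)


# Toy 2-D embeddings (made-up values, not real model outputs).
triplets = [
    ((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)),  # same-instance pair is closer
    ((0.0, 0.0), (3.0, 4.0), (0.0, 1.0)),  # same-instance pair is farther
]
total = performance_measure(triplets)
print(total)
```

Under this reading, a triplet contributes nothing once the same-instance pair is closer than the different-instance pair by at least the margin. Adjusting the parameter weights to reduce the sum therefore pulls same-instance embeddings together and pushes different-instance embeddings apart.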

Assignees

Google LLC

Inventors

Classifications

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

  • using neural networks

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

  • Proximity, similarity or dissimilarity measures

  • nonlinear criteria, e.g. embedding a manifold in a Euclidean space

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10452954B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection and representation in images. In one aspect, a method includes detecting occurrences of objects of a particular type in images captured within a first duration of time, and iteratively training an image embedding function to produce as output representations of features of the…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/5854. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 22, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).