Generating numeric embeddings of images
US-9836641-B2 · Dec 5, 2017 · US
US10452954B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10452954-B2 |
| Application number | US-201715704746-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 14, 2017 |
| Priority date | Sep 14, 2017 |
| Publication date | Oct 22, 2019 |
| Grant date | Oct 22, 2019 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection and representation in images. In one aspect, a method includes detecting occurrences of objects of a particular type in images captured within a first duration of time, and iteratively training an image embedding function to produce as output representations of features of the input images depicting occurrences of objects of the particular type, where similar representations of features are generated for images that depict the same instance of an object of a particular type captured within a specified duration of time, and dissimilar representations of features are generated for images that depict different instances of objects of the particular type.
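The pairing rule in the abstract can be illustrated with a minimal Python sketch. It assumes each extracted sub-image already carries a hypothetical `instance_id` label and a `captured_at` timestamp. The names `SubImage`, `select_pairs`, and `max_gap_seconds` are illustrative and do not come from the patent, which does not prescribe how instance identity is established.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SubImage:
    instance_id: str    # hypothetical label for the depicted object instance
    captured_at: float  # capture time of the source image, in seconds


def select_pairs(sub_images, max_gap_seconds):
    """Split all sub-image pairs into same-instance and different-instance pairs.

    Same-instance pairs are kept only when their source images were captured
    less than max_gap_seconds apart (an assumed stand-in for the specified
    duration of time). Different-instance pairs are kept regardless of time.
    """
    positives, negatives = [], []
    for a, b in combinations(sub_images, 2):
        if a.instance_id != b.instance_id:
            negatives.append((a, b))
        elif abs(a.captured_at - b.captured_at) < max_gap_seconds:
            positives.append((a, b))
    return positives, negatives


samples = [
    SubImage("a", 0.0),
    SubImage("a", 30.0),
    SubImage("a", 5000.0),
    SubImage("b", 10.0),
]
positives, negatives = select_pairs(samples, max_gap_seconds=60.0)
print(len(positives), len(negatives))
```

Same-instance pairs that fall outside the time window are dropped rather than treated as negatives, since they still depict the same object.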
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method performed by data-processing apparatus, the method comprising: detecting occurrences of objects of a particular type in each image in a training set of images, wherein the images in the training set of images are images that have been captured within a first duration of time; extracting sub-images from the images in the training set of images, wherein each sub-image depicts one occurrence of a detected object; iteratively training an image embedding function, wherein the image embedding function comprises a set of parameter weights that operate on an input image to produce as output a representation of features of the input image, each iteration of the training comprising: selecting a plurality of image pairs of a first type and a plurality of image pairs of a second type from the extracted sub-images, each image pair being a combination of a first sub-image and a second sub-image, wherein: each image pair of the first type comprises a first sub-image and a second sub-image that depict a same instance of an object of the particular type; each image pair of the second type comprises a first sub-image and a second sub-image that depict different instances of objects of the particular type; and for each image pair of the first type: the first sub-image of the image pair of the first type was extracted from a corresponding first image and the second sub-image of the image pair of the first type was extracted from a corresponding second image, wherein a duration of time that elapsed between when the first image and the second image corresponding to the image pair of the first type were captured is shorter than a second duration of time, wherein the second duration of time is shorter than the first duration of time; and the duration of time that elapsed between when the first image and the second image corresponding to the image pair of the first type were captured is different than the respective duration of time that elapsed between when the first image and the second image corresponding to each other selected image pair of the first type were captured; providing each selected image pair as input to the image embedding function and generating corresponding outputs; determining a performance measure of the image embedding function; adjusting the parameter weights of the image embedding function based on the performance measure; and performing another iteration of the training until a cessation event occurs.

2. The computer-implemented method of claim 1, wherein selecting image pairs of the first type and of the second type comprises selecting image triplets, each image triplet being a combination of a first sub-image, a second sub-image, and a third sub-image, wherein: the image pair comprising the first sub-image and second sub-image is an image pair of the first type; and the image pair comprising the first sub-image and the third sub-image is an image pair of the second type.

3. The computer-implemented method of claim 2, wherein providing an image triplet as input to the image embedding function and generating corresponding outputs comprises generating, by the image embedding function, a first representation of the features of the first image in the image triplet, a second representation of the features of the second image in the image triplet, and a third representation of the features of the third image in the image triplet.

4. The computer-implemented method of claim 3, wherein determining the performance measure of the image embedding includes, for each selected image triplet: determining, based on the first representation of features and the second representation of features, a first similarity measure that measures a similarity of the first representation of features to the second representation of features; and determining, based on the first representation of features and the third representation of features, a second similarity measure that measures a similarity of the first representation of features to the third representation of features.

5. The computer-implemented method of claim 4, wherein: the image embedding function generates mappings of input images in Euclidean space as output feature representations; and for each selected image triplet: determining the first similarity measure comprises determining a first Euclidean distance between the first representation of the features and the second representation of features; and determining the second similarity measure comprises determining a second Euclidean distance between the first representation of the features and the third representation of features.

6. The computer-implemented method of claim 5, wherein determining a performance measure of the image embedding function is based on the first Euclidean distance and the second Euclidean distance for each selected image triplet.

7. The computer-implemented method of claim 6, wherein determining the performance measure based on the first Euclidean distance and the second Euclidean distance for each selected image triplet comprises determining a hinge loss based on a difference of the first Euclidean distance and the second Euclidean distance for each selected image triplet.

8. The computer-implemented method of claim 7, wherein determining the performance measure comprises summing the hinge losses for the selected image triplets.

9. The computer-implemented method of claim 1, wherein the image embedding function comprises a convolutional neural network.

10. The computer-implemented method of claim 1, wherein the objects are full human bodies.

11. The computer-implemented method of claim 1, wherein the feature representations generated by the trained neural network are assigned to groups using a clustering algorithm.

12. The computer-implemented method of claim 1, wherein: each extracted sub-image is annotated with key points; the output of the image embedding function comprises the feature representation of the input image and predicted key point annotations of the input image; and determining the performance measure of the image embedding function includes determining a similarity between the predicted key point annotations and the key point annotations.

13. The computer-implemented method of claim 1, wherein extracting sub-images from the images in the training set of images further comprises: annotating each sub-image with key points; selecting a particular sub-image as a reference image; and transforming each sub-image to align its key points with the key points of the reference sub-image.

14. The computer-implemented method of claim 1, wherein: each extracted sub-image is annotated with key points; the input of the image embedding function comprises an input image and annotated key points of the input image; and providing each selected image pair as input to the image embedding function further comprises providing the annotated key points of each sub-image in each selected image pair as inputs to the image embedding function.

15. The computer-implemented method of claim 1, wherein for an image pair of the first type, the first sub-image and the second sub-image are selected based on the images from which they are extracted being captured during a particular event.

16. The computer-implemented method of claim 1, wherein for an image pair of the first type, the first sub-image and the second sub-image are selected based on the images from which they are extracted being captured within the second duration of time.

17. A system comprising: one or more comput
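Claims 5 through 8 describe a performance measure built from two Euclidean distances per triplet, combined through a hinge loss and summed. The following Python sketch shows one common way to write such a loss. The `margin` parameter and its default value, along with the function names `triplet_hinge_loss` and `total_hinge_loss`, are assumptions for illustration; the claims only require a hinge loss based on a difference of the two distances.

```python
import math


def triplet_hinge_loss(first, second, third, margin=0.2):
    """Hinge loss for one triplet of embedding vectors.

    first/second come from a same-instance pair, first/third from a
    different-instance pair. margin is an assumed hyperparameter.
    """
    d_same = math.dist(first, second)  # first Euclidean distance
    d_diff = math.dist(first, third)   # second Euclidean distance
    # Zero loss once the different-instance embedding is at least
    # `margin` farther away than the same-instance embedding.
    return max(0.0, d_same - d_diff + margin)


def total_hinge_loss(triplets, margin=0.2):
    """Sum the per-triplet hinge losses over the selected triplets."""
    return sum(triplet_hinge_loss(a, p, n, margin) for a, p, n in triplets)


triplets = [
    ((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)),  # well separated: contributes 0
    ((0.0, 0.0), (3.0, 4.0), (0.0, 1.0)),  # violated: contributes a positive loss
]
loss = total_hinge_loss(triplets)
print(loss)
```

In a training loop, a sum like this would be the quantity used to adjust the parameter weights of the embedding function on each iteration.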
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
using neural networks · CPC title
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
Proximity, similarity or dissimilarity measures · CPC title
nonlinear criteria, e.g. embedding a manifold in a Euclidean space · CPC title