Generating numeric embeddings of images

US9836641B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9836641-B2
Application number: US-201514972670-A
Country: US
Kind code: B2
Filing date: Dec 17, 2015
Priority date: Dec 17, 2014
Publication date: Dec 5, 2017
Grant date: Dec 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating numeric embeddings of images. One of the methods includes obtaining training images; generating a plurality of triplets of training images; and training a neural network on each of the triplets to determine trained values of a plurality of parameters of the neural network, wherein training the neural network comprises, for each of the triplets: processing the anchor image in the triplet using the neural network to generate a numeric embedding of the anchor image; processing the positive image in the triplet using the neural network to generate a numeric embedding of the positive image; processing the negative image in the triplet using the neural network to generate a numeric embedding of the negative image; computing a triplet loss; and adjusting the current values of the parameters of the neural network using the triplet loss.
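
The "computing a triplet loss" step in the abstract can be illustrated with a small sketch. It follows the formula given in claim 8 on this page (squared Euclidean distances plus a margin α). The function names, the toy two-dimensional embeddings, and the margin value `alpha=0.2` are assumptions made for illustration only; the patent text says only that α is a predetermined value.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss for one triplet: max(0, d(a, p) - d(a, n) + alpha).

    alpha is the margin; 0.2 is an illustrative default, not a value
    taken from the patent.
    """
    d_pos = squared_l2(anchor, positive)
    d_neg = squared_l2(anchor, negative)
    return max(0.0, d_pos - d_neg + alpha)


# Toy embeddings (made-up values, not real network outputs).
anchor = [0.0, 1.0]
positive = [0.1, 0.9]   # close to the anchor
negative = [1.0, 0.0]   # far from the anchor

# The negative is already farther than the positive by more than alpha,
# so this triplet contributes no loss.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the margin and yields a positive loss.
hard = triplet_loss(anchor, negative, positive)
```

In a training loop, a loss of zero means the triplet produces no parameter update, which is why the choice of triplets matters.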

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a plurality of training images, wherein the training images have been classified as images of objects of a particular object type; generating a plurality of triplets of training images, wherein each of the triplets comprises a respective anchor image, a respective positive image, and a respective negative image, and wherein, for each triplet, the anchor image and the positive image have both been classified as images of the same object of the particular object type and the negative image has been classified as an image of a different object of the particular object type; and training a neural network on each of the triplets to determine trained values of a plurality of parameters of the neural network, wherein the neural network is configured to receive an input image of an object of the particular object type and to process the input image to generate a numeric embedding of the input image, wherein training the neural network comprises, for each of the triplets: processing the anchor image in the triplet using the neural network in accordance with current values of the parameters of the neural network to generate a numeric embedding of the anchor image; processing the positive image in the triplet using the neural network in accordance with the current values of the parameters of the neural network to generate a numeric embedding of the positive image; processing the negative image in the triplet using the neural network in accordance with the current values of the parameters of the neural network to generate a numeric embedding of the negative image; computing a triplet loss from the numeric embedding of the anchor image, the positive image, and the negative image; and adjusting the current values of the parameters of the neural network to minimize the triplet loss such that distances between the numeric embedding of the input image of the object and respective numeric embeddings of other images of the same object is less than distances between the numeric embedding of the input image of the object and numeric embeddings of other images of other objects, wherein generating the plurality of triplets of training images, includes, for each of the triplets, generating the triplet such that i) the numeric embedding of the positive image in the triplet is farther from the numeric embedding of the anchor image in the triplet than any other numeric embedding of any other image of the same object as in the positive image and the anchor image, and ii) the numeric embedding of the negative image in the triplet is closer to the numeric embedding of the anchor image than any other number embedding of any other image of a different object as in the positive image and the anchor image.

2. The method of claim 1, wherein the particular object type is faces of people.

3. The method of claim 1, wherein the neural network is a deep convolutional neural network.

4. The method of claim 1, wherein the neural network is configured to generate a vector of floating point values for the input image.

5. The method of claim 4, further comprising: using the vector of floating point values as the numeric embedding of the input image.

6. The method of claim 4, further comprising: normalizing the vector of floating point values to generate a normalized vector; and using the normalized vector as the numeric embedding of the input image.

7. The method of claim 4, further comprising: normalizing the vector of floating point values to generate a normalized vector; quantizing the normalized vector to generate a quantized vector; and using the quantized vector as the numeric embedding of the input image.

8. The method of claim 1, wherein the triplet loss satisfies, for each of the triplets: $L = \max\left(0, \lVert f(x_a) - f(x_p) \rVert_2^2 - \lVert f(x_a) - f(x_n) \rVert_2^2 + \alpha\right)$, wherein $f(x_a)$ is the numeric embedding of the anchor image in the triplet, $f(x_p)$ is the numeric embedding of the positive image in the triplet, $f(x_n)$ is the numeric embedding of the negative image in the triplet, and $\alpha$ is a predetermined value.

9. The method of claim 1, further comprising: receiving a first image and a second image; processing the first image using the neural network in accordance with the trained values of the parameters of the neural network to determine a numeric embedding of the first image; processing the second image using the neural network in accordance with the trained values of the parameters of the neural network to determine a numeric embedding of the second image; and determining whether the first image and the second image are images of the same object from a distance between the numeric embedding of the first image and the numeric embedding of the second image.

10. The method of claim 1, further comprising: processing each of a plurality of images using the neural network in accordance with the trained values of the parameters of the neural network to determine a respective numeric embedding of each of the plurality of images; receiving a new image; processing the new image using the neural network in accordance with the trained values of the parameters of the neural network to determine a numeric embedding of the new image; and classifying the new images as being an image of the same object as one or more of the plurality of images from distances between the numeric embedding of the new image and numeric embeddings of images from the plurality of images.

11. The method of claim 1, further comprising: processing each of a plurality of images using the neural network in accordance with the trained values of the parameters of the neural network to determine a respective numeric embedding of each of the plurality of images; clustering the numeric embedding of the plurality of images into a plurality of clusters; and for each cluster, classifying the images having numeric embeddings that are in the cluster as being images of the same object.

12. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining a plurality of training images, wherein the training images have been classified as images of objects of a particular object type; generating a plurality of triplets of training images, wherein each of the triplets comprises a respective anchor image, a respective positive image, and a respective negative image, and wherein, for each triplet, the anchor image and the positive image have both been classified as images of the same object of the particular object type and the negative image has been classified as an image of a different object of the particular object type; and training a neural network on each of the triplets to determine trained values of a plurality of parameters of the neural network, wherein the neural network is configured to receive an input image of an object of the particular object type and to process the input image to generate a numeric embedding of the input image, wherein training the neural network comprises, for each of the triplets: processing the anchor image in the triplet using the neural network in accordance with current values of the parameters of the neural network to generate a numeric embedding of the anchor image; processing the positive image in the triplet using the neural network in accordance with the current values of the parameters of the neural network to generate a numeric embedding of the positive image; processing the negative image in the triplet using the neural network in accordance wi
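
The triplet-selection conditions in claim 1 (farthest positive, closest negative), the normalization step of claim 6, and the distance-based comparison of claim 9 can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the helper names, the toy labels and vectors, the restriction of the search to a small in-memory set of embeddings, and the decision threshold of `1.0` are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (claim 6 style normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def select_triplet(anchor_idx, embeddings, labels):
    """Pick (anchor, positive, negative) indices for one anchor.

    Positive: the same-label embedding farthest from the anchor.
    Negative: the different-label embedding closest to the anchor.
    Returns None if the set has no usable positive or negative.
    """
    a = embeddings[anchor_idx]
    same = labels[anchor_idx]
    positives = [i for i, lab in enumerate(labels)
                 if lab == same and i != anchor_idx]
    negatives = [i for i, lab in enumerate(labels) if lab != same]
    if not positives or not negatives:
        return None
    pos = max(positives, key=lambda i: squared_l2(a, embeddings[i]))
    neg = min(negatives, key=lambda i: squared_l2(a, embeddings[i]))
    return anchor_idx, pos, neg


def same_object(emb1, emb2, threshold=1.0):
    """Claim 9 style check; the threshold value is an assumption."""
    return squared_l2(emb1, emb2) < threshold


# Toy data: labels and raw vectors are invented for this example.
labels = ["alice", "alice", "alice", "bob", "bob"]
raw = [[1.0, 0.0], [0.9, 0.1], [0.6, 0.4], [0.0, 1.0], [0.5, 0.5]]
embeddings = [l2_normalize(v) for v in raw]

triplet = select_triplet(0, embeddings, labels)
```

With these toy values, the anchor at index 0 is paired with the positive at index 2 and the negative at index 4. Because the embeddings change as the network trains, a selection of this kind would have to be recomputed with the current parameter values.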

Assignees

Google Inc

Inventors

Classifications

  • Artificial neural networks [ANN] · CPC title

  • Training; Learning · CPC title

  • Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Classification, e.g. identification · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9836641B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating numeric embeddings of images. One of the methods includes obtaining training images; generating a plurality of triplets of training images; and training a neural network on each of the triplets to determine trained values of a plurality of parameters of the neural network, wherein trai…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 5, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).