Generating numeric embeddings of images
US-9836641-B2 · Dec 5, 2017 · US
US10657359B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10657359-B2 |
| Application number | US-201715818124-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 20, 2017 |
| Priority date | Nov 20, 2017 |
| Publication date | May 19, 2020 |
| Grant date | May 19, 2020 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an object embedding system. In one aspect, a method comprises providing selected images as input to the object embedding system and generating corresponding embeddings, wherein the object embedding system comprises a thumbnailing neural network and an embedding neural network. The method further comprises backpropagating gradients based on a loss function to reduce the distance between embeddings for same instances of objects, and to increase the distance between embeddings for different instances of objects.
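The training objective in the abstract (pull embeddings of the same object instance together, push embeddings of different instances apart) can be illustrated with a pairwise contrastive loss. This is a minimal sketch, not the patent's method: the abstract does not name a specific loss, and the function names, the `margin` parameter, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_instance, margin=1.0):
    """Pairwise contrastive loss (an assumed choice; the abstract names no loss).

    Same-instance pairs are penalized by their squared distance, so
    minimizing the loss pulls them together. Different-instance pairs are
    penalized only while they are closer than `margin`, so minimizing the
    loss pushes them apart until the margin is reached.
    """
    d = euclidean(emb_a, emb_b)
    if same_instance:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

In a real system the gradients of such a loss would be backpropagated through both the embedding network and the thumbnailing network, which is what makes the training end-to-end; the sketch only computes the scalar loss for one pair.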
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for end-to-end training of an object embedding system, the method comprising: iteratively training the object embedding system on a plurality of images, each of the images depicting an object of a particular type, each iteration of the training comprising: providing selected images as input to the object embedding system and generating corresponding embeddings, wherein the object embedding system comprises a thumbnailing neural network and an embedding neural network, wherein each neural network comprises a plurality of consecutive layers that are exclusive of each other, and wherein generating an embedding for an object depicted in an image using the object embedding system comprises: generating a thumbnail representation of the object depicted in the image as output of the thumbnailing neural network, wherein the thumbnailing neural network processes an input in accordance with values of a set of thumbnailing neural network parameters to: determine values of parameters of a spatial transformation that defines a correspondence between pixels of the thumbnail representation and pixels of the image; and generate as output the thumbnail representation using the spatial transformation and the image; generating an embedding by providing the thumbnail representation as input to the embedding neural network that is configured to process the thumbnail representation in accordance with values of a set of embedding neural network parameters to generate an embedding as output; determining gradients based on a loss function to reduce a distance between embeddings for same instances of objects, and to increase the distance between embeddings for different instances of objects; and adjusting the values of the set of thumbnailing neural network parameters and the values of the set of embedding neural network parameters using the gradients.
2. The computer-implemented method of claim 1, wherein the object embedding system additionally comprises a detection neural network comprising a plurality of consecutive layers, and generating an embedding for an object depicted in an image using the object embedding system additionally comprises: generating an encoded representation of the image by providing the image as input to the detection neural network, wherein the detection neural network is configured to process the image in accordance with values of a set of detection neural network parameters to generate an encoded representation of the image; and providing the encoded representation of the image as input to the thumbnailing neural network.
3. The computer-implemented method of claim 2, wherein the detection neural network is pre-trained to generate encoded representations of images comprising data identifying predicted locations of objects of the particular type in the image.
4. The computer-implemented method of claim 1, wherein the embedding neural network is pre-trained based on thumbnail representations of objects of the particular type that are not generated by the thumbnailing neural network.
5. The computer-implemented method of claim 1, wherein determining gradients based on the loss function additionally comprises, for each selected image: determining positions of key points of the thumbnail representation generated by the thumbnailing neural network; determining positions of the key points of the thumbnail representation in a frame of reference of the image; and reducing an error measure between positions of key points of the object of the particular type depicted in the image and the positions of the key points of the thumbnail representation in the frame of reference of the image.
6. The computer-implemented method of claim 5, wherein the key points of the object of the particular type depicted in the image comprise vertices of a bounding box around the object of the particular type depicted in the image, and wherein the key points of the thumbnail representation comprise bounding vertices of the thumbnail representation.
7. The computer-implemented method of claim 5, wherein: the error measure is a sum of errors between the positions of the key points of the object of the particular type depicted in the image and the positions of the key points of the thumbnail representation in the frame of reference of the image; and the error between a position of a key point of the object of the particular type depicted in the image and a corresponding position of a key point of the thumbnail representation in the frame of reference of the image is zero if a distance between them is less than a tolerance radius.
8. The computer-implemented method of claim 7, wherein the tolerance radius is increased over the training iterations until it reaches a maximum threshold.
9. The computer-implemented method of claim 1, wherein the spatial transformation of the thumbnailing neural network includes an image warping spatial transformation that defines a correspondence between the pixels of the thumbnail representation and the pixels of the image according to a displacement vector at each pixel of the thumbnail representation.
10. The computer-implemented method of claim 9, wherein the spatial transformation of the thumbnailing neural network is a composition of an affine spatial transformation and the image warping spatial transformation.
11. The computer-implemented method of claim 1, wherein the objects of the particular type are faces.
12. A computer-implemented method for identifying objects in images, the method comprising: providing an image as input to an object embedding system trained using the computer-implemented method of claim 1; and receiving as output an embedding vector which is indicative of an object in the image.
13. The computer-implemented method of claim 12, wherein the object embedding system is trained to generate embeddings of faces and wherein the object in the image is a face, the method further comprising: comparing the embedding vector to one or more reference embedding vectors, each associated with a different face, thereby to identify the face in the input image.
14. One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for end-to-end training of an object embedding system, the operations comprising: iteratively training the object embedding system on a plurality of images, each of the images depicting an object of a particular type, each iteration of the training comprising: providing selected images as input to the object embedding system and generating corresponding embeddings, wherein the object embedding system comprises a thumbnailing neural network and an embedding neural network, wherein each neural network comprises a plurality of consecutive layers that are exclusive of each other, and wherein generating an embedding for an object depicted in an image using the object embedding system comprises: generating a thumbnail representation of the object depicted in the image as output of the thumbnailing neural network, wherein the thumbnailing neural network processes an input in accordance with values of a set of thumbnailing neural network parameters to: determine values of parameters of a spatial transformation that defines a correspondence between pixels of the thumbnail representation and pixels of the image; and generate as output the thumbnail representation using the spatial transformation and the image; generating an embedding by providing the thumbnail representation as input to the embedding neural network that is con
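The key-point error of claims 7 and 8 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims say only that a pair of key points contributes zero error when closer than the tolerance radius and that the radius grows until it reaches a maximum. The per-point error outside the radius (plain Euclidean distance here), the linear growth schedule, and the names `keypoint_error` and `tolerance_schedule` are hypothetical. The thumbnail key points are assumed to be already mapped into the image's frame of reference.

```python
import math


def keypoint_error(object_pts, thumb_pts_in_image, tolerance_radius):
    """Sum of per-key-point errors between two lists of (x, y) points.

    A pair whose distance is less than `tolerance_radius` contributes
    zero, as in claim 7. Outside the radius the contribution is the plain
    distance, which is an assumption; the claims do not specify it.
    """
    total = 0.0
    for (ox, oy), (tx, ty) in zip(object_pts, thumb_pts_in_image):
        dist = math.hypot(ox - tx, oy - ty)
        if dist >= tolerance_radius:
            total += dist
    return total


def tolerance_schedule(iteration, start=0.0, step=0.5, maximum=4.0):
    """Radius that grows with the iteration count and is capped at `maximum`.

    Claim 8 requires only that the radius increases until it reaches a
    maximum threshold; linear growth is an assumed choice.
    """
    return min(maximum, start + step * iteration)
```

For example, with bounding-box vertices as key points (claim 6), `keypoint_error(box, predicted, tolerance_schedule(i))` would penalize the thumbnailing network strongly for misaligned vertices early in training and tolerate small deviations later, as the radius grows.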
Backpropagation, e.g. using gradient descent · CPC title
Physics · mapped topic