Category learning neural networks

US10635979B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10635979-B2
Application number  US-201916511637-A
Country             US
Kind code           B2
Filing date         Jul 15, 2019
Priority date       Jul 20, 2018
Publication date    Apr 28, 2020
Grant date          Apr 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a clustering of images into a plurality of semantic categories. In one aspect, a method comprises: training a categorization neural network, comprising, at each of a plurality of iterations: processing an image depicting an object using the categorization neural network to generate (i) a current prediction for whether the image depicts an object or a background region, and (ii) a current embedding of the image; determining a plurality of current cluster centers based on the current values of the categorization neural network parameters, wherein each cluster center represents a respective semantic category; and determining a gradient of an objective function that includes a classification loss and a clustering loss, wherein the clustering loss depends on a similarity between the current embedding of the image and the current cluster centers.
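The objective function described in the abstract can be illustrated with a minimal sketch. The abstract does not fix the exact form of either loss, so the choices here are assumptions made for illustration only: binary cross-entropy stands in for the classification loss, squared Euclidean distance to the nearest cluster center stands in for the clustering loss, and the `weight` parameter and all function names are hypothetical. The sketch only evaluates the objective; in practice the gradient would typically be obtained with an automatic differentiation framework.

```python
import math

def classification_loss(p_object, is_object):
    """Binary cross-entropy for the object-vs-background prediction (assumed form)."""
    eps = 1e-12  # keeps the argument of log() strictly positive
    p = min(max(p_object, eps), 1.0 - eps)
    return -math.log(p) if is_object else -math.log(1.0 - p)

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def clustering_loss(embedding, centers):
    """Distance from the embedding to its nearest cluster center (assumed similarity measure)."""
    return min(squared_distance(embedding, c) for c in centers)

def objective(p_object, is_object, embedding, centers, weight=1.0):
    """Classification loss plus a weighted clustering loss; `weight` is a hypothetical knob."""
    return classification_loss(p_object, is_object) + weight * clustering_loss(embedding, centers)

# Toy example: two cluster centers in a 2-D embedding space.
centers = [[0.0, 0.0], [1.0, 1.0]]
loss = objective(p_object=0.9, is_object=True, embedding=[0.9, 1.2],
                 centers=centers, weight=0.5)
print(round(loss, 4))
```

The clustering term is small when the embedding lies close to any one center, which is what pushes embeddings of similar images toward a shared center during training.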

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: training a categorization neural network to determine trained values of the categorization neural network parameters from initial values of the categorization neural network parameters, comprising, at each of a plurality of iterations: processing a given image depicting an object using the categorization neural network in accordance with current values of categorization neural network parameters to generate an output comprising: (i) a current prediction for whether the given image depicts an object or a background region, and (ii) a current embedding of the given image; determining a plurality of current cluster centers based on the current values of the categorization neural network parameters, wherein each cluster center represents a respective semantic category; determining a gradient of an objective function that includes a classification loss and a clustering loss, wherein the classification loss depends on the current prediction for whether the given image depicts an object or a background region, and wherein the clustering loss depends on a similarity between the current embedding of the given image and the current cluster centers; and determining an update to the current values of the categorization neural network parameters from the gradient; determining a plurality of final cluster centers based on the trained values of the categorization neural network parameters; for each of a plurality of target images, processing the target image using the categorization neural network in accordance with the trained values of the categorization neural network parameters to generate a final embedding of the target image; and determining a clustering of the target images into a plurality of semantic categories using the final embeddings of the target images and the final cluster centers.

2. The method of claim 1, wherein the given image is generated by a plurality of operations comprising: generating a depth-augmented training image by determining a depth associated with each pixel in a training image; clustering the pixels of the depth-augmented training image using: (i) the intensity data associated with the pixels of the training image, and (ii) the depths of the pixels of the training image; and generating the given image based on the clustering of the pixels of the depth-augmented training image.

3. The method of claim 2, wherein determining a depth associated with each pixel in the training image comprises: processing the training image by using a depth estimation neural network in accordance with trained values of depth estimation neural network parameters to generate an output comprising a depth associated with each pixel in the training image.

4. The method of claim 3, wherein the depth estimation neural network is trained using an unsupervised machine learning training technique.

5. The method of claim 2, wherein generating the given image using the clustering of the pixels of the depth-augmented training image comprises: cropping the given image from the training image based on the clustering of the pixels of the depth-augmented training image.

6. The method of claim 1, wherein determining the current cluster centers based on the current values of the categorization neural network parameters comprises: obtaining the current cluster centers from a memory unit of the categorization neural network.

7. The method of claim 1, wherein the clustering loss comprises: a minimum over each current cluster center of a difference between the current cluster center and the current embedding of the given image.

8. The method of claim 1, wherein the clustering loss further comprises: a measure of how evenly given images are distributed between the current cluster centers.

9. The method of claim 1, wherein determining the clustering of the target images into the plurality of semantic categories using the final embeddings of the target images and the final cluster centers comprises: for each target image, assigning the target image to a closest final cluster center to the final embedding of the target image; and for each final cluster center, determining the target images assigned to the final cluster center as belonging to a same semantic category.

10. The method of claim 1, wherein the categorization neural network comprises a plurality of convolutional neural network layers.

11. The method of claim 1, wherein the current embedding of the given image is an intermediate output of the categorization neural network.

12. The method of claim 1 further comprising, at each of a plurality of training iterations: processing a selected image depicting a background region using the categorization neural network in accordance with current values of categorization neural network parameters to generate an output comprising a current prediction for whether the selected image depicts an object or a background region; determining a gradient of an objective function that includes the classification loss; and determining an update to the current values of the categorization neural network parameters from the gradient.

13. A system, comprising: one or more computers; a memory in data communication with the one or more computers and storing instructions that cause the one or more computers to perform operations comprising: training a categorization neural network to determine trained values of the categorization neural network parameters from initial values of the categorization neural network parameters, comprising, at each of a plurality of iterations: processing a given image depicting an object using the categorization neural network in accordance with current values of categorization neural network parameters to generate an output comprising: (i) a current prediction for whether the given image depicts an object or a background region, and (ii) a current embedding of the given image; determining a plurality of current cluster centers based on the current values of the categorization neural network parameters, wherein each cluster center represents a respective semantic category; determining a gradient of an objective function that includes a classification loss and a clustering loss, wherein the classification loss depends on the current prediction for whether the given image depicts an object or a background region, and wherein the clustering loss depends on a similarity between the current embedding of the given image and the current cluster centers; and determining an update to the current values of the categorization neural network parameters from the gradient; determining a plurality of final cluster centers based on the trained values of the categorization neural network parameters; for each of a plurality of target images, processing the target image using the categorization neural network in accordance with the trained values of the categorization neural network parameters to generate a final embedding of the target image; and determining a clustering of the target images into a plurality of semantic categories using the final embeddings of the target images and the final cluster centers.

14. The system of claim 13, wherein the given image is generated by a plurality of operations comprising: generating a depth-augmented training image by determining a depth associated with each pixel in a training image; clustering the pixels of the depth-augmented training image using: (i) the intensity data associated with the pixels of the training image, and (ii) the depths of the pixels of the training image; and generating the given image based on the clustering
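
The final clustering step recited in claim 9 (assign each target image to the closest final cluster center, then treat all images sharing a center as one semantic category) can be sketched as follows. This is an illustrative sketch, not the patented implementation: squared Euclidean distance is assumed as the notion of "closest", and the function names, image identifiers, and toy embeddings are all hypothetical.

```python
from collections import defaultdict

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(embedding, centers):
    """Index of the cluster center closest to the embedding."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(embedding, centers[k]))

def cluster_images(final_embeddings, final_centers):
    """Group image ids by the index of their nearest final cluster center."""
    categories = defaultdict(list)
    for image_id, embedding in final_embeddings.items():
        categories[nearest_center(embedding, final_centers)].append(image_id)
    return dict(categories)

# Toy data: two final cluster centers and three target-image embeddings.
final_centers = [[0.0, 0.0], [5.0, 5.0]]
final_embeddings = {
    "img_a": [0.2, -0.1],
    "img_b": [4.8, 5.3],
    "img_c": [0.5, 0.4],
}
categories = cluster_images(final_embeddings, final_centers)
print(categories)  # {0: ['img_a', 'img_c'], 1: ['img_b']}
```

Each key in the result is a cluster-center index standing for one semantic category, and a center that attracts no images simply does not appear in the output.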

Assignees

Google LLC

Inventors

Classifications

Primary CPC classification: G06N3/088

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10635979B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for clustering images into semantic categories. The claimed method trains a categorization neural network with an objective function that combines a classification loss (object versus background region) and a clustering loss based on the similarity between an image's embedding and a set of cluster centers, then clusters target images using the trained network's embeddings and the final cluster centers.
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 28, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).