Automated and adaptive design and training of neural networks
US-2021287089-A1 · Sep 16, 2021 · US
US12430903B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12430903-B2 |
| Application number | US-202118007288-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 28, 2021 |
| Priority date | Jul 29, 2020 |
| Publication date | Sep 30, 2025 |
| Grant date | Sep 30, 2025 |
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an object recognition neural network using multiple data sources. One of the methods includes receiving training data that includes a plurality of training images from a first source and images from a second source. A set of training images is obtained from the training data. For each training image in the set of training images, contrast equalization is applied to the training image to generate a modified image. The modified image is processed using the neural network to generate an object recognition output for the modified image. A loss is determined based on errors between, for each training image in the set, the object recognition output for the modified image generated from the training image and the ground-truth annotation for the training image. Parameters of the neural network are updated based on the determined loss.
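The document does not say which contrast-equalization method is used. As one plausible reading, the sketch below applies global histogram equalization to an 8-bit grayscale image with NumPy. The function name `equalize_contrast` is hypothetical, and a real pipeline might instead use an adaptive variant (for example CLAHE) or equalize only the luminance channel of a color image.

```python
import numpy as np


def equalize_contrast(image: np.ndarray) -> np.ndarray:
    """Global histogram equalization of a 2-D uint8 image (assumed technique)."""
    if image.dtype != np.uint8 or image.ndim != 2:
        raise ValueError("expected a 2-D uint8 grayscale image")
    if image.size == 0:
        return image.copy()
    hist = np.bincount(image.ravel(), minlength=256)  # pixel count per intensity
    cdf = np.cumsum(hist)                             # cumulative distribution
    cdf_min = cdf[cdf > 0][0]                         # count at darkest occupied level
    total = image.size
    if total == cdf_min:
        return image.copy()  # constant image: there is no range to stretch
    # Map each intensity through the normalized CDF onto [0, 255];
    # levels below the darkest occupied one clip to 0.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]  # look-up-table indexing keeps the image shape
```

The darkest occupied intensity maps to 0 and the brightest to 255, so images from sources with different exposure statistics end up spread over a similar intensity range before they reach the network.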
Claims (preview)
What is claimed is:

1. A computer-implemented method for neural network training, comprising: receiving training data that comprises a plurality of training images and, for each image, a respective ground-truth annotation, the plurality of training images comprising images from a first source and images from a second source; obtaining a set of training images from the training data; for each training image in the set of training images: applying contrast equalization to the training image to generate a modified image; and processing the modified image using a neural network to generate an object recognition output for the modified image; determining, as a determined loss, a loss based on errors between, for each training image in the set of training images, the object recognition output for the modified image generated from the training image and the respective ground-truth annotation for the training image; and updating parameters of the neural network based on the determined loss.

2. The computer-implemented method of claim 1, wherein obtaining a set of training images from the training data, comprises: sampling an initial set of images from the training data; and generating the set of training images by discarding one or more images from the initial set of images.

3. The computer-implemented method of claim 2, wherein generating the set of training images comprises: determining that the one or more images in the initial set of images have motion blur; and in response, discarding the one or more images that have motion blur.

4. The computer-implemented method of claim 2, wherein generating the set of training images comprises: determining, from respective ground-truth annotations for the training images in the initial set of images, that one or more of the images in the initial set of images depict objects that do not belong to a relevant object category; and in response, discarding the one or more images that depict objects that do not belong to a relevant object category.

5. The computer-implemented method of claim 2, wherein generating the set of training images comprises: determining that one or more of the images in the initial set of images depict an object that is truncated or occluded; and in response, discarding the one or more images that depict an object that is truncated or occluded.

6. The computer-implemented method of claim 5, wherein determining that one or more of the images in the set of training images depict an object that is truncated or occluded comprises: obtaining, from respective ground-truth annotations for the training images in the initial set of images, truncation scores or occlusion scores previously computed based on the respective ground-truth annotations, and wherein computing the truncation scores or occlusion scores comprising: obtaining, from the respective ground-truth annotations, a three-dimensional (3-D) bounding box and a two-dimensional (2-D) bounding box for an object in a training image from the initial set of images; generating a projected 2-D bounding box by projecting the 3-D bounding box to the training image; and computing a truncation score or an occlusion score using an overlap between the projected 2-D bounding box and the 2-D bounding box from the respective ground-truth annotations; and determining, based on the truncation scores or occlusion scores, that one or more of the images in the initial set of images depict an object that is truncated or occluded.

7. The computer-implemented method of claim 1, wherein determining the loss comprises: for each training image in the set of training images: determining a count of images from the set of training images that have a same ground-truth annotation as the training image; determining, based on the count of images, a weight for the training image; and generating, from an error between the object recognition output for the modified image generated from the training image and the respective ground-truth annotation for the training image, a weighted error based on the weight for the training image.

8. The computer-implemented method of claim 7, comprising: determining the loss based on weighted errors for training images in the set of training images.

9. The computer-implemented method of claim 7, wherein the respective ground-truth annotation for the training image depicts an object that belongs to a $k$-th object category among $K$ object categories, and wherein the weight $w_k$ for the training image is $w_k = 1 + 2\left(1 - \frac{c_k}{c_{\max}}\right)$, where $c_k$ is the count of images from the set of training images that has the same ground-truth annotation as the training image, and $c_{\max}$ is a maximum value of all values among counts of images $c_i$, $i = 1, \ldots, K$.

10. The computer-implemented method of claim 1, wherein the first source is a set of real-world images and the second source is a set of synthetic images.

11. The computer-implemented method of claim 1, wherein the object recognition output comprises: a bounding box, and a localization score that is a prediction of an intersection-over-union overlap between the bounding box and a ground-truth bounding box.

12. The computer-implemented method of claim 1, wherein the object recognition output comprises: an instance mask, and a mask score that is a prediction of an intersection-over-union overlap between the instance mask and a ground-truth instance mask.

13. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising: receiving training data that comprises a plurality of training images and, for each image, a respective ground-truth annotation, the plurality of training images comprising images from a first source and images from a second source; obtaining a set of training images from the training data; for each training image in the set of training images: applying contrast equalization to the training image to generate a modified image; and processing the modified image using a neural network to generate an object recognition output for the modified image; determining, as a determined loss, a loss based on errors between, for each training image in the set of training images, the object recognition output for the modified image generated from the training image and the respective ground-truth annotation for the training image; and updating parameters of the neural network based on the determined loss.

14. The non-transitory, computer-readable medium of claim 13, wherein obtaining a set of training images from the training data, comprises: sampling an initial set of images from the training data; and generating the set of training images by discarding one or more training images from t
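The class-balancing weight and the truncation/occlusion overlap are concrete enough to sketch. The Python below is a minimal illustration under stated assumptions, not the patentee's implementation: all function names are hypothetical, averaging the weighted errors is an assumption (the claims only say the loss is based on them), the 3-D box is assumed to arrive as eight camera-frame corners projected through a pinhole intrinsic matrix (`intrinsics`, also assumed), and the "overlap" is assumed to be intersection-over-union.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence

import numpy as np


def class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight w_k = 1 + 2 * (1 - c_k / c_max) for each category present in `labels`."""
    counts = Counter(labels)      # c_k for every category in the training set
    c_max = max(counts.values())  # count of the most frequent category
    return {k: 1.0 + 2.0 * (1.0 - c / c_max) for k, c in counts.items()}


def weighted_loss(errors: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Scale each per-image error by its class weight, then average (assumed reduction)."""
    w = class_weights(labels)
    return float(np.mean([w[y] * e for e, y in zip(errors, labels)]))


def project_box(corners_3d: np.ndarray, intrinsics: np.ndarray) -> np.ndarray:
    """Project 8x3 camera-frame box corners with a pinhole intrinsic matrix.

    Returns the enclosing 2-D box [x1, y1, x2, y2]; assumes every corner has z > 0.
    """
    uvw = corners_3d @ intrinsics.T  # homogeneous pixel coordinates, shape (8, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]    # perspective divide
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])


def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return float(inter / union) if union > 0 else 0.0
```

With this weight the most frequent category gets $w = 1$, and the weight approaches 3 as a category becomes rarer. For example, with four "car" images and one "ped" image, $w_{\text{car}} = 1$ and $w_{\text{ped}} = 1 + 2(1 - 1/4) = 2.5$. A low IoU between the projected box and the annotated 2-D box suggests the visible object is cut off or hidden; the threshold for discarding an image is not stated in the claims and would be a tuning choice.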
using classification, e.g. of video objects · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Combinations of networks · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title