Object recognition neural network training using multiple data sources

US12430903B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12430903-B2
Application number: US-202118007288-A
Country: US
Kind code: B2
Filing date: Jul 28, 2021
Priority date: Jul 29, 2020
Publication date: Sep 30, 2025
Grant date: Sep 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an object recognition neural network using multiple data sources. One of the methods includes receiving training data that includes a plurality of training images from a first source and images from a second source. A set of training images are obtained from the training data. For each training image in the set of training images, contrast equalization is applied to the training image to generate a modified image. The modified image is processed using the neural network to generate an object recognition output for the modified image. A loss is determined based on errors between, for each training image in the set, the object recognition output for the modified image generated from the training image and ground-truth annotation for the training image. Parameters of the neural network are updated based on the determined loss.
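
The contrast-equalization step described in the abstract can be illustrated with a small sketch. The abstract does not say which equalization method is used, so the code assumes plain global histogram equalization on an 8-bit grayscale image stored as a list of rows. The function name `equalize_contrast` and the `levels` parameter are hypothetical and chosen only for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of integer pixel values


def equalize_contrast(image: Image, levels: int = 256) -> Image:
    """Global histogram equalization (an assumed form of contrast equalization)."""
    # Count how often each intensity level occurs.
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1

    # Build the cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    if total == 0:
        return []  # empty image: nothing to equalize

    # Smallest non-zero cumulative count, i.e. the darkest level present.
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch, return a copy.
        return [row[:] for row in image]

    # Map each level so that the cumulative distribution becomes roughly linear.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[px] for px in row] for row in image]


if __name__ == "__main__":
    low_contrast = [[50, 50], [51, 52]]
    print(equalize_contrast(low_contrast))
```

In a training loop along the lines of the abstract, a function like this would be applied to every training image, whatever its source, before the image is passed to the network. This is one plausible way to reduce appearance differences between the two sources.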

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for neural network training, comprising: receiving training data that comprises a plurality of training images and, for each image, a respective ground-truth annotation, the plurality of training images comprising images from a first source and images from a second source; obtaining a set of training images from the training data; for each training image in the set of training images: applying contrast equalization to the training image to generate a modified image; and processing the modified image using a neural network to generate an object recognition output for the modified image; determining, as a determined loss, a loss based on errors between, for each training image in the set of training images, the object recognition output for the modified image generated from the training image and the respective ground-truth annotation for the training image; and updating parameters of the neural network based on the determined loss.

2. The computer-implemented method of claim 1, wherein obtaining a set of training images from the training data, comprises: sampling an initial set of images from the training data; and generating the set of training images by discarding one or more images from the initial set of images.

3. The computer-implemented method of claim 2, wherein generating the set of training images comprises: determining that the one or more images in the initial set of images have motion blur; and in response, discarding the one or more images that have motion blur.

4. The computer-implemented method of claim 2, wherein generating the set of training images comprises: determining, from respective ground-truth annotations for the training images in the initial set of images, that one or more of the images in the initial set of images depict objects that do not belong to a relevant object category; and in response, discarding the one or more images that depict objects that do not belong to a relevant object category.

5. The computer-implemented method of claim 2, wherein generating the set of training images comprises: determining that one or more of the images in the initial set of images depict an object that is truncated or occluded; and in response, discarding the one or more images that depict an object that is truncated or occluded.

6. The computer-implemented method of claim 5, wherein determining that one or more of the images in the set of training images depict an object that is truncated or occluded comprises: obtaining, from respective ground-truth annotations for the training images in the initial set of images, truncation scores or occlusion scores previously computed based on the respective ground-truth annotations, and wherein computing the truncation scores or occlusion scores comprising: obtaining, from the respective ground-truth annotations, a three-dimensional (3-D) bounding box and a two-dimensional (2-D) bounding box for an object in a training image from the initial set of images; generating a projected 2-D bounding box by projecting the 3-D bounding box to the training image; and computing a truncation score or an occlusion score using an overlap between the projected 2-D bounding box and the 2-D bounding box from the respective ground-truth annotations; and determining, based on the truncation scores or occlusion scores, that one or more of the images in the initial set of images depict an object that is truncated or occluded.

7. The computer-implemented method of claim 1, wherein determining the loss comprises: for each training image in the set of training images: determining a count of images from the set of training images that have a same ground-truth annotation as the training image; determining, based on the count of images, a weight for the training image; and generating, from an error between the object recognition output for the modified image generated from the training image and the respective ground-truth annotation for the training image, a weighted error based on the weight for the training image.

8. The computer-implemented method of claim 7, comprising: determining the loss based on weighted errors for training images in the set of training images.

9. The computer-implemented method of claim 7, wherein the respective ground-truth annotation for the training image depicts an object that belongs to a k-th object category among K object categories, and wherein the weight $w_k$ for the training image is $w_k = 1 + 2\left(1 - \frac{c_k}{c_{\max}}\right)$, where $c_k$ is the count of images from the set of training images that has the same ground-truth annotation as the training image, and $c_{\max}$ is a maximum value of all values among counts of images $c_i$, $i = 1, \ldots, K$.

10. The computer-implemented method of claim 1, wherein the first source is a set of real-world images and the second source is a set of synthetic images.

11. The computer-implemented method of claim 1, wherein the object recognition output comprises: a bounding box, and a localization score that is a prediction of an intersection-over-union overlap between the bounding box and a ground-truth bounding box.

12. The computer-implemented method of claim 1, wherein the object recognition output comprises: an instance mask, and a mask score that is a prediction of an intersection-over-union overlap between the instance mask and a ground-truth instance mask.

13. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising: receiving training data that comprises a plurality of training images and, for each image, a respective ground-truth annotation, the plurality of training images comprising images from a first source and images from a second source; obtaining a set of training images from the training data; for each training image in the set of training images: applying contrast equalization to the training image to generate a modified image; and processing the modified image using a neural network to generate an object recognition output for the modified image; determining, as a determined loss, a loss based on errors between, for each training image in the set of training images, the object recognition output for the modified image generated from the training image and the respective ground-truth annotation for the training image; and updating parameters of the neural network based on the determined loss.

14. The non-transitory, computer-readable medium of claim 13, wherein obtaining a set of training images from the training data, comprises: sampling an initial set of images from the training data; and generating the set of training images by discarding one or more training images from t
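
The per-category weighting in claims 7 to 9 can be worked through in a few lines. The sketch below computes a weight of 1 + 2 * (1 - c_k / c_max) for each category from the label counts in a batch, then combines per-image errors into a loss. The function names are hypothetical, and averaging the weighted errors is an assumption: claim 8 only says the loss is based on the weighted errors.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def category_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight per category: 1 + 2 * (1 - c_k / c_max), as in claim 9."""
    counts = Counter(labels)
    if not counts:
        return {}
    c_max = max(counts.values())  # count of the most frequent category
    return {k: 1.0 + 2.0 * (1.0 - c / c_max) for k, c in counts.items()}


def weighted_loss(errors: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Mean of weighted per-image errors (the mean is an assumption)."""
    if len(errors) != len(labels):
        raise ValueError("errors and labels must have the same length")
    if not errors:
        return 0.0
    weights = category_weights(labels)
    total = sum(weights[label] * err for err, label in zip(errors, labels))
    return total / len(errors)


if __name__ == "__main__":
    batch_labels = ["chair"] * 4 + ["table"] * 2 + ["lamp"]
    print(category_weights(batch_labels))
```

With four chairs, two tables, and one lamp, the most frequent category gets weight 1.0 and the rarest gets 2.5, so errors on under-represented categories contribute more to the loss. Under this formula a weight stays below 3.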
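
The truncation or occlusion score in claim 6 relies on the overlap between a 2-D box projected from the 3-D annotation and the annotated 2-D box. The sketch below is one possible reading, not the patent's stated formula. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, 3-D box corners already in camera coordinates with positive depth, intersection-over-union as the overlap measure, and a score defined as 1 minus that overlap.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
Point3D = Tuple[float, float, float]     # (X, Y, Z) in camera coordinates, Z > 0


def project_box(corners: Iterable[Point3D],
                fx: float, fy: float, cx: float, cy: float) -> Box:
    """Project 3-D box corners with a pinhole model; return the enclosing 2-D box."""
    us, vs = [], []
    for x, y, z in corners:
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)
    return (min(us), min(vs), max(us), max(vs))


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def truncation_score(projected: Box, annotated: Box) -> float:
    """Assumed score: 0.0 when the boxes coincide, 1.0 when they do not overlap."""
    return 1.0 - iou(projected, annotated)


if __name__ == "__main__":
    # Corners of a box spanning X, Y in [-1, 1] and depth Z in [2, 4].
    cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (2.0, 4.0)]
    full = project_box(cube, fx=100.0, fy=100.0, cx=100.0, cy=100.0)
    visible = (50.0, 50.0, 100.0, 150.0)  # annotated box covers only the left half
    print(full, truncation_score(full, visible))
```

A filtering step like the one in claim 5 could then discard images whose score exceeds a chosen threshold; the threshold value is not given in this preview.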

Assignees

Magic Leap Inc

Inventors

Classifications

  • using classification, e.g. of video objects · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Combinations of networks · CPC title

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Magic Leap Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 30, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).