Joint training of neural networks using multi-scale hard example mining

US12154309B2 · US · B2

Patent metadata
Publication number: US-12154309-B2
Application number: US-202318462305-A
Country: US
Kind code: B2
Filing date: Sep 6, 2023
Priority date: Apr 7, 2017
Publication date: Nov 26, 2024
Grant date: Nov 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

An example apparatus for mining multi-scale hard examples includes a convolutional neural network to receive a mini-batch of sample candidates and generate basic feature maps. The apparatus also includes a feature extractor and combiner to generate concatenated feature maps based on the basic feature maps and extract the concatenated feature maps for each of a plurality of received candidate boxes. The apparatus further includes a sample scorer and miner to score the candidate samples with multi-task loss scores and select candidate samples with multi-task loss scores exceeding a threshold score.
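
The abstract's final step, scoring each candidate with a multi-task loss and keeping only those above a threshold, can be sketched in plain Python. This is a minimal illustration, not the patented implementation: `Candidate`, `mine_hard_examples`, the threshold of 0.5 and the loss values are hypothetical inputs, and the sketch assumes, as in online hard example mining generally, that only the selected candidates are then used for the weight update.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A candidate sample (e.g. a proposal box) from one mini-batch (hypothetical type)."""
    box_id: int
    loss: float  # multi-task loss score from the forward pass


def mine_hard_examples(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Return candidates whose multi-task loss strictly exceeds `threshold`,
    hardest first. Low-loss ("easy") candidates are discarded; under the
    stated assumption, only the returned ones feed the backward pass."""
    hard = [c for c in candidates if c.loss > threshold]
    return sorted(hard, key=lambda c: c.loss, reverse=True)


mini_batch = [Candidate(0, 0.12), Candidate(1, 1.85), Candidate(2, 0.47), Candidate(3, 2.30)]
hard = mine_hard_examples(mini_batch, threshold=0.5)
print([c.box_id for c in hard])  # [3, 1]
```

A fixed threshold keeps the number of mined samples variable per mini-batch; the abstract does not say how the threshold is chosen.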

Claims

What is claimed is:

1. A method for performing object detection, the method comprising: generating, by executing a machine learning model using at least one processor, respective objectness scores for one or more regions of an image; selecting a first region of the one or more regions based on an objectness score in response to the first region meeting an objectness threshold; calculating a localization value for the first region; calculating a classification score for the first region; determining a multi-task loss score based on (a) the objectness score, (b) the localization value, and (c) the classification score, the multi-task loss score used to determine whether an object is contained in the first region of the image.

2. The method of claim 1, further including generating an output detection result including the image and a bounding box representing the region including the detected object.

3. The method of claim 2, wherein the bounding box is annotated with a classification of the detected object.

4. The method of claim 1, further including upsampling the image to create the plurality of regions of the image.

5. The method of claim 4, wherein the upsampling is performed using bi-linear interpolation.

6. The method of claim 1, wherein the machine learning model includes a VGG-16 neural network.

7. An apparatus to detect an object in an image, the apparatus comprising: processor circuitry; and a storage device accessible by the processor circuitry, the storage device including machine readable instructions to cause the processor circuitry to: generate, using a machine learning model, respective objectness scores for one or more regions of the image; select a first region of the one or more regions based on an objectness score in response to the first region meeting an objectness threshold; calculate a localization value for the first region; calculate a classification score for the first region; determine a multi-task loss score based on (a) the objectness score, (b) the localization value, and (c) the classification score, the multi-task loss score used to determine whether an object is contained in the first region of the image.

8. The apparatus of claim 7, wherein the processor is to generate an output detection result including the image and a bounding box representing the region including the detected object.

9. The apparatus of claim 8, wherein the bounding box is annotated with a classification of the detected object.

10. The apparatus of claim 7, wherein the processor is to upsample the image to create the plurality of regions of the image.

11. The apparatus of claim 10, wherein the processor is to upsample the image using bi-linear interpolation.

12. The apparatus of claim 7, wherein the machine learning model includes a VGG-16 neural network.

13. At least one non-transitory computer readable storage medium comprising instructions that, when executed, cause at least one processor to at least: generate, using a machine learning model, respective objectness scores for one or more regions of the image; select a first region of the one or more regions based on an objectness score in response to the first region meeting an objectness threshold; calculate a localization value for the first region; calculate a classification score for the first region; determine a multi-task loss score based on (a) the objectness score, (b) the localization value, and (c) the classification score, the multi-task loss score used to determine whether an object is contained in the first region of the image.

14. The at least one non-transitory computer readable storage medium of claim 13, wherein the instructions cause the processor to generate an output detection result including the image and a bounding box representing the region including the detected object.

15. The at least one non-transitory computer readable storage medium of claim 14, wherein the bounding box is annotated with a classification of the detected object.

16. The at least one non-transitory computer readable storage medium of claim 13, wherein the instructions cause the processor to upsample the image to create the plurality of regions of the image.

17. The at least one non-transitory computer readable storage medium of claim 16, wherein the instructions cause the processor to upsample the image using bi-linear interpolation.

18. The at least one non-transitory computer readable storage medium of claim 13, wherein the machine learning model includes a VGG-16 neural network.
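
The independent claims (1, 7 and 13) require a multi-task loss built from an objectness score, a localization value and a classification score, but they do not state how the three are combined. The Python sketch below assumes a Fast R-CNN-style formulation: binary log-loss on objectness, cross-entropy on the class probabilities, and a smooth L1 localization term counted only for foreground regions, summed with an optional weight. `Region`, `smooth_l1`, `multi_task_loss`, `score_regions`, the "class 0 = background" convention and all numbers are hypothetical. The sketch starts from the model's outputs, so the VGG-16 backbone (claims 6, 12, 18) and the bi-linear upsampling step (claims 5, 11, 17) are not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Region:
    """One image region (proposal) with model outputs and training labels (hypothetical type)."""
    objectness: float               # model's probability that the region holds any object
    box_pred: Sequence[float]       # predicted box offsets, e.g. (dx, dy, dw, dh)
    box_target: Sequence[float]     # ground-truth offsets for the matched object
    class_probs: Sequence[float]    # softmax over classes; index 0 = background (assumed)
    true_class: int                 # ground-truth class index


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Localization value: smooth L1 distance summed over box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def multi_task_loss(region: Region, loc_weight: float = 1.0) -> float:
    """Assumed combination: objectness log-loss + classification
    cross-entropy + weighted localization term (foreground only)."""
    eps = 1e-12  # avoids log(0)
    is_object = region.true_class != 0
    p_obj = region.objectness if is_object else 1.0 - region.objectness
    obj_term = -math.log(max(p_obj, eps))
    cls_term = -math.log(max(region.class_probs[region.true_class], eps))
    # Box regression is only meaningful for regions that contain an object.
    loc_term = smooth_l1(region.box_pred, region.box_target) if is_object else 0.0
    return obj_term + cls_term + loc_weight * loc_term


def score_regions(regions: List[Region], obj_threshold: float) -> List[Optional[float]]:
    """Multi-task loss for each region meeting the objectness threshold;
    None for regions filtered out by the threshold."""
    return [
        multi_task_loss(r) if r.objectness >= obj_threshold else None
        for r in regions
    ]


regions = [
    Region(0.9, (0.1, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0), (0.1, 0.8, 0.1), true_class=1),
    Region(0.2, (0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0), (0.9, 0.05, 0.05), true_class=0),
]
scores = score_regions(regions, obj_threshold=0.5)
print([None if s is None else round(s, 4) for s in scores])  # [0.3335, None]
```

Regions below the objectness threshold never reach the loss computation, which mirrors the claimed selection step; the first region's score is -ln(0.9) - ln(0.8) + 0.005 under these assumptions.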

Assignees

  • Intel Corp

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title

  • Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title

  • using neural networks · CPC title

Frequently asked questions

What does patent US12154309B2 cover?
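The claims cover a method, an apparatus, and a non-transitory storage medium for object detection: a machine learning model generates objectness scores for regions of an image, a region meeting an objectness threshold is selected, a localization value and a classification score are calculated for it, and a multi-task loss score based on all three is used to determine whether the region contains an object. The abstract describes the related training technique of mining multi-scale hard examples by scoring candidate samples with multi-task loss scores and selecting those that exceed a threshold score.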
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
The primary CPC classification is G06V10/454. Mapped technology areas include Physics.
When was this patent published?
It was published on Nov 26, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).