Joint training of neural networks using multi-scale hard example mining
US-11790631-B2 · Oct 17, 2023 · US
| Field | Value |
|---|---|
| Publication number | US-12154309-B2 |
| Application number | US-202318462305-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 6, 2023 |
| Priority date | Apr 7, 2017 |
| Publication date | Nov 26, 2024 |
| Grant date | Nov 26, 2024 |
Official abstract text for this publication.
An example apparatus for mining multi-scale hard examples includes a convolutional neural network to receive a mini-batch of sample candidates and generate basic feature maps. The apparatus also includes a feature extractor and combiner to generate concatenated feature maps based on the basic feature maps and extract the concatenated feature maps for each of a plurality of received candidate boxes. The apparatus further includes a sample scorer and miner to score the candidate samples with multi-task loss scores and select candidate samples with multi-task loss scores exceeding a threshold score.
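The scoring and selection step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate already carries three per-task losses and that the multi-task loss score is their weighted sum; the `Candidate` class, the field names, the weights, and the numeric values are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate box with its per-task losses (all values hypothetical)."""
    box: tuple                  # (x1, y1, x2, y2)
    objectness_loss: float
    classification_loss: float
    localization_loss: float


def multi_task_loss(c, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-task losses; the weighting is an assumption."""
    w_obj, w_cls, w_loc = weights
    return (w_obj * c.objectness_loss
            + w_cls * c.classification_loss
            + w_loc * c.localization_loss)


def mine_hard_examples(candidates, threshold):
    """Keep candidates whose multi-task loss exceeds the threshold, hardest first."""
    scored = [(multi_task_loss(c), c) for c in candidates]
    hard = [(score, c) for score, c in scored if score > threshold]
    hard.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in hard]


batch = [
    Candidate((0, 0, 10, 10), 0.1, 0.2, 0.1),   # low total loss: an easy example
    Candidate((5, 5, 40, 40), 0.9, 1.4, 0.7),   # high total loss: a hard example
    Candidate((8, 2, 30, 25), 0.5, 0.8, 0.4),   # above the threshold as well
]
hard_examples = mine_hard_examples(batch, threshold=1.0)
for c in hard_examples:
    print(c.box, round(multi_task_loss(c), 2))
```

In a training loop, only the selected candidates would typically contribute to the backward pass, which concentrates updates on the samples the network currently handles worst.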
Opening claim text (preview).
What is claimed is:
1. A method for performing object detection, the method comprising: generating, by executing a machine learning model using at least one processor, respective objectness scores for one or more regions of an image; selecting a first region of the one or more regions based on an objectness score in response to the first region meeting an objectness threshold; calculating a localization value for the first region; calculating a classification score for the first region; determining a multi-task loss score based on (a) the objectness score, (b) the localization value, and (c) the classification score, the multi-task loss score used to determine whether an object is contained in the first region of the image.
2. The method of claim 1, further including generating an output detection result including the image and a bounding box representing the region including the detected object.
3. The method of claim 2, wherein the bounding box is annotated with a classification of the detected object.
4. The method of claim 1, further including upsampling the image to create the plurality of regions of the image.
5. The method of claim 4, wherein the upsampling is performed using bi-linear interpolation.
6. The method of claim 1, wherein the machine learning model includes a VGG-16 neural network.
7. An apparatus to detect an object in an image, the apparatus comprising: processor circuitry; and a storage device accessible by the processor circuitry, the storage device including machine readable instructions to cause the processor circuitry to: generate, using a machine learning model, respective objectness scores for one or more regions of the image; select a first region of the one or more regions based on an objectness score in response to the first region meeting an objectness threshold; calculate a localization value for the first region; calculate a classification score for the first region; determine a multi-task loss score based on (a) the objectness score, (b) the localization value, and (c) the classification score, the multi-task loss score used to determine whether an object is contained in the first region of the image.
8. The apparatus of claim 7, wherein the processor is to generate an output detection result including the image and a bounding box representing the region including the detected object.
9. The apparatus of claim 8, wherein the bounding box is annotated with a classification of the detected object.
10. The apparatus of claim 7, wherein the processor is to upsample the image to create the plurality of regions of the image.
11. The apparatus of claim 10, wherein the processor is to upsample the image using bi-linear interpolation.
12. The apparatus of claim 7, wherein the machine learning model includes a VGG-16 neural network.
13. At least one non-transitory computer readable storage medium comprising instructions that, when executed, cause at least one processor to at least: generate, using a machine learning model, respective objectness scores for one or more regions of the image; select a first region of the one or more regions based on an objectness score in response to the first region meeting an objectness threshold; calculate a localization value for the first region; calculate a classification score for the first region; determine a multi-task loss score based on (a) the objectness score, (b) the localization value, and (c) the classification score, the multi-task loss score used to determine whether an object is contained in the first region of the image.
14. The at least one non-transitory computer readable storage medium of claim 13, wherein the instructions cause the processor to generate an output detection result including the image and a bounding box representing the region including the detected object.
15. The at least one non-transitory computer readable storage medium of claim 14, wherein the bounding box is annotated with a classification of the detected object.
16. The at least one non-transitory computer readable storage medium of claim 13, wherein the instructions cause the processor to upsample the image to create the plurality of regions of the image.
17. The at least one non-transitory computer readable storage medium of claim 16, wherein the instructions cause the processor to upsample the image using bi-linear interpolation.
18. The at least one non-transitory computer readable storage medium of claim 13, wherein the machine learning model includes a VGG-16 neural network.
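The sequence of steps in the independent claims (objectness scoring, threshold-based region selection, then a combined score) can be sketched as follows. The claims do not say how the three quantities are combined, so this sketch assumes a plain sum of a log loss on the objectness probability, a smooth L1 distance between predicted and reference boxes, and a log loss on the probability assigned to the true class. The dictionary keys, the `gt_box` reference box, and the 0.5 threshold are hypothetical.

```python
import math


def smooth_l1(pred, target):
    """Smooth L1 distance summed over box coordinates (assumed localization value)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def log_loss(prob):
    """Negative log of the probability assigned to the true label."""
    return -math.log(max(prob, 1e-12))


def score_regions(regions, objectness_threshold=0.5):
    """Compute a combined score only for regions meeting the objectness threshold."""
    results = []
    for r in regions:
        if r["objectness"] < objectness_threshold:
            continue  # region does not meet the objectness threshold
        loss = (log_loss(r["objectness"])            # (a) objectness term
                + smooth_l1(r["box"], r["gt_box"])   # (b) localization term
                + log_loss(r["class_prob"]))         # (c) classification term
        results.append((r["name"], loss))
    return results


regions = [
    {"name": "a", "objectness": 0.9, "class_prob": 0.8,
     "box": (10, 10, 50, 50), "gt_box": (10, 10, 50, 50)},
    {"name": "b", "objectness": 0.2, "class_prob": 0.6,
     "box": (0, 0, 20, 20), "gt_box": (2, 2, 22, 22)},
]
scores = score_regions(regions)
print(scores)
```

Region `b` is skipped because its objectness falls below the assumed threshold, so the localization and classification terms are only computed for regions that pass the first stage.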
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
using neural networks · CPC title