Neural network for object detection in images
US-11645834-B2 · May 9, 2023 · US
US12444168B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12444168-B2 |
| Application number | US-201917622462-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 5, 2019 |
| Priority date | Aug 5, 2019 |
| Publication date | Oct 14, 2025 |
| Grant date | Oct 14, 2025 |
Official abstract text for this publication.
A computing system for detecting objects in an image can perform operations including generating an image pyramid that includes a first level corresponding with the image at a first resolution and a second level corresponding with the image at a second resolution. The operations can include tiling the first level and the second level by dividing the first level into a first plurality of tiles and the second level into a second plurality of tiles; inputting the first plurality of tiles and the second plurality of tiles into a machine-learned object detection model; receiving, as an output of the machine-learned object detection model, object detection data that includes bounding boxes respectively defined with respect to individual ones of the first plurality of tiles and the second plurality of tiles; and generating image object detection output by mapping the object detection data onto an image space of the image.
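The flow in the abstract (build a pyramid, tile each level, run the detector per tile, map boxes back to image space) can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes each pyramid level is a plain rescaling of the image by a scale factor, whereas the claims derive the levels from an intermediate feature representation produced by a preliminary model. The names `Tile`, `tile_level`, `to_image_space`, and `detect` are hypothetical, and `model` is a placeholder callable standing in for the machine-learned object detection model. The sketch tracks geometry only and handles no pixel data.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Tile:
    """Hypothetical tile descriptor: position and size within one pyramid level."""
    level: int     # index of the pyramid level
    scale: float   # level resolution relative to the original image (assumed)
    x: int         # tile origin in level pixels
    y: int
    width: int
    height: int


def tile_level(level: int, scale: float, image_w: int, image_h: int,
               tile_size: int) -> List[Tile]:
    """Divide one pyramid level into non-overlapping tiles, row by row."""
    level_w = round(image_w * scale)
    level_h = round(image_h * scale)
    tiles = []
    for y in range(0, level_h, tile_size):
        for x in range(0, level_w, tile_size):
            tiles.append(Tile(level, scale, x, y,
                              min(tile_size, level_w - x),
                              min(tile_size, level_h - y)))
    return tiles


def to_image_space(tile: Tile, box: Box) -> Box:
    """Map a tile-local box to original image coordinates."""
    x0, y0, x1, y1 = box
    return ((tile.x + x0) / tile.scale, (tile.y + y0) / tile.scale,
            (tile.x + x1) / tile.scale, (tile.y + y1) / tile.scale)


def detect(image_w: int, image_h: int, scales: Sequence[float], tile_size: int,
           model: Callable[[Tile], List[Box]]) -> List[Box]:
    """Run the placeholder model on every tile of every level and merge results."""
    detections = []
    for level, scale in enumerate(scales):
        for tile in tile_level(level, scale, image_w, image_h, tile_size):
            for box in model(tile):  # boxes are defined relative to the tile
                detections.append(to_image_space(tile, box))
    return detections
```

Keeping each box tile-local until the final mapping step mirrors the wording of the abstract, where bounding boxes are "respectively defined with respect to individual ones" of the tiles and only later mapped onto the image space.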
Opening claim text (preview).
What is claimed is:

1. A computing system comprising: at least one processor; a preliminary machine-learned object detection model configured to receive an image, and, in response to receipt of the image, output an intermediate feature representation; a machine-learned object detection model configured to receive a plurality of tiles, and, in response to receipt of the plurality of tiles, output object detection data for the plurality of tiles, the object detection data comprising a plurality of bounding boxes respectively defined with respect to individual ones of the plurality of tiles; and at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: generating an image pyramid based on the image having an image space, the image pyramid comprising a first level corresponding with the image at a first resolution and a second level corresponding with the image at a second resolution that is different than the first resolution, wherein generating the image pyramid based on the image comprises: inputting the image into the preliminary machine-learned object detection model, the image being input as a plurality of preliminary tiles; receiving, as an output of the preliminary machine-learned object detection model, the intermediate feature representation, the intermediate feature representation corresponding with the plurality of preliminary tiles; and generating the first level and the second level of the image pyramid based on the intermediate feature representation; tiling the first level and the second level by dividing the first level into a first plurality of tiles and the second level into a second plurality of tiles; inputting the first plurality of tiles and the second plurality of tiles into the machine-learned object detection model; receiving, as an output of the machine-learned object detection model, the object detection data comprising the plurality of bounding boxes respectively defined with respect to individual ones of the first plurality of tiles and the second plurality of tiles; and generating an image object detection output by mapping the object detection data onto the image space of the image.

2. The computing system of claim 1, wherein the operations further comprise: identifying at least one bounding box of the image object detection output based on the at least one bounding box intersecting a border of one or more of the first plurality of tiles or second plurality of tiles; and removing the at least one bounding box from the image object detection output.

3. The computing system of claim 2, wherein the at least one bounding box is identified based on the at least one bounding box spanning across the one or more of the first plurality of tiles or the second plurality of tiles such that the at least one bounding box intersects the border and an opposite border of the one or more of the first plurality of tiles or the second plurality of tiles that is parallel with the border.

4. The computing system of claim 2, wherein the at least one bounding box is identified based on the at least one bounding box intersecting the border of the one or more of the first plurality of tiles or the second plurality of tiles and intersecting an edge of the respective level of the image pyramid.

5. The computing system of claim 2, wherein removing the at least one bounding box from the image object detection output comprises removing each bounding box that intersects any of a plurality of borders of the first plurality of tiles or second plurality of tiles.

6. A method for training a machine learned object detection model, the method comprising: for each training image of a plurality of training images: generating, by one or more computing devices, an image pyramid based on the respective training image having a respective image space, the image pyramid comprising a first level corresponding with the respective training image at a first resolution and a second level corresponding with the respective training image at a second resolution that is different than the first resolution, wherein generating the image pyramid based on the training image comprises: inputting the training image into a preliminary machine-learned object detection model, the training image being input as a plurality of preliminary tiles; receiving, as an output of the preliminary machine-learned object detection model, an intermediate feature representation, the intermediate feature representation corresponding to the plurality of preliminary tiles; and generating the first level and the second level of the image pyramid based on the intermediate feature representation corresponding; tiling, by the one or more computing devices, the first level and the second level by dividing the first level into a first plurality of tiles and the second level into a second plurality of tiles; inputting, by the one or more computing devices, the first plurality of tiles and the second plurality of tiles into a machine-learned object detection model; receiving, by the one or more computing devices and as an output of the machine-learned object detection model, object detection data comprising the plurality of bounding boxes respectively defined with respect to individual ones of the first plurality of tiles and the second plurality of tiles; generating, by the one or more computing devices, an image object detection output by mapping the object detection data onto the respective image space of the respective training image; and adjusting, by the one or more computing devices, parameters of the machine-learned object detection model based on a comparison of the image object detection output with ground truth object location data that corresponds to the respective training image of the plurality of training images.

7. The method of claim 6, further comprising: identifying, by the one or more computing devices, at least one bounding box of the image object detection output based on the at least one bounding box intersecting a border of one or more of the first plurality of tiles or second plurality of tiles; and removing, by the one or more computing devices, the at least one bounding box from the image object detection output.

8. The method of claim 7, wherein the at least one bounding box is identified, by the one or more computing devices, based on the at least one bounding box spanning across the one or more of the first plurality of tiles or the second plurality of tiles such that the at least one bounding box intersects the border and an opposite border of the one or more of the first plurality of tiles or the second plurality of tiles that is parallel with the border.

9. The method of claim 7, wherein the at least one bounding box is identified, by the one or more computing devices, based on the at least one bounding box intersecting the border of the one or more of the first plurality of tiles or the second plurality of tiles and intersecting an edge of the respective level of the image pyramid.

10. The method of claim 7, wherein removing, by the one or more computing devices, the at least one bounding box from the image object detection output comprises removing, by the one or more computing devices, each bounding box that intersects any of a plurality of borders of the first plurality of tiles or second plurality of tiles.

11. The method of claim 6, wherein: inputting the training image into the preliminary machine-learned object detection model comprises: tiling the respective training image into the plurality of preliminary tiles; inputting the pl
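The border-based filtering recited in claims 2, 3, and 5 (and mirrored in claims 7, 8, and 10) might look roughly like the sketch below. It is one possible reading under stated assumptions, not the claimed implementation. Boxes are assumed to be tile-local and clipped to the tile, so a box is treated as intersecting a border when its edge lies on the border or within a tolerance `tol` of it. The function names and the `tol` and `only_spanning` parameters are hypothetical. The level-edge condition of claims 4 and 9 is deliberately left out.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # tile-local (x_min, y_min, x_max, y_max)


def borders_touched(box: Box, tile_w: float, tile_h: float,
                    tol: float = 0.0) -> Set[str]:
    """Return the tile borders that the box reaches (assumed clipped to the tile)."""
    x0, y0, x1, y1 = box
    touched = set()
    if x0 <= tol:
        touched.add("left")
    if x1 >= tile_w - tol:
        touched.add("right")
    if y0 <= tol:
        touched.add("top")
    if y1 >= tile_h - tol:
        touched.add("bottom")
    return touched


def spans_opposite_borders(box: Box, tile_w: float, tile_h: float,
                           tol: float = 0.0) -> bool:
    """Claim 3 reading: the box reaches a border and the parallel opposite border."""
    touched = borders_touched(box, tile_w, tile_h, tol)
    return {"left", "right"} <= touched or {"top", "bottom"} <= touched


def drop_border_boxes(boxes: List[Box], tile_w: float, tile_h: float,
                      only_spanning: bool = False, tol: float = 0.0) -> List[Box]:
    """Remove boxes that reach any border (claim 5 reading), or, when
    only_spanning is set, only boxes that span two opposite borders."""
    kept = []
    for box in boxes:
        if only_spanning:
            remove = spans_opposite_borders(box, tile_w, tile_h, tol)
        else:
            remove = bool(borders_touched(box, tile_w, tile_h, tol))
        if not remove:
            kept.append(box)
    return kept
```

For example, with a 50 by 50 tile, calling `drop_border_boxes` with the default arguments removes a box whose left edge sits at x = 0, while `only_spanning=True` keeps it unless it also reaches x = 50.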
Region-based matching · CPC title
Active pattern-learning, e.g. online learning of image or video features · CPC title
using rules for classification or partitioning the feature space · CPC title