3d object recognition using 3d convolutional neural network with depth based multi-scale filters
US-2021209339-A1 · Jul 8, 2021 · US
US12462453B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12462453-B2 |
| Application number | US-202217585449-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 26, 2022 |
| Priority date | Sep 4, 2018 |
| Publication date | Nov 4, 2025 |
| Grant date | Nov 4, 2025 |
Official abstract text for this publication.
One embodiment of a method includes applying a first generator model to a semantic representation of an image to generate an affine transformation, where the affine transformation represents a bounding box associated with at least one region within the image. The method further includes applying a second generator model to the affine transformation and the semantic representation to generate a shape of an object. The method further includes inserting the object into the image based on the bounding box and the shape.
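The abstract's pipeline (an affine transformation that stands for a bounding box, a shape generated for that box, and insertion of the object into the image) can be made concrete with a small sketch. The code below is an illustration only, not the patented implementation: the two generator models are replaced by hand-written stand-ins (a fixed 2x3 matrix `theta` and a fixed binary `shape_mask`), and the conventions used here (the affine matrix maps the unit square into normalized image coordinates, the segmentation map is a list of lists of integer labels, the shape is resized with nearest-neighbor sampling, and label `7` marks the inserted object) are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical types for this sketch (not taken from the patent).
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]
BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), exclusive end


def affine_to_bbox(theta: Affine, height: int, width: int) -> BBox:
    """Map the unit square through theta and return a clipped pixel box.

    Assumes theta sends (u, v) in [0, 1]^2 to normalized image coordinates.
    """
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    xs = [theta[0][0] * u + theta[0][1] * v + theta[0][2] for u, v in corners]
    ys = [theta[1][0] * u + theta[1][1] * v + theta[1][2] for u, v in corners]
    left = max(0, int(min(xs) * width))
    top = max(0, int(min(ys) * height))
    right = min(width, int(max(xs) * width))
    bottom = min(height, int(max(ys) * height))
    return top, left, bottom, right


def insert_object(
    seg_map: List[List[int]],
    shape_mask: List[List[int]],
    bbox: BBox,
    label: int,
) -> List[List[int]]:
    """Return a copy of seg_map with the shape written into the bounding box."""
    top, left, bottom, right = bbox
    box_h, box_w = bottom - top, right - left
    out = [row[:] for row in seg_map]
    if box_h <= 0 or box_w <= 0:
        return out  # degenerate box: nothing to insert
    mask_h, mask_w = len(shape_mask), len(shape_mask[0])
    for r in range(box_h):
        src_r = r * mask_h // box_h  # nearest-neighbor row in the mask
        for c in range(box_w):
            src_c = c * mask_w // box_w  # nearest-neighbor column in the mask
            if shape_mask[src_r][src_c]:
                out[top + r][left + c] = label
    return out


# Stand-in for the first generator's output: scale by 0.5, then translate.
theta: Affine = ((0.5, 0.0, 0.25), (0.0, 0.5, 0.5))
# Stand-in for the second generator's output: a tiny binary shape.
shape_mask = [[0, 1], [1, 1]]

seg_map = [[0] * 8 for _ in range(4)]  # 4x8 map, label 0 = background
bbox = affine_to_bbox(theta, height=4, width=8)
updated = insert_object(seg_map, shape_mask, bbox, label=7)

for row in updated:
    print(row)
```

Writing the label into a copy of the semantic representation, rather than into pixel values, matches the claim language about updating a representation of the image to indicate a label for the object. How the real models parameterize the transformation and the shape is not specified in this excerpt.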
Opening claim text (preview).
What is claimed is:

1. One or more processors, comprising: circuitry to use one or more neural networks to: identify a set of locations in an image into which to insert an object based, at least in part, on a segmentation map and a location of one or more other objects within the image; identify a shape of the object based, at least in part, on the set of locations; and insert the object into at least one of the set of locations based, at least in part, on the identified shape.

2. The one or more processors of claim 1, wherein the circuitry is further to: use the one or more neural networks to generate a transformation corresponding to a bounding box associated with at least one of the set of locations; and insert the object into the image based, at least in part, on the transformation.

3. The one or more processors of claim 2, wherein the circuitry is further to apply the transformation to the identified shape.

4. The one or more processors of claim 1, wherein the segmentation map comprises a first representation of the image classifying one or more pixels in the image.

5. The one or more processors of claim 1, wherein the circuitry is further to: calculate a region of pixels in the image based on the identified shape.

6. The one or more processors of claim 1, wherein the one or more neural networks include one or more generator models.

7. A system, comprising: one or more computers having one or more processors to use one or more neural networks to: identify a set of locations in an image into which to insert an object based, at least in part, on a segmentation map and a location of one or more other objects within the image; identify a shape of the object based, at least in part, on the set of locations; and insert the object into at least one of the set of locations based, at least in part, on the identified shape.

8. The system of claim 7, wherein the one or more processors are further to: obtain the segmentation map comprising a representation of the image; process the representation using a first neural network of the one or more neural networks to generate a transformation corresponding to the at least one of the set of locations; and process the transformation and the representation using a second neural network of the one or more neural networks to insert the object.

9. The system of claim 8, wherein the representation indicates classifications of pixels of the image.

10. The system of claim 7, wherein inserting the object is further based, at least in part, on a size of the object.

11. The system of claim 7, wherein the one or more processors are further to identify the at least one of the set of locations through one or more affine transformations, and wherein the one or more affine transformations include at least one of: a translation, scaling, and rotation.

12. The system of claim 7, wherein the one or more neural networks are to insert the object based, at least in part, on one or more semantic contexts of the one or more other objects.

13. A machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to use one or more neural networks to: identify a set of locations in an image into which to insert an object based, at least in part, on a segmentation map and a location of one or more other objects within the image; identify a shape of the object based, at least in part, on the set of locations; and insert the object into at least one of the set of locations based, at least in part, on the identified shape.

14. The machine-readable medium of claim 13, wherein the set of instructions further include instructions, which if performed by the one or more processors, cause the one or more processors to: use the one or more neural networks to process the segmentation map to generate a transformation corresponding to the at least one of the set of locations; and process the transformation using the one or more neural networks to identify the shape of the object.

15. The machine-readable medium of claim 14, wherein the segmentation map comprises labels of pixels of the image.

16. The machine-readable medium of claim 13, wherein identifying the set of locations is further based, at least in part, on a random vector.

17. The machine-readable medium of claim 13, wherein the set of instructions further include instructions, which if performed by the one or more processors, cause the one or more processors to insert the object by at least updating a representation of the image to indicate a label for the object.

18. The machine-readable medium of claim 13, wherein the one or more neural networks include a variational autoencoder (VAE).

19. One or more processors, comprising: circuitry to train one or more neural networks to; identify a set of locations in an image into which to insert an object based, at least in part, on a segmentation map and a location of one or more other objects within the image; identify a shape of the object based, at least in part, on the set of locations; and insert the object into at least one of the set of locations based, at least in part, on the identified shape.

20. The one or more processors of claim 19, wherein the circuitry is further to: cause the one or more neural networks to predict a transformation representing a bounding box corresponding to the at least one of the set of locations; and train the one or more neural networks based, at least in part, on the predicted transformation.

21. The one or more processors of claim 20, wherein the circuitry is further to update parameters of the one or more neural networks using one or more supervised loss functions based on the predicted transformation and ground truth data.

22. The one or more processors of claim 19, wherein the one or more neural networks include one or more discriminator models.

23. The one or more processors of claim 19, wherein the circuitry is further to train the one or more neural networks using one or more adversarial loss functions.

24. The one or more processors of claim 19, wherein the image is a training image for an autonomous driving system.

25. A method, comprising: training one or more neural networks to; identify a set of locations in an image into which to insert an object based, at least in part, on a segmentation map and a location of one or more other objects within the image; identify a shape of the object based, at least in part, on the set of locations; and insert the object into at least one of the set of locations based, at least in part, on the identified shape.

26. The method of claim 25, further comprising: processing the identified shape using one or more discriminators; and training the one or more neural networks based, at least in part, on outputs from the one or more discriminators.

27. The method of claim 26, further comprising processing the identified shape using the one or more discriminators by at least causing the one or more discriminators to classify the identified shape as real or fake.

28. The method of claim 25, further comprising training the one or more neural networks using one or more unsupervised loss functions.

29. The method of claim 25, wher
Affine transformations (for image registration G06T3/147; for image mosaicing G06T3/4038) · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title
Classification techniques · CPC title
Bounding box · CPC title
Artificial neural networks [ANN] · CPC title