US11908036B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11908036-B2 |
| Application number | US-202017034467-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 28, 2020 |
| Priority date | Sep 28, 2020 |
| Publication date | Feb 20, 2024 |
| Grant date | Feb 20, 2024 |
Official abstract text for this publication.
The technology described herein is directed to a cross-domain training framework that iteratively trains a domain adaptive refinement agent to refine low quality real-world image acquisition data, e.g., depth maps, when accompanied by corresponding conditional data from other modalities, such as the underlying images or video from which the image acquisition data is computed. The cross-domain training framework includes a shared cross-domain encoder and two conditional decoder branch networks, e.g., a synthetic conditional depth prediction branch network and a real conditional depth prediction branch network. The shared cross-domain encoder converts synthetic and real-world image acquisition data into synthetic and real compact feature representations, respectively. The synthetic and real conditional decoder branch networks convert the respective synthetic and real compact feature representations back to synthetic and real image acquisition data (refined versions) conditioned on data from the other modalities. The cross-domain training framework iteratively trains the domain adaptive refinement agent.
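The data flow in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the patented implementation: the function names (`cross_domain_encoder`, `conditional_encoder`, `branch_decoder`, `refine`) are hypothetical, and 2x2 average pooling and nearest-neighbour upsampling are stand-ins assumed here in place of the learned convolutional layers. What it does show is the wiring: a four-channel input (R, G, B, depth) goes through the shared encoder, the conditional data goes through a branch encoder, and the two compact features are concatenated before decoding. The same wiring would apply to either the synthetic or the real branch.

```python
from typing import List, Tuple

Channel = List[List[float]]  # one H x W plane of values


def avg_pool2(plane: Channel) -> Channel:
    """Halve resolution by averaging 2x2 blocks (assumed stand-in for learned convolutions)."""
    h, w = len(plane), len(plane[0])
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def upsample2(plane: Channel) -> Channel:
    """Nearest-neighbour 2x upsampling (assumed stand-in for a learned decoder)."""
    out: Channel = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def cross_domain_encoder(rgb: List[Channel], depth: Channel) -> List[Channel]:
    """Shared encoder: four input channels (R, G, B, depth) -> compact feature planes."""
    return [avg_pool2(c) for c in rgb + [depth]]


def conditional_encoder(rgb: List[Channel]) -> List[Channel]:
    """Branch encoder: conditional data (here the RGB frame) -> conditional feature planes."""
    return [avg_pool2(c) for c in rgb]


def branch_decoder(features: List[Channel]) -> Channel:
    """Branch decoder: average the concatenated feature planes, then upsample to a depth map."""
    h, w = len(features[0]), len(features[0][0])
    mean = [[sum(f[y][x] for f in features) / len(features) for x in range(w)]
            for y in range(h)]
    return upsample2(mean)


def refine(rgb: List[Channel], depth: Channel) -> Tuple[Channel, int, int]:
    """Run one branch end to end; also report channel counts before and after concatenation."""
    compact = cross_domain_encoder(rgb, depth)
    conditional = conditional_encoder(rgb)
    concatenated = compact + conditional  # channel-wise concatenation
    return branch_decoder(concatenated), len(compact), len(concatenated)


# Usage: a 4x4 RGB frame plus a 4x4 depth map.
rgb_frame = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]
depth_map = [[2.0] * 4 for _ in range(4)]
refined, n_compact, n_concat = refine(rgb_frame, depth_map)
print(len(refined), len(refined[0]), n_compact, n_concat)  # prints "4 4 4 7"
```

In this toy version the refined map has the same resolution as the input depth map, and the decoder sees seven feature planes: four from the shared encoder and three from the conditional encoder.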
Opening claim text (preview).
What is claimed is:

1. A cross domain supervised learning-based system for iteratively training a domain adaptive refinement agent, the system comprising: a cross-domain encoder configured to convert synthetic and real image acquisition data derived from a video into compact synthetic and real feature representations respectively, wherein the synthetic and real feature image acquisition data comprises a frame of the video, the cross-domain encoder comprising at least four input channels, wherein three of the at least four input channels respectively comprise red (R), green (G), and blue (B) input channels for an RGB image corresponding to the frame of the video, and a fourth input channel comprises the real image acquisition data, wherein the real image acquisition data is a depth map corresponding to the RGB image; a synthetic conditional depth prediction branch network comprising a synthetic encoder and a synthetic decoder, and configured to convert the compact synthetic feature representation to a refined version of the synthetic image acquisition data, the synthetic encoder outputting a compact synthetic conditional depth feature representation, the compact synthetic conditional depth feature representation concatenated with the compact synthetic feature representation output from the cross-domain encoder for input to the synthetic decoder to generate the refined version of the synthetic image acquisition data; a real conditional depth prediction branch network configured to convert the compact real feature representation to a refined version of the real image acquisition data; and a training supervision element configured to iteratively train the domain adaptive refinement agent based on the refined versions of the synthetic image acquisition data from the synthetic conditional depth prediction branch network and the real image acquisition data from the real conditional depth prediction branch network.

2. The cross domain supervised learning-based system of claim 1, wherein to iteratively train the domain adaptive refinement agent, the training supervision element is configured to: compare the refined version of the synthetic image acquisition data to ground truth synthetic image acquisition data; calculate a synthetic domain loss based on the comparison; feed the synthetic domain loss to the synthetic conditional depth prediction branch network to update parameters of the synthetic encoder and the synthetic decoder; and feed the synthetic domain loss to the cross-domain encoder to update parameters of the cross-domain encoder.

3. The cross domain supervised learning-based system of claim 1, wherein to iteratively train the domain adaptive refinement agent, the training supervision element is configured to: compare the refined version of the real image acquisition data to the real image acquisition data; calculate a real domain loss based on the comparison; feed the real domain loss to the real conditional depth prediction branch network to update parameters of a real encoder and a real decoder; and feed the real domain loss to the cross-domain encoder to update parameters of the cross-domain encoder.

4. The cross domain supervised learning-based system of claim 1, the synthetic conditional depth prediction branch network comprising: a synthetic concatenation element configured to concatenate the compact synthetic feature representation with the compact synthetic conditional depth feature representation from the synthetic encoder, wherein the compact synthetic feature representation is passed through the synthetic concatenation element to the synthetic decoder.

5. The cross domain supervised learning-based system of claim 1, the real conditional depth prediction branch network comprising: a real concatenation element configured to concatenate the compact real feature representation with a real conditional depth feature.

6. The cross domain supervised learning-based system of claim 1, wherein the synthetic conditional depth prediction branch network comprises a convolutional neural network with skip links that connect outputs of convolutional layers in the synthetic encoder to inputs of the convolutional layers of the synthetic decoder.

7. The cross domain supervised learning-based system of claim 1, wherein the real conditional depth prediction branch network comprises a convolutional neural network with skip links that connect outputs of convolutional layers in a real encoder to inputs of the convolutional layers of a real decoder.

8. The cross domain supervised learning-based system of claim 1, wherein the synthetic conditional depth prediction branch network is configured to limit the size of the compact synthetic feature representation, and wherein the real conditional depth prediction branch network is configured to limit the size of the compact real feature representation.

9. The cross domain supervised learning-based system of claim 1, wherein the compact synthetic feature representation from the cross-domain encoder is passed through to the synthetic decoder without skip links connecting the cross-domain encoder with the synthetic decoder.

10. A method of refining image acquisition data through domain adaptation, the method comprising: converting, by a cross-domain encoder, real image acquisition data into a compact real feature representation, wherein the cross-domain encoder receives inputs from at least four input channels three of the at least four input channels respectively comprise red (R), green (G), and blue (B) input channels for RGB images from a video, and a fourth input channel comprises the real image acquisition data, wherein the real image acquisition data input to the fourth input channel comprises a depth map computed from an RGB image from the video, and RGB inputs to the RGB input channels comprise a first frame and a second frame that are each adjacent to the RGB image in the video; converting, by a real encoder, conditional real data into a conditional real depth feature; and generating, by a synthetic decoder, a refined version of the real image acquisition data based on the conditional real data, the compact real feature representation from the cross-domain encoder passed through to the synthetic decoder to generate the refined version of the real image acquisition data, wherein the refined version of the real image acquisition data is generated by converting the compact real feature representation to the refined version of real image acquisition data conditioned on the conditional real depth feature.

11. The method of claim 10, wherein passing the compact real feature representation from the cross-domain encoder through to the synthetic decoder includes: concatenating, by a real concatenation element, the compact real feature representation and the conditional real depth feature resulting in a concatenated feature vector; and feeding the concatenated feature vector to the synthetic decoder.

12. The method of claim 10, wherein the conditional real data is of a different modality and of a higher quality than the corresponding real image acquisition data, and wherein the different modality comprises images or video frames.

13. The method of claim 10, wherein the compact real feature representation from the cross-domain encoder is passed through to the synthetic decoder without skip links connecting the cross-domain encoder with the synthetic decoder.

14. A supervised learning-based method of iteratively training a domain adaptive refinement agent, the method comprising: feeding synthetic image acquisition data and real image acquisition data derived from a video to a cross-domai
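
The loss routing described in claims 2 and 3 can be illustrated with a deliberately tiny model. The sketch below is an assumption-heavy toy, not the claimed system: each network is collapsed to a single scalar weight (`shared`, `syn`, `real` are hypothetical names), the refined output is modelled as `branch * shared * input`, and squared error is assumed as the loss because the claims do not name one. The point is only which parameters each loss updates: the synthetic loss (refined output vs. ground truth) updates the synthetic branch and the shared encoder, and the real loss (refined output vs. the real input itself) updates the real branch and the shared encoder.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyAgent:
    shared: float = 1.0  # stand-in for cross-domain encoder parameters
    syn: float = 1.0     # stand-in for synthetic branch parameters
    real: float = 1.0    # stand-in for real branch parameters


def train_step(agent: ToyAgent, syn_in: float, syn_gt: float,
               real_in: float, lr: float = 0.01) -> Tuple[float, float]:
    """One gradient-descent step; returns (synthetic loss, real loss) before the update."""
    # Synthetic domain: compare the refined output to ground truth.
    syn_err = agent.syn * agent.shared * syn_in - syn_gt
    syn_loss = syn_err ** 2

    # Real domain: compare the refined output to the real input itself.
    real_err = agent.real * agent.shared * real_in - real_in
    real_loss = real_err ** 2

    # Each loss reaches its own branch; both losses reach the shared encoder.
    grad_syn = 2.0 * syn_err * agent.shared * syn_in
    grad_real = 2.0 * real_err * agent.shared * real_in
    grad_shared = (2.0 * syn_err * agent.syn * syn_in
                   + 2.0 * real_err * agent.real * real_in)

    agent.syn -= lr * grad_syn
    agent.real -= lr * grad_real
    agent.shared -= lr * grad_shared
    return syn_loss, real_loss


# Usage: iterate a few steps and watch both losses.
agent = ToyAgent()
for step in range(5):
    losses = train_step(agent, syn_in=2.0, syn_gt=3.0, real_in=1.0)
    print(step, losses)
```

Because the shared weight receives gradients from both domains, a change driven by the synthetic loss also shifts the real branch's output, which is the coupling that lets supervision in one domain influence the other.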
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Image feed-back for automatic industrial control, e.g. robot with camera (robots B25J19/023) · CPC title
Learning methods · CPC title