Utilizing machine learning models to generate refined depth maps with segmentation mask guidance

US12367585B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12367585-B2
Application number  US-202217658873-A
Country             US
Kind code           B2
Filing date         Apr 12, 2022
Priority date       Apr 12, 2022
Publication date    Jul 22, 2025
Grant date          Jul 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing machine learning models to generate refined depth maps of digital images utilizing digital segmentation masks. In particular, in one or more embodiments, the disclosed systems generate a depth map for a digital image utilizing a depth estimation machine learning model, determine a digital segmentation mask for the digital image, and generate a refined depth map from the depth map and the digital segmentation mask utilizing a depth refinement machine learning model. In some embodiments, the disclosed systems generate first and second intermediate depth maps using the digital segmentation mask and an inverse digital segmentation mask and merge the first and second intermediate depth maps to generate the refined depth map.
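
The two-pass flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: `refine_fn` and `toy_refine` are hypothetical stand-ins for the depth refinement model, and the mask-weighted blend used for merging is an assumption, since the abstract does not say how the intermediate depth maps are merged.

```python
from typing import Callable

import numpy as np

# Hypothetical signature: (depth_map, mask) -> refined depth map of the same shape.
RefineFn = Callable[[np.ndarray, np.ndarray], np.ndarray]


def refine_depth(depth: np.ndarray, mask: np.ndarray, refine_fn: RefineFn) -> np.ndarray:
    """Refine once with the mask and once with its inverse, then merge."""
    mask = mask.astype(np.float32)
    inverse_mask = 1.0 - mask

    first = refine_fn(depth, mask)           # first intermediate depth map
    second = refine_fn(depth, inverse_mask)  # second intermediate depth map

    # Assumed merge rule: masked pixels come from the first map,
    # all other pixels come from the second map.
    return mask * first + inverse_mask * second


def toy_refine(depth: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Placeholder model: flattens depth inside the mask to its masked mean."""
    selected = mask > 0.5
    out = depth.astype(np.float32).copy()
    if selected.any():
        out[selected] = depth[selected].mean()
    return out


# Tiny example: a near object (left two columns) in front of a far background.
depth = np.array([[1.0, 1.2, 5.0],
                  [0.8, 1.0, 5.2]], dtype=np.float32)
mask = np.array([[1, 1, 0],
                 [1, 1, 0]], dtype=np.float32)

refined = refine_depth(depth, mask, toy_refine)
```

In practice `toy_refine` would be replaced by a trained network. The point of the sketch is only the control flow: the same model is called twice, once per mask polarity, and the two outputs are combined into a single refined map.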

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable medium storing instructions thereon that, when executed by at least one processor, cause the at least one processor to perform operations comprising: generating a depth map for a digital image utilizing a depth estimation machine learning model; determining a digital segmentation mask for the digital image, the digital segmentation mask indicating one or more objects portrayed in the digital image; generating a first intermediate depth map for the digital image, utilizing a depth refinement machine learning model, from the depth map and the digital segmentation mask for the digital image; generating a second intermediate depth map for the digital image, utilizing the depth refinement machine learning model, from the depth map and an inverse of the digital segmentation mask for the digital image; and combining the first intermediate depth map and the second intermediate depth map to generate a refined depth map for the digital image.

2. The non-transitory computer-readable medium of claim 1, wherein generating the refined depth map comprises refining the depth map along one or more boundaries indicated by the digital segmentation mask.

3. The non-transitory computer-readable medium of claim 1, wherein generating the refined depth map comprises: generating a first intermediate depth map utilizing the depth refinement machine learning model from the depth map and the digital segmentation mask; and generating a second intermediate depth map utilizing the depth refinement machine learning model from the depth map and an inverse digital segmentation mask.

4. The non-transitory computer-readable medium of claim 3, wherein generating the refined depth map comprises merging the first intermediate depth map and the second intermediate depth map to generate the refined depth map.

5. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the at least one processor to determine the digital segmentation mask utilizing an image segmentation machine learning model.

6. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed the at least one processor, cause the at least one processor to utilize the refined depth map to generate a modified digital image from the digital image.

7. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising modifying, in response to generating the refined depth map, one or more parameters of the depth refinement machine learning model based on comparing the refined depth map with a ground truth depth map.

8. The non-transitory computer-readable medium of claim 3, further comprising instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising: modifying parameters of the depth refinement machine learning model based on comparing the first intermediate depth map with a first ground truth depth map and comparing the second intermediate depth map with a second ground truth depth map.

9. The non-transitory computer-readable medium of claim 4, further comprising instructions that, when executed by the at least one processor, cause the at least one processor to modify one or more parameters of the depth refinement machine learning model based on comparing the refined depth map with a composite ground truth depth map.

10. The non-transitory computer-readable medium of claim 9, further comprising instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising generating the composite ground truth depth map by combining of a first ground truth depth map corresponding to the digital segmentation mask and a second ground truth depth map corresponding to the inverse digital segmentation mask.

11. A system comprising: one or more memory devices; and one or more processors coupled to the one or more memory devices that cause the system to perform operations comprising: generating, utilizing a map refinement neural network, a first intermediate environment map of a digital image from an initial environment map of the digital image and a digital segmentation mask for the digital image; generating, utilizing the map refinement neural network, a second intermediate environment map of the digital image from the initial environment map of the digital image and an inverse of the digital segmentation mask for the digital image; and merging the first intermediate environment map and the second intermediate environment map to determine a refined environment map for the digital image.

12. The system of claim 11, wherein the one or more processors are further configured to cause the system to perform operations comprising: generate the first intermediate environment map by generating at least one of a refined depth map, a refined normal map, a refined semantic segmentation map, a refined optical flow map, a refined image contrast map, or a refined infrared map; and generate the refined environment map by generating at least one of a refined depth map, a refined normal map, a refined semantic segmentation map, a refined optical flow map, a refined image contrast map, or a refined infrared map.

13. The system of claim 11, wherein the digital image comprises a composite digital image, and wherein the one or more processors are further configured to cause the system to perform operations comprising: generating the first intermediate environment map and the second intermediate environment map by generating a first intermediate depth map and a second intermediate depth map utilizing a depth refinement neural network; extracting an image excerpt of a first digital image and a depth map excerpt of a first ground truth depth map of the first digital image based on the digital segmentation mask; and combining the image excerpt with a second digital image to generate the composite digital image.

14. The system of claim 13, wherein the one or more processors are further configured to cause the system to perform operations comprising: modifying, in response to generating the first intermediate depth map, one or more parameters of the depth refinement neural network to reduce a measure of loss between the first intermediate depth map and the first ground truth depth map; or modifying, in response to generating the second intermediate depth map, one or more parameters of the depth refinement neural network to reduce a measure of loss between the second intermediate depth map and a second ground truth depth map of the second digital image.

15. The system of claim 13, wherein the one or more processors are further configured to cause the system to perform operations comprising combining the depth map excerpt with a second ground truth depth map of the second digital image to generate a composite depth map of the composite digital image.

16. The system of claim 15, wherein the initial environment map comprises a perturbed depth map, and wherein the one or more processors are further configured to cause the system to perform operations comprising: altering the composite depth map with one or more perturbations to generate the perturbed depth map; and modifying one or more parameters of the depth refinement neural network to reduce a measure of loss between the first intermediate depth map and the first ground truth depth ma
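
The training-data steps recited in claims 13, 15, and 16 (excerpt extraction, compositing, and perturbation) might look something like the sketch below. Every function name here is hypothetical, the additive Gaussian noise is only one possible kind of perturbation (the claim text shown does not specify one), and the mean absolute error is an assumed choice for the "measure of loss".

```python
import numpy as np


def make_composite(image_a, depth_a, image_b, depth_b, mask):
    """Paste the masked excerpt of (image_a, depth_a) onto (image_b, depth_b)."""
    m = mask.astype(np.float32)
    # Image excerpt from the first image combined with the second image.
    composite_image = m[..., None] * image_a + (1.0 - m[..., None]) * image_b
    # Depth excerpt from the first ground truth combined with the second ground truth.
    composite_depth = m * depth_a + (1.0 - m) * depth_b
    return composite_image, composite_depth


def perturb(depth, rng, sigma=0.05):
    """Assumed perturbation: additive Gaussian noise with standard deviation sigma."""
    noise = rng.normal(0.0, sigma, size=depth.shape).astype(np.float32)
    return depth + noise


def l1_loss(pred, target):
    """Assumed loss measure: mean absolute error between two depth maps."""
    return float(np.mean(np.abs(pred - target)))


rng = np.random.default_rng(0)
h, w = 4, 4
image_a = rng.random((h, w, 3)).astype(np.float32)  # first digital image
image_b = rng.random((h, w, 3)).astype(np.float32)  # second digital image
depth_a = np.full((h, w), 1.0, dtype=np.float32)    # first ground truth depth (near)
depth_b = np.full((h, w), 5.0, dtype=np.float32)    # second ground truth depth (far)

mask = np.zeros((h, w), dtype=np.float32)
mask[1:3, 1:3] = 1.0                                # object region taken from image_a

composite_image, composite_depth = make_composite(image_a, depth_a, image_b, depth_b, mask)
perturbed_depth = perturb(composite_depth, rng)     # initial map fed to the refinement network

# Error of the unrefined input against the composite ground truth;
# a training step would aim to reduce this kind of quantity for the network's outputs.
baseline_loss = l1_loss(perturbed_depth, composite_depth)
```

Because the composite ground truth is built from two known depth maps, the correct depth on both sides of the mask boundary is available for supervision without manual annotation.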

Assignees

Adobe Inc

Classifications

Primary CPC: G06T7/50

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12367585B2 cover?
The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing machine learning models to generate refined depth maps of digital images utilizing digital segmentation masks. In particular, in one or more embodiments, the disclosed systems generate a depth map for a digital image utilizing a depth estimation machine learning model, determine a digita…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/50. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 22, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).