Utilizing deep learning for boundary-aware image segmentation
US-2017287137-A1 · Oct 5, 2017 · US
US11568627B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11568627-B2 |
| Application number | US-201916376704-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 5, 2019 |
| Priority date | Nov 18, 2015 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
Official abstract text for this publication.
Systems and methods are disclosed for selecting target objects within digital images utilizing a multi-modal object selection neural network trained to accommodate multiple input modalities. In particular, in one or more embodiments, the disclosed systems and methods generate a trained neural network based on training digital images and training indicators corresponding to various input modalities. Moreover, one or more embodiments of the disclosed systems and methods utilize a trained neural network and iterative user inputs corresponding to different input modalities to select target objects in digital images. Specifically, the disclosed systems and methods can transform user inputs into distance maps that can be utilized in conjunction with color channels and a trained neural network to identify pixels that reflect the target object.
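The abstract's last step, turning user inputs into distance maps and combining them with color channels, can be illustrated with a minimal NumPy sketch. It is not the patent's implementation: the function names `click_distance_map` and `build_network_input`, the Euclidean metric, the truncation value `cap=255.0`, and the channel order (RGB, then regional, then boundary) are all assumptions made for illustration.

```python
import numpy as np


def click_distance_map(height, width, clicks, cap=255.0):
    """Per-pixel Euclidean distance to the nearest click, truncated at `cap`.

    `clicks` is a list of (row, col) pixel positions. The truncation value
    and the choice of Euclidean distance are assumptions of this sketch.
    """
    if not clicks:
        # No input for this modality yet: every pixel is "far away".
        return np.full((height, width), cap, dtype=np.float32)
    rows, cols = np.mgrid[0:height, 0:width]
    pts = np.asarray(clicks, dtype=np.float32)  # shape (k, 2)
    # Broadcast to shape (k, height, width): one distance plane per click.
    dist = np.sqrt(
        (rows[None] - pts[:, 0, None, None]) ** 2
        + (cols[None] - pts[:, 1, None, None]) ** 2
    )
    return np.minimum(dist.min(axis=0), cap).astype(np.float32)


def build_network_input(image_rgb, regional_clicks, boundary_clicks):
    """Stack color channels with one distance-map channel per modality.

    Returns an array of shape (height, width, 5). The channel layout is a
    hypothetical choice, not one taken from the patent text.
    """
    height, width, _ = image_rgb.shape
    regional = click_distance_map(height, width, regional_clicks)
    boundary = click_distance_map(height, width, boundary_clicks)
    return np.dstack([image_rgb.astype(np.float32), regional, boundary])
```

For a 4 x 5 image with one regional click at `(0, 0)` and one boundary click at `(3, 4)`, `build_network_input` returns a `(4, 5, 5)` array whose last two channels hold the two distance maps. A trained network would consume this array; that part is outside the scope of the sketch.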
Opening claim text (preview).
We claim:
1. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause a computer system to: identify, for a single digital image, a first user input corresponding to a first user input modality, the first user input modality comprising one of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality; identify, for the single digital image, a second user input corresponding to a second user input modality, the second user input modality comprising another of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality, wherein the second user input modality differs from the first user input modality; generate a first neural network input from the first user input for the single digital image corresponding to the first user input modality and a second neural network input from the second user input for the single digital image corresponding to the second user input modality; and generate an object segmentation from the single digital image by utilizing a first input channel of a multi-modal object selection neural network to analyze the first neural network input for the single digital image corresponding to the first user input modality and a second input channel of the multi-modal object selection neural network to analyze the second neural network input for the single digital image corresponding to the second user input modality.
2. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to: generate the first neural network input from the first user input by generating a first distance map reflecting distances between pixels of the single digital image and the first user input corresponding to the first user input modality; and generate the second neural network input from the second user input by generating a second distance map reflecting distances between the pixels of the single digital image and the second user input corresponding to the second user input modality.
3. The non-transitory computer-readable medium of claim 2, further comprising instructions that, when executed by the at least one processor, cause the computer system to: generate a third neural network input from colors of the single digital image; and generate the object segmentation from the single digital image by utilizing a third input channel of the multi-modal object selection neural network to analyze third neural network input from the colors of the single digital image.
4. The non-transitory computer-readable medium of claim 2, further comprising instructions that, when executed by the at least one processor, cause the computer system to provide for display, via a user interface, a plurality of input modality selectable elements comprising at least one regional input modality element and at least one boundary input modality element.
5. The non-transitory computer-readable medium of claim 4, further comprising instructions that, when executed by the at least one processor, cause the computer system to: identify the first user input corresponding to the regional input modality by identifying a first user interaction with the at least one regional input modality element and a first selection of a pixel within the single digital image; and identifying the second user input corresponding to the boundary input modality by identifying a second user interaction with the at least one boundary input modality element and a second selection of a pixel within the single digital image.
6. The non-transitory computer-readable medium of claim 1, wherein: the first user input modality comprises a regional input modality; the first user input indicates a first position relative to a target object portrayed in the single digital image; the second user input modality comprises a boundary input modality; and the second user input indicates a second position relative to the target object portrayed in the single digital image.
7. The non-transitory computer-readable medium of claim 6, further comprising instructions that, when executed by the at least one processor, cause the computer system to: utilize the multi-modal object selection neural network to generate an initial object segmentation based on the first user input corresponding to the regional input modality; and provide the initial object segmentation for display with the single digital image.
8. The non-transitory computer-readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computer system to: in response to identifying the second user input corresponding to the boundary input modality, utilize the multi-modal object selection neural network to generate the object segmentation based on the first user input corresponding to the regional input modality and the second user input corresponding to the boundary input modality; and provide the object segmentation for display with the single digital image.
9. The non-transitory computer-readable medium of claim 6, further comprising instructions that, when executed by the at least one processor, cause the computer system to: identify a third user input corresponding to a language input modality; and utilize the multi-modal object selection neural network to generate the object segmentation based on the first user input corresponding to the regional input modality, the second user input corresponding to the boundary input modality; and the third user input corresponding to the language input modality.
10. A computer-implemented method comprising: identifying, for a single digital image, a first user input corresponding to a first user input modality, the first user input modality comprising one of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality; identifying, for the single digital image, a second user input corresponding to a second user input modality, the second user input modality comprising another of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality, wherein the second user input modality differs from the first user input modality; generating a first neural network input from the first user input for the single digital image corresponding to the first user input modality and a second neural network input from the second user input for the single digital image corresponding to the second user input modality; and generating an object segmentation from the single digital image by utilizing a first input channel of a multi-modal object selection neural network to analyze the first neural network input for the single digital image corresponding to the first user input modality and a second input channel of the multi-modal object selection neural network to analyze the second neural network input for the single digital image corresponding to the second user input modality.
11. The computer-implemented method of claim 10, further comprising: generating the first neural network input from the first user input by generating a first distance map reflecting distances between pixels of the single digital image and the first user input corresponding to the first user input modality; and generating the second neural network input from the second user input by generating a second distance map reflecting distances between the pixels of the single digital image and the second user input corresponding to the second user input modality.
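Independent claims 1 and 10 require two user inputs for the same image that come from different modalities, each analyzed through its own input channel. The sketch below shows one way that constraint could be checked and routed in code. It is only an illustration: the `Modality` enum, the `UserInput` record, the fixed `CHANNEL_OF` layout, and the `route_inputs` helper are hypothetical names, and the claims do not prescribe a fixed channel index per modality.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    """The four input modalities named in claims 1 and 10."""
    REGIONAL = "regional"
    BOUNDARY = "boundary"
    LANGUAGE = "language"
    BOUNDING_BOX = "bounding_box"


@dataclass(frozen=True)
class UserInput:
    modality: Modality
    payload: object  # e.g. a pixel (row, col), a box, or a phrase


# Hypothetical layout: one dedicated input channel per modality,
# numbered in the order the modalities are declared above.
CHANNEL_OF = {modality: index for index, modality in enumerate(Modality)}


def route_inputs(first, second):
    """Map two inputs of different modalities to their input channels."""
    if first.modality is second.modality:
        # The claims require the second modality to differ from the first.
        raise ValueError("the two user inputs must use different modalities")
    return {
        CHANNEL_OF[first.modality]: first.payload,
        CHANNEL_OF[second.modality]: second.payload,
    }
```

Calling `route_inputs` with a regional click and a boundary click returns a mapping from channel index to payload, while two inputs of the same modality raise `ValueError`. Converting each payload into an actual neural network input (for example, a distance map as in claims 2 and 11) would be a separate step.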
by interactive preprocessing or interactive shape modelling, e.g. feature points assigned by a user · CPC title
Physics · mapped topic
Detecting or recognising potential candidate objects based on visual cues, e.g. shapes · CPC title
Region-based segmentation · CPC title