Utilizing interactive deep learning to select objects in digital visual media

US11568627B2 · US · B2

Patent metadata
Publication number: US-11568627-B2
Application number: US-201916376704-A
Country: US
Kind code: B2
Filing date: Apr 5, 2019
Priority date: Nov 18, 2015
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for selecting target objects within digital images utilizing a multi-modal object selection neural network trained to accommodate multiple input modalities. In particular, in one or more embodiments, the disclosed systems and methods generate a trained neural network based on training digital images and training indicators corresponding to various input modalities. Moreover, one or more embodiments of the disclosed systems and methods utilize a trained neural network and iterative user inputs corresponding to different input modalities to select target objects in digital images. Specifically, the disclosed systems and methods can transform user inputs into distance maps that can be utilized in conjunction with color channels and a trained neural network to identify pixels that reflect the target object.
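The abstract's step of transforming user inputs into distance maps can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a user input is a set of clicked pixel coordinates, that the distance is Euclidean, and that values are truncated at 255. The function name `distance_map` and the parameter `cap` are hypothetical.

```python
import numpy as np


def distance_map(height, width, points, cap=255.0):
    """Per-pixel Euclidean distance to the nearest user-selected point.

    `points` is a list of (row, col) pixel coordinates. The truncation
    value `cap` is an illustrative assumption, not taken from the patent.
    """
    if len(points) == 0:
        # No input for this modality: every pixel gets the maximum distance.
        return np.full((height, width), cap, dtype=np.float32)
    rows, cols = np.mgrid[0:height, 0:width]
    pts = np.asarray(points, dtype=np.float32)
    # Broadcast to shape (height, width, num_points), then keep the nearest.
    dists = np.sqrt((rows[..., None] - pts[:, 0]) ** 2
                    + (cols[..., None] - pts[:, 1]) ** 2)
    return np.minimum(dists.min(axis=-1), cap).astype(np.float32)


clicks = [(1, 2)]           # one click at row 1, column 2
dm = distance_map(4, 5, clicks)
print(dm.shape)             # (4, 5)
print(dm[1, 2], dm[1, 4])   # 0.0 2.0
```

The resulting array has the same height and width as the image, so it can sit alongside the color channels as an extra input plane.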

First claim

Opening claim text (preview).

We claim:

1. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause a computer system to: identify, for a single digital image, a first user input corresponding to a first user input modality, the first user input modality comprising one of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality; identify, for the single digital image, a second user input corresponding to a second user input modality, the second user input modality comprising another of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality, wherein the second user input modality differs from the first user input modality; generate a first neural network input from the first user input for the single digital image corresponding to the first user input modality and a second neural network input from the second user input for the single digital image corresponding to the second user input modality; and generate an object segmentation from the single digital image by utilizing a first input channel of a multi-modal object selection neural network to analyze the first neural network input for the single digital image corresponding to the first user input modality and a second input channel of the multi-modal object selection neural network to analyze the second neural network input for the single digital image corresponding to the second user input modality.

2. The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computer system to: generate the first neural network input from the first user input by generating a first distance map reflecting distances between pixels of the single digital image and the first user input corresponding to the first user input modality; and generate the second neural network input from the second user input by generating a second distance map reflecting distances between the pixels of the single digital image and the second user input corresponding to the second user input modality.

3. The non-transitory computer-readable medium of claim 2, further comprising instructions that, when executed by the at least one processor, cause the computer system to: generate a third neural network input from colors of the single digital image; and generate the object segmentation from the single digital image by utilizing a third input channel of the multi-modal object selection neural network to analyze third neural network input from the colors of the single digital image.

4. The non-transitory computer-readable medium of claim 2, further comprising instructions that, when executed by the at least one processor, cause the computer system to provide for display, via a user interface, a plurality of input modality selectable elements comprising at least one regional input modality element and at least one boundary input modality element.

5. The non-transitory computer-readable medium of claim 4, further comprising instructions that, when executed by the at least one processor, cause the computer system to: identify the first user input corresponding to the regional input modality by identifying a first user interaction with the at least one regional input modality element and a first selection of a pixel within the single digital image; and identifying the second user input corresponding to the boundary input modality by identifying a second user interaction with the at least one boundary input modality element and a second selection of a pixel within the single digital image.

6. The non-transitory computer-readable medium of claim 1, wherein: the first user input modality comprises a regional input modality; the first user input indicates a first position relative to a target object portrayed in the single digital image; the second user input modality comprises a boundary input modality; and the second user input indicates a second position relative to the target object portrayed in the single digital image.

7. The non-transitory computer-readable medium of claim 6, further comprising instructions that, when executed by the at least one processor, cause the computer system to: utilize the multi-modal object selection neural network to generate an initial object segmentation based on the first user input corresponding to the regional input modality; and provide the initial object segmentation for display with the single digital image.

8. The non-transitory computer-readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computer system to: in response to identifying the second user input corresponding to the boundary input modality, utilize the multi-modal object selection neural network to generate the object segmentation based on the first user input corresponding to the regional input modality and the second user input corresponding to the boundary input modality; and provide the object segmentation for display with the single digital image.

9. The non-transitory computer-readable medium of claim 6, further comprising instructions that, when executed by the at least one processor, cause the computer system to: identify a third user input corresponding to a language input modality; and utilize the multi-modal object selection neural network to generate the object segmentation based on the first user input corresponding to the regional input modality, the second user input corresponding to the boundary input modality; and the third user input corresponding to the language input modality.

10. A computer-implemented method comprising: identifying, for a single digital image, a first user input corresponding to a first user input modality, the first user input modality comprising one of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality; identifying, for the single digital image, a second user input corresponding to a second user input modality, the second user input modality comprising another of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality, wherein the second user input modality differs from the first user input modality; generating a first neural network input from the first user input for the single digital image corresponding to the first user input modality and a second neural network input from the second user input for the single digital image corresponding to the second user input modality; and generating an object segmentation from the single digital image by utilizing a first input channel of a multi-modal object selection neural network to analyze the first neural network input for the single digital image corresponding to the first user input modality and a second input channel of the multi-modal object selection neural network to analyze the second neural network input for the single digital image corresponding to the second user input modality.

11. The computer-implemented method of claim 10, further comprising: generating the first neural network input from the first user input by generating a first distance map reflecting distances between pixels of the single digital image and the first user input corresponding to the first user input modality; and generating the second neural network input from the second user input by generating a second distance map reflecting distances between the pixels of the single digital image and the second user input corresponding to the second user input modality.
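Claims 1 to 3 and 6 to 8 describe feeding each input modality to its own input channel, alongside channels built from the image colors, and refining an initial segmentation once a second input arrives. The sketch below shows one way the per-modality channels might be assembled. It is an assumption-laden illustration: the channel order, the Euclidean distance, the 255 truncation, and the names `nearest_point_distance` and `build_network_input` are all hypothetical, and the neural network itself is not shown.

```python
import numpy as np


def nearest_point_distance(shape, points, cap=255.0):
    """Distance map for one modality; `cap` is an assumed truncation value."""
    h, w = shape
    if len(points) == 0:
        # Modality not used yet: fill with the maximum distance.
        return np.full((h, w), cap, dtype=np.float32)
    rows, cols = np.mgrid[0:h, 0:w]
    pts = np.asarray(points, dtype=np.float32)
    d = np.hypot(rows[..., None] - pts[:, 0], cols[..., None] - pts[:, 1])
    return np.minimum(d.min(axis=-1), cap).astype(np.float32)


def build_network_input(image_rgb, regional_points, boundary_points):
    """Stack color channels with one distance-map channel per modality.

    Assumed channel layout: 0-2 = RGB, 3 = regional input, 4 = boundary input.
    """
    shape = image_rgb.shape[:2]
    color = image_rgb.astype(np.float32)
    regional = nearest_point_distance(shape, regional_points)
    boundary = nearest_point_distance(shape, boundary_points)
    return np.dstack([color, regional, boundary])


image = np.zeros((6, 8, 3), dtype=np.uint8)    # placeholder image
# First pass: only a regional click is available (compare claim 7).
initial = build_network_input(image, [(3, 4)], [])
# Second pass: a boundary click is added to the same image (compare claim 8).
refined = build_network_input(image, [(3, 4)], [(0, 0)])
print(initial.shape, refined.shape)            # (6, 8, 5) (6, 8, 5)
```

Keeping a fixed channel per modality means the input tensor has the same shape on every pass, so the same network could be called again as the user adds inputs; only the contents of the affected channel change.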

Assignees

Adobe Inc

Inventors

Classifications

  • by interactive preprocessing or interactive shape modelling, e.g. feature points assigned by a user · CPC title

  • Physics · mapped topic

  • G06V10/255 (Primary)

    Detecting or recognising potential candidate objects based on visual cues, e.g. shapes · CPC title

  • Region-based segmentation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11568627B2 cover?
It covers systems and methods for selecting target objects within digital images utilizing a multi-modal object selection neural network trained to accommodate multiple input modalities. The full text appears in the Abstract section above.
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/255. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 31, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).