Generating descriptions of image relationships

US11195048B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11195048-B2
Application number: US-202016750478-A
Country: US
Kind code: B2
Filing date: Jan 23, 2020
Priority date: Jan 23, 2020
Publication date: Dec 7, 2021
Grant date: Dec 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In implementations of generating descriptions of image relationships, a computing device implements a description system which receives a source digital image and a target digital image. The description system generates a source feature sequence from the source digital image and a target feature sequence from the target digital image. A visual relationship between the source digital image and the target digital image is determined by using cross-attention between the source feature sequence and the target feature sequence. The system generates a description of a visual transformation between the source digital image and the target digital image based on the visual relationship.
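
The cross-attention step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how attention scores are computed, so the scaled dot-product scoring used here is an assumption, and the function names and toy feature values are hypothetical. Each source feature attends over all target features and receives a weighted average of them.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys):
    """For each query feature, return a weighted average of the key features.

    Scoring by scaled dot product is an assumption, not taken from the patent.
    """
    dim = len(keys[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys])
        attended.append(
            [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
        )
    return attended

# Toy feature sequences: one vector per image portion (hypothetical values).
source_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_features = [[1.0, 0.0], [0.0, 2.0]]

# One context vector per source feature, built from the target features.
source_attending_target = cross_attention(source_features, target_features)
```

In a full system, a decoder would consume these attended features to produce the text description; that part is omitted here.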

First claim

Opening claim text (preview).

What is claimed is:

1. In a digital medium environment to generate a description of a visual transformation between a source digital image and a target digital image, a method implemented by a computing device, the method comprising: receiving, by the computing device, the source digital image and the target digital image; generating, by the computing device, a source feature sequence from the source digital image and a target feature sequence from the target digital image, features of the source feature sequence each represent a portion of the source digital image and features of the target feature sequence each represent a portion of the target digital image; determining, by the computing device, a visual relationship between the source digital image and the target digital image using cross-attention between the features of the source feature sequence and the features of the target feature sequence; and generating, by the computing device for display in a user interface, the description of the visual transformation based on the visual relationship as including a difference between an environment scene at a first point in time and the environment scene at a second point in time.

2. The method as described in claim 1, wherein the description of the visual transformation is generated as text.

3. The method as described in claim 1, wherein the visual transformation includes an image editing operation.

4. The method as described in claim 1, further comprising captioning the source digital image and the target digital image with the description of the visual transformation.

5. The method as described in claim 1, wherein the description of the visual transformation includes a natural language image editing instruction.

6. The method as described in claim 1, wherein the description of the visual transformation includes a description of objects depicted in the source digital image or the target digital image.

7. The method as described in claim 1, wherein the target digital image includes an object that is excluded from the source digital image.

8. The method as described in claim 1, wherein the target digital image excludes an object that is included in the source digital image.

9. The method as described in claim 1, wherein determining the visual relationship between the source digital image and the target digital image includes concatenating the source feature sequence and the target feature sequence into a single feature sequence.

10. In a digital medium environment to generate a description of a visual transformation between a source digital image and a target digital image, a system comprising: a feature module implemented at least partially in hardware of a computing device to: receive the source digital image and the target digital image; and generate a source feature sequence from the source digital image and a target feature sequence from the target digital image, features of the source feature sequence each represent a portion of the source digital image and features of the target feature sequence each represent a portion of the target digital image; a relationship module implemented at least partially in the hardware of the computing device to determine a visual relationship between the source digital image and the target digital image using cross-attention between the features of the source feature sequence and the features of the target feature sequence; and a rendering module implemented at least partially in the hardware of the computing device to generate, for display in a user interface of a display device, the description of the visual transformation based on the visual relationship as including a difference between an environment scene at a first point in time and the environment scene at a second point in time.

11. The system as described in claim 10, wherein the visual transformation includes an image editing operation.

12. The system as described in claim 10, wherein the description of the visual transformation includes a natural language image editing instruction.

13. The system as described in claim 10, wherein the description of the visual transformation includes a description of an object depicted in the source digital image or the target digital image.

14. The system as described in claim 10, wherein the relationship module includes a Long Short-Term Memory (LSTM) decoder.

15. One or more non-transitory computer-readable storage media comprising instructions stored thereon that, responsive to execution by a computing device in a digital medium environment to generate a description of a visual transformation between a source digital image and a target digital image, cause operations of the computing device including: generating a source feature sequence from a source feature map extracted from the source digital image and generating a target feature sequence from a target feature map extracted from the target digital image, features of the source feature sequence each represent a portion of the source digital image and features of the target feature sequence each represent a portion of the target digital image; determining a visual relationship between the source digital image and the target digital image using cross-attention between the features of the source feature sequence and the features of the target feature sequence and cross-attention between the features of the target feature sequence and the features of the source feature sequence; and generating, for display in a user interface of a display device, the description of the visual transformation based on the visual relationship as including a difference between an environment scene at a first point in time and the environment scene at a second point in time.

16. The one or more non-transitory computer-readable storage media as described in claim 15, the operations of the computing device further including captioning the source digital image and the target digital image with the description of the visual transformation.

17. The one or more non-transitory computer-readable storage media as described in claim 15, wherein the description of the visual transformation includes an image editing instruction.

18. The one or more non-transitory computer-readable storage media as described in claim 15, wherein the description of the visual transformation includes a description of an object depicted in the source digital image or the target digital image.

19. The one or more non-transitory computer-readable storage media as described in claim 15, wherein the target digital image includes an object that is excluded from the source digital image.

20. The one or more non-transitory computer-readable storage media as described in claim 15, wherein the target digital image excludes an object that is included in the source digital image.
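
Claims 9 and 15 name three concrete operations: turning a feature map into a feature sequence, applying cross-attention in both directions, and concatenating the two sequences into one. The sketch below shows one plausible reading of each. The row-major flattening, the scaled dot-product scoring, the helper names, and the toy 2x2 feature maps are all assumptions made for illustration; the claims do not specify them.

```python
import math

def feature_map_to_sequence(feature_map):
    """Flatten an H x W grid of feature vectors into H*W vectors (row-major).

    Each vector still represents one spatial portion of the image.
    """
    return [cell for row in feature_map for cell in row]

def attend(queries, keys):
    """Cross-attention: each query becomes a weighted average of the keys.

    Scaled dot-product scoring is an assumption, not taken from the claims.
    """
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)])
    return out

# Hypothetical 2x2 feature maps with 2-dimensional features per cell.
source_map = [[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [0.5, 0.5]]]
target_map = [[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [2.0, 0.0]]]  # last cell differs from the source

source_seq = feature_map_to_sequence(source_map)
target_seq = feature_map_to_sequence(target_map)

# Claim 15: attention in both directions.
source_to_target = attend(source_seq, target_seq)
target_to_source = attend(target_seq, source_seq)

# Claim 9: the two sequences concatenated into a single feature sequence.
single_seq = source_seq + target_seq
```

How the attended features and the concatenated sequence feed the decoder (for example the LSTM decoder of claim 14) is not shown here, because the claims alone do not describe that wiring.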

Assignees

Adobe Inc, Univ North Carolina Chapel Hill

Inventors

Classifications

  • G06F40/56 · Primary

    Natural language generation · CPC title

  • in augmented reality scenes · CPC title

  • using neural networks · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11195048B2 cover?
In implementations of generating descriptions of image relationships, a computing device implements a description system which receives a source digital image and a target digital image. The description system generates a source feature sequence from the source digital image and a target feature sequence from the target digital image. A visual relationship between the source digital image and t…
Who is the assignee on this patent?
Adobe Inc, Univ North Carolina Chapel Hill
What technology area does this patent fall under?
Primary CPC classification G06F40/56. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 7, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).