Generating ground truth annotations corresponding to digital image editing dialogues for training state tracking models

US11100917B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11100917-B2
Application number: US-201916366904-A
Country: US
Kind code: B2
Filing date: Mar 27, 2019
Priority date: Mar 27, 2019
Publication date: Aug 24, 2021
Grant date: Aug 24, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems, non-transitory computer-readable media, and methods that generate ground truth annotations of target utterances in digital image editing dialogues in order to create a state-driven training data set. In particular, in one or more embodiments, the disclosed systems utilize machine and user defined tags, machine learning model predictions, and user input to generate a ground truth annotation that includes frame information in addition to intent, attribute, object, and/or location information. In at least one embodiment, the disclosed systems generate ground truth annotations in conformance with an annotation ontology that results in fast and accurate digital image editing dialogue annotation.
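
As a reading aid, the kind of annotation record the abstract describes can be sketched as a small data structure. This is only an illustrative sketch, not the patented implementation: the class name, field names, sample utterances, and intent values below are all hypothetical, chosen to mirror the abstract's list of frame, intent, attribute, object, and location information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroundTruthAnnotation:
    """One annotated utterance (hypothetical field names)."""
    utterance: str
    speaker: str                        # e.g. "user" or "system"
    frame_id: int                       # utterances on a shared topic share a frame
    intent: str                         # ground truth image editing intent
    attribute: Optional[str] = None
    object_id: Optional[str] = None     # object relative to the image
    location_id: Optional[str] = None   # location relative to the image


def utterances_in_frame(data: List[GroundTruthAnnotation], frame_id: int) -> List[str]:
    """Return the utterances that were annotated with the given frame."""
    return [a.utterance for a in data if a.frame_id == frame_id]


# Build a tiny state-driven training set from made-up dialogue turns.
training_set: List[GroundTruthAnnotation] = []
training_set.append(GroundTruthAnnotation(
    "Make the sky brighter", "user", 1, "adjust",
    attribute="brightness", object_id="sky"))
training_set.append(GroundTruthAnnotation(
    "A bit more", "user", 1, "adjust",
    attribute="brightness", object_id="sky"))
training_set.append(GroundTruthAnnotation(
    "Now crop the left side", "user", 2, "crop", location_id="left"))
```

In this sketch the second utterance ("A bit more") names no object or intent of its own; the shared frame identifier is what ties it back to the first turn, which is the kind of link a state tracking model would need to learn.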

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method comprising: in response to a detected user selection of an utterance within a digital image editing dialogue, identifying the utterance as a target utterance from the digital image editing dialogue, the digital image editing dialogue comprising digital verbal communications from a user for editing a digital image; generating a ground truth annotation for the target utterance of the digital image editing dialogue by: providing, for display via an image editing dialogue annotation user interface, a plurality of image editing annotation elements and the target utterance; based on user interactions with the plurality of image editing annotation elements, determining a frame identifier reflecting a shared topic between the target utterance and other digital verbal communications in the digital image editing dialogue, and a ground truth image editing intent corresponding to the target utterance; determining a speaker associated with the target utterance; and based on the speaker associated with the target utterance, generating the ground truth annotation for the target utterance including: the frame identifier reflecting the shared topic between the target utterance and other digital verbal communications in the digital image editing dialogue, and the ground truth image editing intent; and adding the ground truth annotation and the target utterance of the digital image dialogue to a training data set for training a digital image editing dialogue machine learning model to learn co-reference resolution and user intent tracking over multiple conversational turns in a new digital image editing dialogue.

2. The computer-implemented method as recited in claim 1, wherein the ground truth annotation of the target utterance additionally comprises at least one of: a location identifier relative to the digital image, or an object identifier relative to the digital image.

3. The computer-implemented method as recited in claim 1, wherein the training data set comprises a second ground truth annotation corresponding to a second target utterance from the digital image editing dialogue, the second ground truth annotation comprising the frame identifier.

4. The computer-implemented method as recited in claim 3, wherein the frame identifier reflects the shared topic by comprising at least one of: a common ground truth intent corresponding to both the target utterance and the second target utterance, a common object identifier corresponding to both the target utterance and the second target utterance, or a common location identifier corresponding to both the target utterance and the second target utterance.

5. The computer-implemented method as recited in claim 1, wherein the training data set comprises multiple frames for multiple target utterances of the digital image editing dialogue, and each frame corresponds to a unique ground truth intent, a unique object identifier, or a unique location identifier.

6. A non-transitory computer-readable storage medium storing instructions thereon that, when executed by at least one processor, cause a system to: in response to a detected selection of an utterance within a digital image editing dialogue within an image editing dialogue annotation user interface, identify the utterance as a target utterance from the digital image editing dialogue, the digital image editing dialogue comprising digital verbal communications from a user for editing a digital image; generate a ground truth annotation for the target utterance of the digital image editing dialogue by: providing, for display via the image editing dialogue annotation user interface, a plurality of image editing annotation elements and the target utterance; based on user interactions with the plurality of image editing annotation elements, determining a frame identifier reflecting a shared topic between the target utterance and other digital verbal communications in the digital image editing dialogue, and a ground truth image editing intent corresponding to the target utterance; determining a speaker associated with the target utterance; and based on the speaker associated with the target utterance, generating the ground truth annotation for the target utterance including: the frame identifier reflecting the shared topic between the target utterance and other digital verbal communications in the digital image editing dialogue, and the ground truth image editing intent; and add the target utterance and the ground truth annotation to a training data set for training an image editing dialogue machine learning model to learn co-reference resolution and user intent tracking over multiple conversational turns in a new digital image editing dialogue.

7. The non-transitory computer-readable storage medium as recited in claim 6, wherein the ground truth annotation comprises a plurality of values corresponding to an annotation ontology, the annotation ontology comprising ontology slots.

8. The non-transitory computer-readable storage medium as recited in claim 7, wherein the ontology slots comprise a pre-defined ontology slot that accepts pre-defined canonical forms and an open-ended ontology slot that accept open-ended values.

9. The non-transitory computer-readable storage medium as recited in claim 8, further storing instructions thereon that, when executed by the at least one processor, cause the system to generate the ground truth annotation by: populating the pre-defined ontology slot based on user selection of a pre-defined image editing annotation element from the plurality of image editing annotation elements; and populating the open-ended ontology slot based on user entry of a text input via an open-ended image editing annotation element of the plurality of image editing annotation elements.

10. The non-transitory computer-readable storage medium as recited in claim 7, further storing instructions thereon that, when executed by the at least one processor, cause the system to generate the ground truth annotation for the target utterance by: generating IOB tags associated with the target utterance; and mapping one or more of the IOB tags to a canonical form corresponding to a slot within the annotation ontology.

11. The non-transitory computer-readable storage medium as recited in claim 6, wherein the plurality of image editing annotation elements comprise a frame identifier image editing annotation element, an intent image editing annotation element, an object identifier image editing annotation element, and a location identifier image editing annotation element.

12. The non-transitory computer-readable storage medium as recited in claim 11, further storing instruction thereon that, when executed by the at least one processor, cause the system to, in response to determining that the speaker is a user: determine an active frame identifier reflecting a topic corresponding to the target utterance, wherein the active frame identifier is different from the determined frame identifier; and further generate the ground truth annotation for the target utterance based on the active frame identifier.

13. The non-transitory computer-readable storage medium as recited in claim 12, further storing instructions thereon that, when executed by the at least one processor, cause the system to, in response to determining that the speaker is a digital image editing system: based on a determination that the target utterance is a suggestion statement, determine a suggestion slot value associated with the target utterance; and generate the ground truth annotation for the target utterance based on the suggestion slot value.
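
Claims 8 through 10 describe populating ontology slots, some restricted to pre-defined canonical forms and some open-ended, partly by mapping IOB tags to canonical forms. The following is a minimal sketch of that idea under stated assumptions: the tag labels, the `CANONICAL_INTENTS` table, and both function names are hypothetical and do not come from the patent text.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-defined slot: surface words mapped to canonical intents.
CANONICAL_INTENTS: Dict[str, str] = {
    "brighten": "adjust",
    "lighten": "adjust",
    "crop": "crop",
    "trim": "crop",
}


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; close any span that was still open.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            # "O" (or an I- tag that does not continue the open span) ends it.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def fill_slots(spans: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map spans to slots: canonical form for intent, raw text otherwise."""
    slots: Dict[str, str] = {}
    for label, text in spans:
        if label == "intent":
            canonical = CANONICAL_INTENTS.get(text.lower())
            if canonical is not None:   # unknown intents are left unfilled
                slots["intent"] = canonical
        else:
            slots[label] = text         # open-ended slot keeps the text as-is
    return slots


tokens = ["Brighten", "the", "yellow", "dog", "on", "the", "left"]
tags = ["B-intent", "O", "B-object", "I-object", "O", "O", "B-location"]
slots = fill_slots(extract_spans(tokens, tags))
```

Here the intent slot only accepts values found in the canonical table, while the object and location slots accept whatever text the tagged span contains. In an annotation tool of the kind claimed, a human annotator would presumably still confirm or correct these values through the interface.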

Assignees

Adobe Inc

Inventors

Classifications

  • Knowledge-based neural networks; Logical representations of neural networks · CPC title

  • Supervised learning · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • Learning methods · CPC title

  • Annotation, e.g. comment data or footnotes · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11100917B2 cover?
The present disclosure relates to systems, non-transitory computer-readable media, and methods that generate ground truth annotations of target utterances in digital image editing dialogues in order to create a state-driven training data set. In particular, in one or more embodiments, the disclosed systems utilize machine and user defined tags, machine learning model predictions, and user input…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 24, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).