Device, method, and program for enhancing output content through iterative generation

US12380887B2 · US · B2

Patent metadata
Publication number: US-12380887-B2
Application number: US-202318305652-A
Country: US
Kind code: B2
Filing date: Apr 24, 2023
Priority date: Dec 4, 2019
Publication date: Aug 5, 2025
Grant date: Aug 5, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of improving output content through iterative generation is provided. The method includes receiving a natural language input, obtaining user intention information based on the natural language input by using a natural language understanding (NLU) model, setting a target area in base content based on a first user input, determining input content based on the user intention information or a second user input, generating output content related to the base content based on the input content, the target area, and the user intention information by using a neural network (NN) model, generating a caption for the output content by using an image captioning model, calculating similarity between text of the natural language input and the generated output content, and iterating generation of the output content based on the similarity.
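The abstract describes a feedback loop: generate output content, caption it, score the caption against the user's request, and regenerate if the score is too low. The sketch below is a minimal illustration of that loop, not the patented implementation. It assumes the similarity is computed between the natural language input and the generated caption, and it swaps in a bag-of-words cosine similarity because the abstract does not name a measure. `generate` and `caption` are hypothetical stand-ins for the neural network model and the image captioning model, and `threshold` and `max_iters` are assumed parameters.

```python
import math
from collections import Counter
from typing import Callable, Optional, Tuple


def text_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity in [0, 1].

    Stand-in only: the patent abstract does not say which measure is used.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def iterate_generation(
    prompt: str,
    generate: Callable[[str, int], str],  # hypothetical stand-in for the NN model
    caption: Callable[[str], str],  # hypothetical stand-in for the captioning model
    threshold: float = 0.8,  # assumed stopping threshold
    max_iters: int = 5,  # assumed iteration budget
) -> Tuple[Optional[str], float, int]:
    """Regenerate output until its caption is similar enough to the prompt.

    Returns (best output, its similarity score, number of iterations run).
    """
    best_output: Optional[str] = None
    best_score = -1.0
    for step in range(max_iters):
        output = generate(prompt, step)
        score = text_similarity(prompt, caption(output))
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            return output, score, step + 1  # good enough: stop early
    return best_output, best_score, max_iters  # budget spent: keep the best attempt


if __name__ == "__main__":
    # Toy run: outputs are plain strings and the "captioner" returns them unchanged.
    candidates = ["a cat on grass", "a red car on a road", "a red car parked on the road"]
    result = iterate_generation(
        "a red car on the road",
        generate=lambda prompt, step: candidates[step],
        caption=lambda output: output,
    )
    print(result)
```

In the toy run the first attempt scores about 0.41 and the second about 0.87, so the loop stops after two iterations. Returning the best attempt when the budget runs out is a design choice of this sketch; the abstract only says generation is iterated based on the similarity.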

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor of a device, cause the device to: present a base content; while the base content is being presented, receive a user input for selecting a target area of the base content; present an indication, on the base content being presented, corresponding to the selected target area of the base content; while the base content is being presented with the indication being presented thereon, receive a natural language input for generating output content; and present modified base content in which the base content is modified to include the output content, in the target area of the base content, which is generated based on the base content, the target area, and the natural language input, wherein the output content is generated based on an object that corresponds to the natural language input, wherein the object that corresponds to the natural language input is determined by using user intention information obtained based on the natural language input, wherein the output content is generated by using an artificial intelligence (AI) model, and wherein the user intention information is obtained by using the AI model.

2. The non-transitory computer-readable storage medium of claim 1, wherein the base content and the modified base content are images.

3. The non-transitory computer-readable storage medium of claim 1, wherein the target area is a partial area of the base content.

4. The non-transitory computer-readable storage medium of claim 1, wherein the target area corresponds to a base content object detected in the base content.

5. The non-transitory computer-readable storage medium of claim 1, wherein at least one of a size or a shape of the target area is user adjustable.

6. The non-transitory computer-readable storage medium of claim 1, wherein the natural language input includes at least one of a voice input or a text input, and wherein in case the natural language input includes the voice input, the voice input is converted into text by using an automatic speech recognition (ASR) model.

7. The non-transitory computer-readable storage medium of claim 1, wherein the user intention information comprises attribute information.

8. The non-transitory computer-readable storage medium of claim 1, wherein the object is determined by using the AI model.

9. The non-transitory computer-readable storage medium of claim 1, wherein the output content is generated by compositing the object that corresponds to the natural language input into the target area of the base content.

10. The non-transitory computer-readable storage medium of claim 1, wherein a base content object is recognized from the base content by using the AI model, and wherein in the output content is generated based on the base content object by using the AI model.

11. The non-transitory computer-readable storage medium of claim 1, wherein the instructions, when executed by the at least one processor, further cause the device to: present a user interface for selecting among a plurality of contents that each include the object that corresponds to the natural language input.

12. The non-transitory computer-readable storage medium of claim 1, wherein the instructions, when executed by the at least one processor, further cause the device to: transition from presenting the base content with the indication being presented thereon to presenting the modified base content in which the base content is modified to include the output content in the target area of the base content.

13. The non-transitory computer-readable storage medium of claim 1, wherein the output content is generated to match with the base content.

14. A method performed by a device for modifying content, the method comprising: presenting a base content; while the base content is being presented, receiving a user input for selecting a target area of the base content; presenting an indication, on the base content being presented, corresponding to the selected target area of the base content; while the base content is being presented with the indication being presented thereon, receiving a natural language input for generating output content; and presenting modified base content in which the base content is modified to include the output content, in the target area of the base content, which is generated based on the base content, the target area, and the natural language input, wherein the output content is generated based on an object that corresponds to the natural language input, wherein the object that corresponds to the natural language input is determined by using user intention information obtained based on the natural language input, wherein the output content is generated by using an artificial intelligence (AI) model, and wherein the user intention information is obtained by using the AI model.

15. The method of claim 14, wherein the base content and the modified base content are images.

16. The method of claim 14, wherein the target area is a partial area of the base content.

17. The method of claim 14, wherein the target area corresponds to a base content object detected in the base content.

18. The method of claim 14, wherein at least one of a size or a shape of the target area is user adjustable.

19. The method of claim 14, wherein the natural language input includes at least one of a voice input or a text input, and wherein in case the natural language input includes the voice input, the voice input is converted into text by using an automatic speech recognition (ASR) model.

20. The method of claim 14, wherein the user intention information comprises attribute information.

21. The method of claim 14, wherein the object is determined by using the AI model.

22. The method of claim 14, wherein the output content is generated by compositing the object that corresponds to the natural language input into the target area of the base content.

23. The method of claim 14, wherein a base content object is recognized from the base content by using the AI model, and wherein in the output content is generated based on the base content object by using the AI model.

24. The method of claim 14, further comprising: presenting a user interface for selecting among a plurality of contents that each include the object that corresponds to the natural language input.

25. The method of claim 14, wherein the presenting of the modified base content includes transitioning from presenting the base content with the indication being presented thereon to presenting the modified base content in which the base content is modified to include the output content in the target area of the base content.

26. The method of claim 14, wherein the output content is generated to match with the base content.

27. A device for modifying content, the device comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the device to: present a base content; while the base content is being presented, receive a user input for selecting a target area of the base content; present an indication, on the base content being presented, corresponding to the selected target area of the base content; while the base content is being presented with the indi…
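Claims 9 and 22 state that the output content is generated by compositing the object that corresponds to the natural language input into the target area of the base content, and claims 5 and 18 make the target area's size or shape user adjustable. The sketch below illustrates only that final paste step under loudly stated assumptions: images are plain 2-D grids of integer pixel values, the target area is an axis-aligned rectangle, the object is fitted to it with nearest-neighbour resizing, and an optional boolean mask selects which object pixels are kept. `TargetArea`, `resize_nearest`, and `composite` are hypothetical names; in the claims the object and the output content come from an AI model, which is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class TargetArea:
    """Hypothetical user-selected rectangle; width/height model an adjustable size."""
    x: int
    y: int
    width: int
    height: int


def resize_nearest(grid: List[List[T]], width: int, height: int) -> List[List[T]]:
    """Nearest-neighbour resize of a 2-D grid to width x height."""
    src_h, src_w = len(grid), len(grid[0])
    return [
        [grid[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def composite(
    base: List[List[int]],
    obj: List[List[int]],
    area: TargetArea,
    mask: Optional[List[List[bool]]] = None,
) -> List[List[int]]:
    """Paste obj into area of base; mask (if given) marks which object pixels to keep."""
    if (
        area.x < 0
        or area.y < 0
        or area.width <= 0
        or area.height <= 0
        or area.y + area.height > len(base)
        or area.x + area.width > len(base[0])
    ):
        raise ValueError("target area must lie inside the base content")
    out = [row[:] for row in base]  # copy so the base content itself is not modified
    patch = resize_nearest(obj, area.width, area.height)  # fit object to the area
    keep = resize_nearest(mask, area.width, area.height) if mask is not None else None
    for r in range(area.height):
        for c in range(area.width):
            if keep is None or keep[r][c]:
                out[area.y + r][area.x + c] = patch[r][c]
    return out


if __name__ == "__main__":
    base = [[0] * 4 for _ in range(4)]
    print(composite(base, [[7]], TargetArea(x=1, y=1, width=2, height=2)))
```

Here a one-pixel "object" is stretched to fill a 2 x 2 target area, giving rows `[0, 0, 0, 0]`, `[0, 7, 7, 0]`, `[0, 7, 7, 0]`, `[0, 0, 0, 0]`. Returning a new grid rather than editing `base` in place keeps the unmodified base content available, which loosely mirrors the claimed transition from the base content to the modified base content.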

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • Adversarial learning · CPC title

  • Generative networks · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12380887B2 cover?
A method of improving output content through iterative generation is provided. The method includes receiving a natural language input, obtaining user intention information based on the natural language input by using a natural language understanding (NLU) model, setting a target area in base content based on a first user input, determining input content based on the user intention information o…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Published Aug 5, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).