Content generation

US2024095987A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2024095987-A1
Application number  US-202218081076-A
Country             US
Kind code           A1
Filing date         Dec 14, 2022
Priority date       Sep 19, 2022
Publication date    Mar 21, 2024
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for generating content associated with a user input/system generated response are described. Natural language data associated with a user input may be generated. For each portion of the natural language data, ambiguous references to entities in the portion may be replaced with the corresponding entity. Entities included in the portion may be extracted, and image data representing the entity may be determined. Background image data associated with the entities and the portion may be determined, and attributes which modify the entities in the natural language sentence may be extracted. Spatial relationships between two or more of the entities may further be extracted. Image data representing the natural language data may be generated based on the background image data, the entities, the attributes, and the spatial relationships. Video data may be generated based on the image data, where the video data includes animations of the entities moving.
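
The per-portion steps in the abstract (replacing ambiguous references, extracting entities, attributes, and spatial relationships) can be illustrated with a minimal sketch. The patent describes trained ML components for these steps; the keyword sets, the `SceneDescription` type, and the function names below are hypothetical rule-based stand-ins chosen only to show how the pieces of data fit together.

```python
from dataclasses import dataclass, field

# Hypothetical toy vocabularies; the patent uses trained ML components instead.
KNOWN_ENTITIES = {"dragon", "knight", "castle"}
KNOWN_ATTRIBUTES = {"red", "tiny", "brave"}
SPATIAL_WORDS = {"above", "below", "beside"}
PRONOUNS = {"it", "he", "she", "they"}


@dataclass
class SceneDescription:
    text: str                                        # portion after reference replacement
    entities: list = field(default_factory=list)     # entities in order of first mention
    attributes: dict = field(default_factory=dict)   # entity -> [attribute, ...]
    relations: list = field(default_factory=list)    # (entity, relation, entity)


def resolve_references(tokens, last_entity):
    """Replace pronouns with the most recently mentioned entity (naive heuristic)."""
    resolved = []
    for tok in tokens:
        if tok in PRONOUNS and last_entity is not None:
            resolved.append(last_entity)
        else:
            resolved.append(tok)
            if tok in KNOWN_ENTITIES:
                last_entity = tok
    return resolved, last_entity


def describe_portion(text, last_entity=None):
    """Build a scene description for one portion of the natural language data."""
    tokens = text.lower().replace(".", "").replace(",", "").split()
    tokens, last_entity = resolve_references(tokens, last_entity)
    scene = SceneDescription(text=" ".join(tokens))
    pending = []  # attributes seen immediately before the next entity
    for i, tok in enumerate(tokens):
        if tok in KNOWN_ATTRIBUTES:
            pending.append(tok)
            continue
        if tok in KNOWN_ENTITIES:
            if tok not in scene.entities:
                scene.entities.append(tok)
            if pending:
                scene.attributes.setdefault(tok, []).extend(pending)
        elif tok in SPATIAL_WORDS:
            # Relate the nearest entity on each side of the spatial word.
            left = next((t for t in reversed(tokens[:i]) if t in KNOWN_ENTITIES), None)
            right = next((t for t in tokens[i + 1:] if t in KNOWN_ENTITIES), None)
            if left and right:
                scene.relations.append((left, tok, right))
        pending = []
    return scene, last_entity


first, last = describe_portion("The red dragon roared.")
second, last = describe_portion("It flew above the castle.", last_entity=last)
```

Here `last_entity` is carried from one portion to the next so that "It" in the second sentence resolves to the dragon introduced in the first. A real system would pass the resulting entities, attributes, and relations to image selection and composition, which this sketch leaves out.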

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving first input audio data corresponding to a first spoken natural language input; performing automatic speech recognition (ASR) processing on the first input audio data to determine first ASR output data, wherein the first ASR output data represents a transcript of the first spoken natural language input; determining, using the first ASR output data, that the first spoken natural language input requests a narrative be output and includes a first narrative parameter; processing, using a first trained machine learning (ML) component, the first narrative parameter to generate natural language data corresponding to the narrative, wherein the natural language data comprises: a first portion corresponding to a first scene of the narrative, and a second portion corresponding to a second scene of the narrative; processing, using a second trained ML component, the first portion of the natural language data to determine a first entity represented in the first portion of the natural language data; determining, using the second trained ML component, first image data corresponding to the first entity; processing, using a third trained ML component, the first portion of the natural language data and the first entity to determine first background image data corresponding to the first scene of the narrative; processing, using a fourth trained ML component, the first portion of the natural language data and the first entity to determine an attribute corresponding to the first entity, wherein the attribute represents how the first entity is to be presented; processing, using a fifth trained ML component, the first image data, the first background image data, and the attribute to generate first scene data, wherein the first scene data indicates how the first image data is to be rendered with the first background image data based on the attribute; generating first output image data based on the first scene data; processing, using the second trained ML component, the second portion of the natural language data to determine the first entity is represented in the second portion of the natural language data; using the first scene data, determining the first image data is to be used to render the first entity in the second scene of the narrative; and using the first image data, generating second output image data corresponding to the second scene of the narrative.

2. The computer-implemented method of claim 1, further comprising: storing a representation of the first entity, the first image data, the first background image data, and the attribute in association with a content request identifier corresponding to the first spoken natural language input; retrieving the representation of the first entity; based at least in part on determining that the first entity is represented in the second portion, retrieving the first image data; processing, using the second trained ML component, the second portion of the natural language data to determine a second entity represented in the second portion of the natural language data; determining, using the second trained ML component, second image data corresponding to the second entity; processing, using the third trained ML component, the second portion of the natural language data, the first entity, and the second entity to determine second background image data for the second scene of the narrative; and processing, using the fifth trained ML component, the first image data, the second image data, and the second background image data to generate second scene data, wherein the second scene data indicates how the first image data and the second image data are to be rendered with the second background image data.

3. The computer-implemented method of claim 1, further comprising: processing, using a sixth ML model, the natural language data to determine a third portion of the natural language data that corresponds to the first narrative parameter; and replacing the third portion of the natural language data with the first narrative parameter.

4. The computer-implemented method of claim 1, further comprising: processing, using a sixth ML model, the natural language data and the first entity to determine a portion of the first background image data where the first image data is to be located when it is rendered with the first background image data, wherein the first scene data indicates the portion of the first background image data where the first image data is to be located when it is rendered with the first background image data, and wherein the second output image data is generated to include the first image data located with respect to the first background image data as indicated in the first scene data.

5. A computer-implemented method comprising: receiving first input data corresponding to a first user input; determining the first user input requests content be output and indicates a first parameter for configuration of the content; based on the first parameter, determining natural language data corresponding to the content; determining a first entity included in a first portion of the natural language data; determining first image data corresponding to the first entity; determining first background image data representing the natural language data; generating first scene data indicating how the first image data is to be rendered with the first background image data; generating first output image data based on the first scene data; determining the first entity is included in a second portion of the natural language data; and based on the first scene data and the first entity being included in the second portion of the natural language data, generating second output image data to represent the first entity using the first image data.

6. The computer-implemented method of claim 5, further comprising: determining, in the first portion of the natural language data, an attribute corresponding to the first entity, wherein the attribute represents how the first entity is to be presented; and generating the first scene data to indicate how the first image data is to be rendered using the attribute.

7. The computer-implemented method of claim 5, further comprising: determining a spatial relationship between the first entity and a second entity included in the first portion of the natural language data, wherein the spatial relationship represents how the first entity is to be rendered with the second entity.

8. The computer-implemented method of claim 5, further comprising: determining a second entity included in the second portion of the natural language data; determining second image data corresponding to the second entity; determining, using the first entity, the second entity, the first image data, and the second image data, second background image data; and generating second scene data indicating how the first image data and the second image data are to be rendered with the second background image data.

9. The computer-implemented method of claim 5, further comprising: determining a third portion of the natural language data corresponding to a second entity; determining the second entity corresponds to the first entity; and based on the second entity corresponding to the first entity, replacing the second entity with the first entity in the third portion of the natural language data.

10. The computer-implemented method of claim 5, further comprising: performing text-to-speech (TTS) processing using the natural language data to generate first output audio data comprising: a first portion corresponding to the first portion of the n
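
Claims 1, 2, and 5 turn on reusing the first entity's image data when that entity reappears in a later scene, with claim 2 storing it in association with a content request identifier. A minimal sketch of that reuse follows, assuming a simple in-memory store. `EntityImageStore`, `fake_image`, and `render_scenes` are hypothetical names for illustration, and `fake_image` is a stand-in for whatever component actually determines image data.

```python
from dataclasses import dataclass, field


@dataclass
class EntityImageStore:
    """Keeps image data per (content request identifier, entity) for later scenes."""
    _images: dict = field(default_factory=dict)
    generated: int = 0  # how many times image data had to be determined

    def image_for(self, request_id, entity, generate):
        key = (request_id, entity)
        if key not in self._images:
            # First appearance of this entity within this request.
            self._images[key] = generate(entity)
            self.generated += 1
        return self._images[key]


def fake_image(entity):
    """Hypothetical stand-in for the component that determines image data."""
    return f"<image of {entity}>".encode()


def render_scenes(request_id, scenes, store):
    """Return, for each scene, a mapping from entity to the image data used."""
    outputs = []
    for scene_entities in scenes:
        outputs.append(
            {e: store.image_for(request_id, e, fake_image) for e in scene_entities}
        )
    return outputs


store = EntityImageStore()
frames = render_scenes("req-1", [["dragon"], ["dragon", "knight"]], store)
```

In this sketch the dragon's image data is determined once and the same object is used in both scenes, which keeps the entity visually consistent across the narrative. Keying by request identifier means a different request would get its own image data for the same entity name.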

Assignees

Amazon Tech Inc

Inventors

Classifications

  • using information manually generated, e.g. tags, keywords, comments, manually generated location and time information · CPC title

  • G06T13/00 · Primary

    Animation · CPC title

  • Editing, e.g. inserting or deleting · CPC title

  • G06F40/279 · Primary

    Recognition of textual entities · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024095987A1 cover?
Techniques for generating content associated with a user input/system generated response are described. Natural language data associated with a user input may be generated. For each portion of the natural language data, ambiguous references to entities in the portion may be replaced with the corresponding entity. Entities included in the portion may be extracted, and image data representing the…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06T13/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 21, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).