What technology area does this patent fall under?

Primary CPC classification G06T13/00. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 21, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Content generation

US2024095987A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2024095987-A1
Application number	US-202218081076-A
Country	US
Kind code	A1
Filing date	Dec 14, 2022
Priority date	Sep 19, 2022
Publication date	Mar 21, 2024
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for generating content associated with a user input/system generated response are described. Natural language data associated with a user input may be generated. For each portion of the natural language data, ambiguous references to entities in the portion may be replaced with the corresponding entity. Entities included in the portion may be extracted, and image data representing the entity may be determined. Background image data associated with the entities and the portion may be determined, and attributes which modify the entities in the natural language sentence may be extracted. Spatial relationships between two or more of the entities may further be extracted. Image data representing the natural language data may be generated based on the background image data, the entities, the attributes, and the spatial relationships. Video data may be generated based on the image data, where the video data includes animations of the entities moving.
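The stages named in the abstract (reference replacement, entity extraction, attribute extraction, spatial relationships, background selection) can be pictured as a small data pipeline. The sketch below is a toy illustration only, not the method of the publication: the patent describes trained ML components, whereas here every stage is a keyword lookup, pronouns are replaced with the most recently seen entity, and image and video rendering are omitted. All names (`Scene`, `KNOWN_ENTITIES`, `resolve_references`, `build_scene`) and the vocabularies are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies standing in for the trained ML components.
KNOWN_ENTITIES = {"dragon", "knight", "castle"}
KNOWN_ATTRIBUTES = {"red", "tiny", "sleepy"}
SPATIAL_WORDS = {"on", "under", "beside"}
PRONOUNS = {"it", "he", "she", "they"}


@dataclass
class Scene:
    text: str
    entities: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)  # entity -> [attribute, ...]
    relations: list = field(default_factory=list)   # (entity, relation, entity)
    background: str = ""


def tokenize(sentence):
    """Lowercase the words and strip trailing punctuation."""
    return [w.strip(".,!?").lower() for w in sentence.split()]


def resolve_references(tokens, last_entity):
    """Replace each pronoun with the most recently seen entity (naive rule)."""
    resolved = []
    for tok in tokens:
        if tok in PRONOUNS and last_entity:
            tok = last_entity
        elif tok in KNOWN_ENTITIES:
            last_entity = tok
        resolved.append(tok)
    return resolved, last_entity


def build_scene(sentence, last_entity=None):
    """Build scene data for one portion of the natural language data."""
    tokens, last_entity = resolve_references(tokenize(sentence), last_entity)
    scene = Scene(text=" ".join(tokens))
    pending_attrs = []
    for i, tok in enumerate(tokens):
        if tok in KNOWN_ATTRIBUTES:
            # Attributes attach to the next entity that follows them.
            pending_attrs.append(tok)
        elif tok in KNOWN_ENTITIES:
            if tok not in scene.entities:
                scene.entities.append(tok)
            scene.attributes.setdefault(tok, []).extend(pending_attrs)
            pending_attrs = []
        elif tok in SPATIAL_WORDS:
            # Relate the nearest entity before the word to the first one after it.
            left = next((t for t in reversed(tokens[:i]) if t in KNOWN_ENTITIES), None)
            right = next((t for t in tokens[i + 1:] if t in KNOWN_ENTITIES), None)
            if left and right:
                scene.relations.append((left, tok, right))
    # Placeholder for background selection based on the entities in the scene.
    scene.background = "background for: " + ", ".join(scene.entities)
    return scene, last_entity


scene1, last = build_scene("A tiny knight walks.")
scene2, last = build_scene("He sits under the red dragon.", last)
print(scene2.text)
print(scene2.relations)
```

Passing `last` from one call to the next is what lets the pronoun in the second sentence resolve to an entity introduced in the first. A real system would need far more than a "most recent entity" rule to handle ambiguous references.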

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving first input audio data corresponding to a first spoken natural language input; performing automatic speech recognition (ASR) processing on the first input audio data to determine first ASR output data, wherein the first ASR output data represents a transcript of the first spoken natural language input; determining, using the first ASR output data, that the first spoken natural language input requests a narrative be output and includes a first narrative parameter; processing, using a first trained machine learning (ML) component, the first narrative parameter to generate natural language data corresponding to the narrative, wherein the natural language data comprises: a first portion corresponding to a first scene of the narrative, and a second portion corresponding to a second scene of the narrative; processing, using a second trained ML component, the first portion of the natural language data to determine a first entity represented in the first portion of the natural language data; determining, using the second trained ML component, first image data corresponding to the first entity; processing, using a third trained ML component, the first portion of the natural language data and the first entity to determine first background image data corresponding to the first scene of the narrative; processing, using a fourth trained ML component, the first portion of the natural language data and the first entity to determine an attribute corresponding to the first entity, wherein the attribute represents how the first entity is to be presented; processing, using a fifth trained ML component, the first image data, the first background image data, and the attribute to generate first scene data, wherein the first scene data indicates how the first image data is to be rendered with the first background image data based on the attribute; generating first output image data based on the first scene data; processing, using the second trained ML component, the second portion of the natural language data to determine the first entity is represented in the second portion of the natural language data; using the first scene data, determining the first image data is to be used to render the first entity in the second scene of the narrative; and using the first image data, generating second output image data corresponding to the second scene of the narrative.

2. The computer-implemented method of claim 1, further comprising: storing a representation of the first entity, the first image data, the first background image data, and the attribute in association with a content request identifier corresponding to the first spoken natural language input; retrieving the representation of the first entity; based at least in part on determining that the first entity is represented in the second portion, retrieving the first image data; processing, using the second trained ML component, the second portion of the natural language data to determine a second entity represented in the second portion of the natural language data; determining, using the second trained ML component, second image data corresponding to the second entity; processing, using the third trained ML component, the second portion of the natural language data, the first entity, and the second entity to determine second background image data for the second scene of the narrative; and processing, using the fifth trained ML component, the first image data, the second image data, and the second background image data to generate second scene data, wherein the second scene data indicates how the first image data and the second image data are to be rendered with the second background image data.

3. The computer-implemented method of claim 1, further comprising: processing, using a sixth ML model, the natural language data to determine a third portion of the natural language data that corresponds to the first narrative parameter; and replacing the third portion of the natural language data with the first narrative parameter.

4. The computer-implemented method of claim 1, further comprising: processing, using a sixth ML model, the natural language data and the first entity to determine a portion of the first background image data where the first image data is to be located when it is rendered with the first background image data, wherein the first scene data indicates the portion of the first background image data where the first image data is to be located when it is rendered with the first background image data, and wherein the second output image data is generated to include the first image data located with respect to the first background image data as indicated in the first scene data.

5. A computer-implemented method comprising: receiving first input data corresponding to a first user input; determining the first user input requests content be output and indicates a first parameter for configuration of the content; based on the first parameter, determining natural language data corresponding to the content; determining a first entity included in a first portion of the natural language data; determining first image data corresponding to the first entity; determining first background image data representing the natural language data; generating first scene data indicating how the first image data is to be rendered with the first background image data; generating first output image data based on the first scene data; determining the first entity is included in a second portion of the natural language data; and based on the first scene data and the first entity being included in the second portion of the natural language data, generating second output image data to represent the first entity using the first image data.

6. The computer-implemented method of claim 5, further comprising: determining, in the first portion of the natural language data, an attribute corresponding to the first entity, wherein the attribute represents how the first entity is to be presented; and generating the first scene data to indicate how the first image data is to be rendered using the attribute.

7. The computer-implemented method of claim 5, further comprising: determining a spatial relationship between the first entity and a second entity included in the first portion of the natural language data, wherein the spatial relationship represents how the first entity is to be rendered with the second entity.

8. The computer-implemented method of claim 5, further comprising: determining a second entity included in the second portion of the natural language data; determining second image data corresponding to the second entity; determining, using the first entity, the second entity, the first image data, and the second image data, second background image data; and generating second scene data indicating how the first image data and the second image data are to be rendered with the second background image data.

9. The computer-implemented method of claim 5, further comprising: determining a third portion of the natural language data corresponding to a second entity; determining the second entity corresponds to the first entity; and based on the second entity corresponding to the first entity, replacing the second entity with the first entity in the third portion of the natural language data.

10. The computer-implemented method of claim 5, further comprising: performing text-to-speech (TTS) processing using the natural language data to generate first output audio data comprising: a first portion corresponding to the first portion of the n
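Claims 1, 2, and 5 all turn on reusing the first image data when the same entity reappears in a later scene, and claim 2 stores that data in association with a content request identifier. A minimal sketch of that bookkeeping follows, assuming a plain in-memory dictionary keyed by request identifier and entity name. The claims do not specify any storage mechanism, so `EntityImageStore`, `render_scenes`, `fake_generate_image`, and the string stand-ins for image data are all hypothetical.

```python
class EntityImageStore:
    """Caches image data per (content request identifier, entity)."""

    def __init__(self, generate_image):
        self._generate_image = generate_image  # called only on a cache miss
        self._images = {}

    def image_for(self, request_id, entity):
        key = (request_id, entity)
        if key not in self._images:
            self._images[key] = self._generate_image(entity)
        return self._images[key]


def render_scenes(request_id, scenes, store):
    """Return, for each scene, a mapping of entity -> image data."""
    return [
        {entity: store.image_for(request_id, entity) for entity in entities}
        for entities in scenes
    ]


calls = []


def fake_generate_image(entity):
    # Stand-in for the image-determining component; records each call.
    calls.append(entity)
    return f"<image of {entity} #{len(calls)}>"


store = EntityImageStore(fake_generate_image)
# Scene 1 contains the knight; scene 2 contains the knight and a dragon.
outputs = render_scenes("req-1", [["knight"], ["knight", "dragon"]], store)
print(outputs)
print(calls)
```

Because the knight's image is looked up rather than regenerated for the second scene, the character is drawn from the same image data in both scenes, which is the consistency the claims describe.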

Assignees

Amazon Tech Inc

Inventors

Classifications

G06F16/5866
using information manually generated, e.g. tags, keywords, comments, manually generated location and time information · CPC title
G06T13/00 (Primary)
Animation · CPC title
G06F40/166
Editing, e.g. inserting or deleting · CPC title
G06F40/279 (Primary)
Recognition of textual entities · CPC title
G06F40/40
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
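CPC symbols have a fixed shape: a section letter, a two-digit class, a subclass letter, then a main group and subgroup separated by a slash. The sketch below splits the symbols listed above into those parts. `parse_cpc`, `CpcSymbol`, and `CPC_SECTIONS` are illustrative names, the section titles are abbreviated, and the regular expression is a simplified assumption rather than a complete validator. Section G is titled Physics in the CPC scheme, which is presumably why this page maps the patent to the "Physics" technology area.

```python
import re
from typing import NamedTuple

# Abbreviated CPC section titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new technological developments",
}

# Simplified pattern: section, class, subclass, main group "/" subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


class CpcSymbol(NamedTuple):
    section: str
    cpc_class: str
    subclass: str
    main_group: str
    subgroup: str


def parse_cpc(symbol):
    """Split a CPC symbol such as 'G06T13/00' into its parts."""
    match = CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


for code in ["G06F16/5866", "G06T13/00", "G06F40/166", "G06F40/279", "G06F40/40"]:
    parsed = parse_cpc(code)
    subclass = parsed.section + parsed.cpc_class + parsed.subclass
    print(code, "->", subclass, "|", CPC_SECTIONS[parsed.section])
```

All five symbols on this page share section G; they split between subclass G06F (four symbols) and G06T (one symbol).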

Patent family

Related publications grouped by family.

Patent family 90243967

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024095987A1 cover? Techniques for generating content associated with a user input/system generated response are described. Natural language data associated with a user input may be generated. For each portion of the natural language data, ambiguous references to entities in the portion may be replaced with the corresponding entity. Entities included in the portion may be extracted, and image data representing the…
Who is the assignee on this patent? Amazon Tech Inc

Related patents

Generating visual feedback

Artificial intelligence for generating structured descriptions of scenes

Applying artificial intelligence to generate motion information
