Generating video content from user input data

US12579721B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12579721-B2
Application number	US-202218081076-A
Country	US
Kind code	B2
Filing date	Dec 14, 2022
Priority date	Sep 19, 2022
Publication date	Mar 17, 2026
Grant date	Mar 17, 2026
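
The publication number appears on this page in two spellings, hyphenated (US-12579721-B2) and compact (US12579721B2). The following is a minimal sketch of how the two forms could be normalized into country, serial number, and kind code. The `PublicationNumber` type, the function name, and the regular expression are illustrative assumptions, not part of patentsdb or any patent-office API, and the pattern only covers the simple shape seen here.

```python
import re
from typing import NamedTuple


class PublicationNumber(NamedTuple):
    """Hypothetical container for the three parts of a publication number."""
    country: str
    number: str
    kind: str

    def compact(self) -> str:
        # Join the parts without separators, e.g. "US12579721B2".
        return f"{self.country}{self.number}{self.kind}"


# Assumed shape: two-letter country, digits, then a kind code such as "B2" or "A".
_PUBLICATION = re.compile(r"^([A-Z]{2})-?(\d+)-?([A-Z]\d?)$")


def parse_publication_number(text: str) -> PublicationNumber:
    """Split a hyphenated or compact publication number into its parts."""
    match = _PUBLICATION.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized publication number: {text!r}")
    return PublicationNumber(*match.groups())


if __name__ == "__main__":
    parsed = parse_publication_number("US-12579721-B2")
    print(parsed, parsed.compact())
```

Both spellings used on this page parse to the same three fields, which makes either one usable as a lookup key.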

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for generating content associated with a user input/system generated response are described. Natural language data associated with a user input may be generated. For each portion of the natural language data, ambiguous references to entities in the portion may be replaced with the corresponding entity. Entities included in the portion may be extracted, and image data representing the entity may be determined. Background image data associated with the entities and the portion may be determined, and attributes which modify the entities in the natural language sentence may be extracted. Spatial relationships between two or more of the entities may further be extracted. Image data representing the natural language data may be generated based on the background image data, the entities, the attributes, and the spatial relationships. Video data may be generated based on the image data, where the video data includes animations of the entities moving.
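
To make the data flow in the abstract concrete, here is a toy sketch of the per-portion steps: reference replacement, entity extraction, attribute extraction, spatial relationships, and background selection. It is not the patented method. The patent describes trained ML components, whereas this sketch swaps in small hand-written word lists, and every name in it (`Scene`, `describe_scene`, `describe_story`, the vocabularies, the image file names) is a hypothetical stand-in. Image composition and video generation are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical vocabularies standing in for the trained components.
ENTITIES = {"fox", "dog", "tree"}
ATTRIBUTES = {"red", "sleepy", "tall"}
RELATIONS = {"under", "beside", "behind"}
BACKGROUNDS = {"forest": "forest.png", "river": "river.png"}
PRONOUNS = {"it", "he", "she"}


@dataclass
class Scene:
    """Structured description of one portion of the natural language data."""
    entities: list[str] = field(default_factory=list)
    attributes: dict[str, list[str]] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)
    background: str = "default.png"


def describe_scene(sentence: str, antecedent: str | None = None) -> Scene:
    tokens = [word.strip(".,!?").lower() for word in sentence.split()]
    # Replace ambiguous references with the entity they are assumed to mean.
    if antecedent:
        tokens = [antecedent if tok in PRONOUNS else tok for tok in tokens]
    scene = Scene()
    for i, tok in enumerate(tokens):
        if tok in ENTITIES and tok not in scene.entities:
            scene.entities.append(tok)
        elif tok in ATTRIBUTES and i + 1 < len(tokens) and tokens[i + 1] in ENTITIES:
            # An attribute modifies the entity that directly follows it.
            scene.attributes.setdefault(tokens[i + 1], []).append(tok)
        elif tok in RELATIONS:
            # Link the nearest entity before the relation word to the first one after it.
            before = [t for t in tokens[:i] if t in ENTITIES]
            after = [t for t in tokens[i + 1:] if t in ENTITIES]
            if before and after:
                scene.relations.append((before[-1], tok, after[0]))
        elif tok in BACKGROUNDS:
            scene.background = BACKGROUNDS[tok]
    return scene


def describe_story(text: str) -> list[Scene]:
    scenes: list[Scene] = []
    antecedent = None
    for sentence in filter(None, (part.strip() for part in text.split("."))):
        scene = describe_scene(sentence, antecedent)
        if scene.entities:
            # Crude heuristic: later pronouns refer to the first entity mentioned.
            antecedent = scene.entities[0]
        scenes.append(scene)
    return scenes


if __name__ == "__main__":
    for scene in describe_story(
        "The red fox sat under the tall tree in the forest. It saw a sleepy dog."
    ):
        print(scene)
```

In the example story, the second sentence's "It" is rewritten to "fox" before extraction, so both scene descriptions name the same entity.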

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising:
receiving first input audio data corresponding to a first spoken natural language input requesting a narrative be output, the first spoken natural language input indicating a first narrative parameter;
generating parameter data including the first narrative parameter included in the first spoken natural language input;
generating outline data based on the parameter data, wherein the outline data includes the first narrative parameter and a second narrative parameter not included in the parameter data;
processing, using a first trained machine learning (ML) component, the outline data to generate natural language data corresponding to the narrative requested in the first spoken natural language input, wherein the natural language data includes more words than the outline data, and the natural language data comprises: a first portion corresponding to a first scene of the narrative, and a second portion corresponding to a second scene of the narrative;
processing, using a second trained ML component, the first portion of the natural language data to determine a first entity represented in the first portion of the natural language data;
determining, using the second trained ML component, first image data corresponding to the first entity;
processing, using a third trained ML component, the first portion of the natural language data to determine first background image data corresponding to the first scene of the narrative;
processing, using a fourth trained ML component, the first portion of the natural language data and the first entity to determine an attribute corresponding to the first entity, wherein the attribute represents how the first entity is to be presented;
processing, using a fifth trained ML component, the first image data, the first background image data, and the attribute to generate first scene data, wherein the first scene data indicates how the first image data is to be rendered with the first background image data based on the attribute;
based on the first scene data, generating first output image data including the first image data and the first background image data, the first output image data corresponding to the first scene of the narrative;
processing, using the second trained ML component, the second portion of the natural language data to determine the first entity is represented in the second portion of the natural language data;
determining the first image data is to be used to render the first entity in the second scene of the narrative based on the first image data being used to represent the first entity in the first scene data;
generating second output image data including the first image data, the second output image data corresponding to the second scene of the narrative;
causing presentation of the first output image data; and
causing presentation of the second output image data.

2. The computer-implemented method of claim 1, further comprising:
storing a representation of the first entity, the first image data, the first background image data, and the attribute in association with a content request identifier corresponding to the first spoken natural language input;
retrieving the representation of the first entity;
based at least in part on determining that the first entity is represented in the second portion, retrieving the first image data;
processing, using the second trained ML component, the second portion of the natural language data to determine a second entity represented in the second portion of the natural language data;
determining, using the second trained ML component, second image data corresponding to the second entity;
processing, using the third trained ML component, the second portion of the natural language data to determine second background image data for the second scene of the narrative; and
processing, using the fifth trained ML component, the first image data, the second image data, and the second background image data to generate second scene data, wherein the second scene data indicates how the first image data and the second image data are to be rendered with the second background image data.

3. The computer-implemented method of claim 1, further comprising:
processing, using a sixth ML component, the natural language data to determine a third portion of the natural language data that corresponds to the first narrative parameter; and
replacing the third portion of the natural language data with the first narrative parameter.

4. The computer-implemented method of claim 1, further comprising:
processing, using a sixth ML component, the natural language data and the first entity to determine a portion of the first background image data where the first image data is to be located when it is rendered with the first background image data, wherein the first scene data indicates the portion of the first background image data where the first image data is to be located when it is rendered with the first background image data, and wherein the second output image data is generated to include the first image data located with respect to the first background image data as indicated in the first scene data.

5. A computer-implemented method comprising:
receiving first input data requesting content be output, the first input data indicating a first parameter;
generating outline data based on the first parameter, wherein the outline data includes the first parameter and a second parameter not included in the first input data;
based on the outline data, generating natural language data corresponding to the content requested in the first input data, wherein the natural language data includes more words than the outline data;
processing, using a trained machine learning (ML) component, a first portion of the natural language data to determine a first entity included in the first portion of the natural language data;
determining, using the trained ML component, first image data corresponding to the first entity;
determining first background image data representing the natural language data;
generating first scene data indicating how the first image data is to be rendered with the first background image data;
based on the first scene data, generating first output image data including the first image data and the first background image data, the first output image data representing at least the first portion of the natural language data;
determining, by the trained ML component, the first entity is included in a second portion of the natural language data;
based on the first entity being included in the second portion of the natural language data and the first image data being used to represent the first entity in the first scene data, generating second output image data including the first image data, the second output image data representing at least the second portion of the natural language data;
causing presentation of the first output image data; and
causing presentation of the second output image data.

6. The computer-implemented method of claim 5, further comprising:
determining, in the first portion of the natural language data, an attribute corresponding to the first entity, wherein the attribute represents how the first entity is to be presented; and
generating the first scene data to indicate how the first image data is to be rendered using the attribute.

7. The computer-implemented method of claim 5, further comprising:
determining a spatial relationship between the first entity and a second entity included in the first portion of the natural language data, wherein the spatial relationship represents how the first entity is to be rendered with the second entity.
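
Claims 1 and 5 both require that the image chosen for an entity in the first scene is reused when the same entity appears in the second scene, and claim 2 stores that association under a content request identifier. The sketch below shows one way such reuse might be arranged with a simple keyed store. It is an illustration under assumptions, not the claimed implementation: `EntityImageStore`, `render_scenes`, and the numbered file names are hypothetical, and the `choose_image` callback stands in for the trained component that selects image data.

```python
from __future__ import annotations

import itertools
from typing import Callable


class EntityImageStore:
    """Remembers which image represents each entity within one content request."""

    def __init__(self, choose_image: Callable[[str], str]) -> None:
        self._choose_image = choose_image
        self._images: dict[tuple[str, str], str] = {}

    def image_for(self, request_id: str, entity: str) -> str:
        key = (request_id, entity)
        if key not in self._images:
            # First appearance in this request: pick an image and remember it.
            self._images[key] = self._choose_image(entity)
        # Later scenes get the stored image, so the entity looks the same.
        return self._images[key]


def render_scenes(
    request_id: str, scenes: list[list[str]], store: EntityImageStore
) -> list[dict[str, str]]:
    """Map every entity in every scene to its image, reusing earlier choices."""
    return [
        {entity: store.image_for(request_id, entity) for entity in scene}
        for scene in scenes
    ]


def make_numbered_chooser() -> Callable[[str], str]:
    """Stand-in image selector that returns a new file name on every call."""
    counter = itertools.count(1)
    return lambda entity: f"{entity}-v{next(counter)}.png"


if __name__ == "__main__":
    store = EntityImageStore(make_numbered_chooser())
    print(render_scenes("request-1", [["fox", "tree"], ["fox", "dog"]], store))
```

Because the stand-in selector returns a different file name on each call, the fact that "fox" maps to the same file in both scenes shows that the second lookup came from the store and not from a fresh selection.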

Assignees

Amazon Tech Inc

Inventors

Classifications

G06F40/279 · Primary
Recognition of textual entities · CPC title
G06T11/00
Two-dimensional [2D] image generation · CPC title
G10L15/18
using natural language modelling · CPC title
G10L15/22
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G06F40/166
Editing, e.g. inserting or deleting · CPC title
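
A CPC symbol such as G06F40/279 is hierarchical: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The section letter G is what maps this patent to the "Physics" technology area. The following sketch splits the symbols listed above into those parts. The `CpcSymbol` type, the function name, and the regular expression are illustrative assumptions, and the section table omits section Y for brevity.

```python
import re
from typing import NamedTuple

# Titles of CPC sections A to H (section Y is left out of this sketch).
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}


class CpcSymbol(NamedTuple):
    """Hypothetical container for the parts of a CPC symbol."""
    section: str
    cpc_class: str
    subclass: str
    main_group: str
    subgroup: str

    def section_title(self) -> str:
        return CPC_SECTIONS.get(self.section, "Unknown section")


# Assumed shape: letter, two digits, letter, 1-4 digit main group, "/", subgroup.
_CPC = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol like 'G06F40/279' into its hierarchical parts."""
    match = _CPC.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


if __name__ == "__main__":
    for code in ["G06F40/279", "G06T11/00", "G10L15/18", "G10L15/22", "G06F40/166"]:
        parsed = parse_cpc(code)
        print(code, parsed, parsed.section_title())
```

All five classifications on this page fall in section G, spread across subclasses G06F, G06T, and G10L.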

Patent family

Related publications grouped by family.

View patent family 90243967

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12579721B2 cover?: Techniques for generating content associated with a user input/system generated response are described. Natural language data associated with a user input may be generated. For each portion of the natural language data, ambiguous references to entities in the portion may be replaced with the corresponding entity. Entities included in the portion may be extracted, and image data representing the…
Who is the assignee on this patent?: Amazon Tech Inc
What technology area does this patent fall under?: Primary CPC classification G06F40/279. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 17, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).