Generating visual feedback
US-11651537-B2 · May 16, 2023 · US
US2024095987A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024095987-A1 |
| Application number | US-202218081076-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 14, 2022 |
| Priority date | Sep 19, 2022 |
| Publication date | Mar 21, 2024 |
| Grant date | — |
Official abstract text for this publication.
Techniques for generating content associated with a user input/system generated response are described. Natural language data associated with a user input may be generated. For each portion of the natural language data, ambiguous references to entities in the portion may be replaced with the corresponding entity. Entities included in the portion may be extracted, and image data representing the entity may be determined. Background image data associated with the entities and the portion may be determined, and attributes which modify the entities in the natural language sentence may be extracted. Spatial relationships between two or more of the entities may further be extracted. Image data representing the natural language data may be generated based on the background image data, the entities, the attributes, and the spatial relationships. Video data may be generated based on the image data, where the video data includes animations of the entities moving.
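The text-analysis steps in the abstract (replacing ambiguous references, extracting entities, attributes, and spatial relationships) can be illustrated with a small sketch. This is not the patented method: the publication describes trained ML components, while the code below swaps in naive keyword lookups so it stays runnable. The vocabularies, the `Scene` class, and the rule that a pronoun refers to the first entity of the previous sentence are all assumptions made for illustration. Image, background, and video generation are left out.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies standing in for trained ML components.
KNOWN_ENTITIES = {"fox", "rabbit", "tree"}
KNOWN_ATTRIBUTES = {"red", "small", "tall"}
SPATIAL_WORDS = {"under", "beside", "behind"}
PRONOUNS = {"it", "he", "she"}


@dataclass
class Scene:
    text: str                                        # sentence after reference replacement
    entities: list = field(default_factory=list)     # entities in order of appearance
    attributes: dict = field(default_factory=dict)   # entity -> list of modifiers
    relations: list = field(default_factory=list)    # (entity, spatial word, entity)


def tokenize(sentence):
    return [w.strip(".,!?").lower() for w in sentence.split()]


def resolve_references(tokens, last_entity):
    """Replace pronouns with the previously tracked entity (naive heuristic)."""
    if last_entity is None:
        return tokens
    return [last_entity if t in PRONOUNS else t for t in tokens]


def describe_scene(sentence, last_entity=None):
    tokens = resolve_references(tokenize(sentence), last_entity)
    scene = Scene(text=" ".join(tokens))
    for i, tok in enumerate(tokens):
        if tok in KNOWN_ENTITIES:
            if tok not in scene.entities:
                scene.entities.append(tok)
            # Treat a known adjective directly before the entity as its attribute.
            if i > 0 and tokens[i - 1] in KNOWN_ATTRIBUTES:
                scene.attributes.setdefault(tok, []).append(tokens[i - 1])
    for i, tok in enumerate(tokens):
        if tok in SPATIAL_WORDS:
            # Link the nearest entity on each side of the spatial word.
            left = [t for t in tokens[:i] if t in KNOWN_ENTITIES]
            right = [t for t in tokens[i + 1:] if t in KNOWN_ENTITIES]
            if left and right:
                scene.relations.append((left[-1], tok, right[0]))
    return scene


def describe_narrative(sentences):
    scenes, last_entity = [], None
    for sentence in sentences:
        scene = describe_scene(sentence, last_entity)
        if scene.entities:
            # Assumption: the first entity is the one later pronouns refer to.
            last_entity = scene.entities[0]
        scenes.append(scene)
    return scenes


scenes = describe_narrative(["A red fox sat under a tall tree.", "It saw a small rabbit."])
for s in scenes:
    print(s)
```

In the second sentence, "It" is rewritten to "fox" before extraction, so both scenes list the same entity. That shared entity is what a later rendering stage would need in order to draw it consistently.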
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving first input audio data corresponding to a first spoken natural language input; performing automatic speech recognition (ASR) processing on the first input audio data to determine first ASR output data, wherein the first ASR output data represents a transcript of the first spoken natural language input; determining, using the first ASR output data, that the first spoken natural language input requests a narrative be output and includes a first narrative parameter; processing, using a first trained machine learning (ML) component, the first narrative parameter to generate natural language data corresponding to the narrative, wherein the natural language data comprises: a first portion corresponding to a first scene of the narrative, and a second portion corresponding to a second scene of the narrative; processing, using a second trained ML component, the first portion of the natural language data to determine a first entity represented in the first portion of the natural language data; determining, using the second trained ML component, first image data corresponding to the first entity; processing, using a third trained ML component, the first portion of the natural language data and the first entity to determine first background image data corresponding to the first scene of the narrative; processing, using a fourth trained ML component, the first portion of the natural language data and the first entity to determine an attribute corresponding to the first entity, wherein the attribute represents how the first entity is to be presented; processing, using a fifth trained ML component, the first image data, the first background image data, and the attribute to generate first scene data, wherein the first scene data indicates how the first image data is to be rendered with the first background image data based on the attribute; generating first output image data based on the first scene data; processing, using the second trained ML component, the second portion of the natural language data to determine the first entity is represented in the second portion of the natural language data; using the first scene data, determining the first image data is to be used to render the first entity in the second scene of the narrative; and using the first image data, generating second output image data corresponding to the second scene of the narrative.

2. The computer-implemented method of claim 1, further comprising: storing a representation of the first entity, the first image data, the first background image data, and the attribute in association with a content request identifier corresponding to the first spoken natural language input; retrieving the representation of the first entity; based at least in part on determining that the first entity is represented in the second portion, retrieving the first image data; processing, using the second trained ML component, the second portion of the natural language data to determine a second entity represented in the second portion of the natural language data; determining, using the second trained ML component, second image data corresponding to the second entity; processing, using the third trained ML component, the second portion of the natural language data, the first entity, and the second entity to determine second background image data for the second scene of the narrative; and processing, using the fifth trained ML component, the first image data, the second image data, and the second background image data to generate second scene data, wherein the second scene data indicates how the first image data and the second image data are to be rendered with the second background image data.

3. The computer-implemented method of claim 1, further comprising: processing, using a sixth ML model, the natural language data to determine a third portion of the natural language data that corresponds to the first narrative parameter; and replacing the third portion of the natural language data with the first narrative parameter.

4. The computer-implemented method of claim 1, further comprising: processing, using a sixth ML model, the natural language data and the first entity to determine a portion of the first background image data where the first image data is to be located when it is rendered with the first background image data, wherein the first scene data indicates the portion of the first background image data where the first image data is to be located when it is rendered with the first background image data, and wherein the second output image data is generated to include the first image data located with respect to the first background image data as indicated in the first scene data.

5. A computer-implemented method comprising: receiving first input data corresponding to a first user input; determining the first user input requests content be output and indicates a first parameter for configuration of the content; based on the first parameter, determining natural language data corresponding to the content; determining a first entity included in a first portion of the natural language data; determining first image data corresponding to the first entity; determining first background image data representing the natural language data; generating first scene data indicating how the first image data is to be rendered with the first background image data; generating first output image data based on the first scene data; determining the first entity is included in a second portion of the natural language data; and based on the first scene data and the first entity being included in the second portion of the natural language data, generating second output image data to represent the first entity using the first image data.

6. The computer-implemented method of claim 5, further comprising: determining, in the first portion of the natural language data, an attribute corresponding to the first entity, wherein the attribute represents how the first entity is to be presented; and generating the first scene data to indicate how the first image data is to be rendered using the attribute.

7. The computer-implemented method of claim 5, further comprising: determining a spatial relationship between the first entity and a second entity included in the first portion of the natural language data, wherein the spatial relationship represents how the first entity is to be rendered with the second entity.

8. The computer-implemented method of claim 5, further comprising: determining a second entity included in the second portion of the natural language data; determining second image data corresponding to the second entity; determining, using the first entity, the second entity, the first image data, and the second image data, second background image data; and generating second scene data indicating how the first image data and the second image data are to be rendered with the second background image data.

9. The computer-implemented method of claim 5, further comprising: determining a third portion of the natural language data corresponding to a second entity; determining the second entity corresponds to the first entity; and based on the second entity corresponding to the first entity, replacing the second entity with the first entity in the third portion of the natural language data.

10. The computer-implemented method of claim 5, further comprising: performing text-to-speech (TTS) processing using the natural language data to generate first output audio data comprising: a first portion corresponding to the first portion of the n
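Independent claims 1 and 5 both hinge on reusing the first image data when the same entity appears in a later portion of the narrative, and claim 2 adds storage keyed by a content request identifier. A minimal sketch of that reuse pattern follows. The `EntityImageStore` and `SceneData` names, the string image identifiers, and the input format are hypothetical and stand in for whatever the claimed components actually produce.

```python
from dataclasses import dataclass


@dataclass
class SceneData:
    background: str    # placeholder for background image data
    placements: dict   # entity name -> image identifier used to render it


class EntityImageStore:
    """Hypothetical cache of entity images, keyed by (request id, entity)."""

    def __init__(self):
        self._images = {}
        self.generated = 0  # how many images had to be newly determined

    def image_for(self, request_id, entity):
        key = (request_id, entity)
        if key not in self._images:
            # Stand-in for determining image data for a newly seen entity.
            self.generated += 1
            self._images[key] = f"img-{entity}-{self.generated}"
        return self._images[key]


def build_scenes(request_id, portions, store):
    """portions: list of (background, [entities]) pairs, one per scene."""
    scenes = []
    for background, entities in portions:
        placements = {e: store.image_for(request_id, e) for e in entities}
        scenes.append(SceneData(background, placements))
    return scenes


store = EntityImageStore()
scenes = build_scenes(
    "req-1",
    [("forest", ["fox"]), ("meadow", ["fox", "rabbit"])],
    store,
)
print(scenes[0].placements["fox"] == scenes[1].placements["fox"])  # same image reused
print(store.generated)  # one image per distinct entity
```

Keying the cache on the request identifier keeps an entity's appearance stable within one narrative without forcing the same image onto unrelated requests.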
using information manually generated, e.g. tags, keywords, comments, manually generated location and time information · CPC title
Animation · CPC title
Editing, e.g. inserting or deleting · CPC title
Recognition of textual entities · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title