Virtual conversational companion

US2025157463A1 · US · A1

Patent metadata
Publication number: US-2025157463-A1
Application number: US-202519017979-A
Country: US
Kind code: A1
Filing date: Jan 13, 2025
Priority date: Mar 30, 2021
Publication date: May 15, 2025
Grant date: none listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for rendering visual content, in response to one or more utterances, are described. A device receives one or more utterances that define a parameter(s) for desired output content. A system (or the device) identifies natural language data corresponding to the desired content, and uses natural language generation processes to update the natural language data based on the parameter(s). The system (or the device) then generates an image based on the updated natural language data. The system (or the device) also generates video data of an avatar. The device displays the image and the avatar, and synchronizes movements of the avatar with output of synthesized speech of the updated natural language data. The device may also display subtitles of the updated natural language data, and cause a word of the subtitles to be emphasized when synthesized speech of the word is being output.
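The abstract's last step, emphasizing a subtitle word while that word's synthesized speech is being output, can be illustrated with a minimal Python sketch. It assumes that the speech synthesizer reports a start and end time in seconds for each word, sorted and non-overlapping. The names `WordTiming`, `word_being_spoken`, and `render_subtitle` are hypothetical and do not come from the patent. Asterisks stand in for whatever visual emphasis a real display would use.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordTiming:
    """Hypothetical per-word timing, as a TTS engine might report it (seconds)."""
    word: str
    start: float
    end: float


def word_being_spoken(timings: List[WordTiming], t: float) -> Optional[int]:
    """Return the index of the word whose speech covers time t, or None during pauses.

    Assumes timings are sorted by start time and do not overlap.
    """
    starts = [w.start for w in timings]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i >= 0 and t < timings[i].end:
        return i
    return None


def render_subtitle(timings: List[WordTiming], t: float) -> str:
    """Render the full subtitle line, wrapping the currently spoken word in asterisks."""
    idx = word_being_spoken(timings, t)
    return " ".join(
        f"*{w.word}*" if i == idx else w.word for i, w in enumerate(timings)
    )
```

For instance, with the words "the" (0.0 to 0.2 s), "cat" (0.2 to 0.6 s), and "sat" (0.7 to 1.0 s), calling `render_subtitle` at 0.3 s yields `the *cat* sat`. At 0.65 s, a pause between words, no word is emphasized.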

First claim

Opening claim text (preview).

1-20. (canceled)

21. A computer-implemented method, comprising: receiving input data comprising a natural language user input; determining natural language data responsive to the natural language user input; generating video data corresponding to the natural language data; generating output data comprising the video data; and presenting the output data.

22. The computer-implemented method of claim 21, further comprising: receiving an identifier of a language of the natural language data; and generating the video data based at least in part on the identifier.

23. The computer-implemented method of claim 22, further comprising: generating the video data to comprise an avatar speaking the language.

24. The computer-implemented method of claim 21, further comprising: receiving an identifier of an emotion to be exhibited in the video data; and generating the video data based at least in part on the identifier.

25. The computer-implemented method of claim 24, further comprising: generating the video data to comprise an avatar exhibiting the emotion.

26. The computer-implemented method of claim 21, further comprising: processing, using a machine learning model, the natural language data to determine an emotion corresponding to the natural language data; and generating the video data based at least in part on the emotion.

27. The computer-implemented method of claim 21, further comprising: querying a storage for a two-dimensional image corresponding to an avatar; and generating the video data by rendering the two-dimensional image into a video with the avatar appearing to speak the natural language data.

28. The computer-implemented method of claim 21, where generating the video data comprises: determining, using a three-dimensional model, a facial image for a sound represented in the natural language data; and mapping the facial image to an emotion corresponding to the sound.

29. The computer-implemented method of claim 21, further comprising: generating audio data including synthesized speech corresponding to the natural language data; and generating the output data to further comprise the audio data.

30. The computer-implemented method of claim 29, further comprising: configuring the video data to correspond to an output time duration; and configuring the audio data to correspond to the output time duration.

31. A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: receive input data comprising a natural language user input; determine natural language data responsive to the natural language user input; generate video data corresponding to the natural language data; generate output data comprising the video data; and present the output data.

32. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive an identifier of a language of the natural language data; and generate the video data based at least in part on the identifier.

33. The system of claim 32, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate the video data to comprise an avatar speaking the language.

34. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive an identifier of an emotion to be exhibited in the video data; and generate the video data based at least in part on the identifier.

35. The system of claim 34, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate the video data to comprise an avatar exhibiting the emotion.

36. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process, using a machine learning model, the natural language data to determine an emotion corresponding to the natural language data; and generate the video data based at least in part on the emotion.

37. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: query a storage for a two-dimensional image corresponding to an avatar; and generate the video data by rendering the two-dimensional image into a video with the avatar appearing to speak the natural language data.

38. The system of claim 31, wherein the instructions for generating the video data further comprise instructions that, when executed by the at least one processor, further cause the system to: determine, using a three-dimensional model, a facial image for a sound represented in the natural language data; and map the facial image to an emotion corresponding to the sound.

39. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate audio data including synthesized speech corresponding to the natural language data; and generate the output data to further comprise the audio data.

40. The system of claim 39, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: configure the video data to correspond to an output time duration; and configure the audio data to correspond to the output time duration.
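Claims 30 and 40 require the video data and the audio data to be configured to a common output time duration, but they do not say how. One plausible approach, sketched here in Python, takes the length of the synthesized speech as the target duration. It then trims the audio or pads it with silence, and trims the video or repeats its last avatar frame, until both span that duration. The `AudioClip` and `VideoClip` types and the `duration_of` and `align_to_duration` functions are hypothetical illustrations, not the patent's method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioClip:
    samples: List[float]  # mono PCM samples of synthesized speech (hypothetical representation)
    sample_rate: int      # samples per second


@dataclass
class VideoClip:
    frames: List[bytes]   # one encoded avatar frame per entry (hypothetical representation)
    fps: int              # frames per second


def duration_of(audio: AudioClip) -> float:
    """Length of the synthesized speech in seconds."""
    return len(audio.samples) / audio.sample_rate


def align_to_duration(
    audio: AudioClip, video: VideoClip, duration_s: float
) -> Tuple[AudioClip, VideoClip]:
    """Trim or pad both clips so each spans duration_s (rounded to whole samples/frames)."""
    n_samples = round(duration_s * audio.sample_rate)
    # Drop extra samples, or append silence (0.0) if the speech is too short.
    samples = audio.samples[:n_samples] + [0.0] * max(0, n_samples - len(audio.samples))

    n_frames = round(duration_s * video.fps)
    frames = video.frames[:n_frames]
    if frames and len(frames) < n_frames:
        # Hold the last avatar frame on screen until the target duration is reached.
        # An empty input video is left empty.
        frames += [frames[-1]] * (n_frames - len(frames))
    return AudioClip(samples, audio.sample_rate), VideoClip(frames, video.fps)
```

For example, with one second of 16 kHz speech and a 10-frame clip at 25 fps, `align_to_duration(audio, video, duration_of(audio))` keeps all 16,000 audio samples and extends the video to 25 frames by holding the final frame. A production system would more likely re-time the avatar animation itself. Padding is used here only to keep the sketch short.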

Assignees

Amazon Tech Inc

Inventors

Classifications

  • driven by audio data · CPC title

  • for processing of video signals · CPC title

  • Two-dimensional [2D] animation, e.g. using sprites · CPC title

  • G06T13/40 (primary)

    of characters, e.g. humans, animals or virtual beings · CPC title

  • of the speaker; Human-factor methodology · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025157463A1 cover?
Techniques for rendering visual content, in response to one or more utterances, are described. A device receives one or more utterances that define a parameter(s) for desired output content. A system (or the device) identifies natural language data corresponding to the desired content, and uses natural language generation processes to update the natural language data based on the parameter(s). …
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06T13/40. Mapped technology areas include Physics.
When was this patent published?
Published May 15, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).