Animation processing method
US-2024420402-A1 · Dec 19, 2024 · US
US2025157463A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025157463-A1 |
| Application number | US-202519017979-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 13, 2025 |
| Priority date | Mar 30, 2021 |
| Publication date | May 15, 2025 |
| Grant date | — |
Official abstract text for this publication.
Techniques for rendering visual content, in response to one or more utterances, are described. A device receives one or more utterances that define a parameter(s) for desired output content. A system (or the device) identifies natural language data corresponding to the desired content, and uses natural language generation processes to update the natural language data based on the parameter(s). The system (or the device) then generates an image based on the updated natural language data. The system (or the device) also generates video data of an avatar. The device displays the image and the avatar, and synchronizes movements of the avatar with output of synthesized speech of the updated natural language data. The device may also display subtitles of the updated natural language data, and cause a word of the subtitles to be emphasized when synthesized speech of the word is being output.
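The abstract's last sentence describes emphasizing a subtitle word while the synthesized speech of that word is being output. The following is a minimal sketch of that idea, not the method disclosed in the publication. It assumes the speech synthesizer supplies per-word start and end times in milliseconds. The names `WordTiming`, `emphasized_index`, and `render_subtitle`, and the use of asterisks as the emphasis marker, are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WordTiming:
    """One subtitle word with its assumed speech interval [start_ms, end_ms)."""
    word: str
    start_ms: int
    end_ms: int


def emphasized_index(timings: Sequence[WordTiming], playback_ms: int) -> Optional[int]:
    """Return the index of the word being spoken at playback_ms, or None.

    Assumes timings are sorted by start_ms and do not overlap.
    """
    starts = [t.start_ms for t in timings]
    # Last word whose start time is at or before the playback position.
    i = bisect_right(starts, playback_ms) - 1
    if i < 0 or playback_ms >= timings[i].end_ms:
        return None  # before the first word, in a pause, or after the end
    return i


def render_subtitle(timings: Sequence[WordTiming], playback_ms: int) -> str:
    """Render the subtitle line, wrapping the active word in asterisks."""
    active = emphasized_index(timings, playback_ms)
    return " ".join(
        f"*{t.word}*" if i == active else t.word
        for i, t in enumerate(timings)
    )


if __name__ == "__main__":
    line = [
        WordTiming("Mix", 0, 300),
        WordTiming("the", 300, 450),
        WordTiming("flour", 500, 900),
    ]
    print(render_subtitle(line, 350))  # Mix *the* flour
```

A display loop would call `render_subtitle` with the audio player's current position on each refresh. During a pause between words, no word is emphasized.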
Opening claim text (preview).
1-20. (canceled)

21. A computer-implemented method, comprising: receiving input data comprising a natural language user input; determining natural language data responsive to the natural language user input; generating video data corresponding to the natural language data; generating output data comprising the video data; and presenting the output data.

22. The computer-implemented method of claim 21, further comprising: receiving an identifier of a language of the natural language data; and generating the video data based at least in part on the identifier.

23. The computer-implemented method of claim 22, further comprising: generating the video data to comprise an avatar speaking the language.

24. The computer-implemented method of claim 21, further comprising: receiving an identifier of an emotion to be exhibited in the video data; and generating the video data based at least in part on the identifier.

25. The computer-implemented method of claim 24, further comprising: generating the video data to comprise an avatar exhibiting the emotion.

26. The computer-implemented method of claim 21, further comprising: processing, using a machine learning model, the natural language data to determine an emotion corresponding to the natural language data; and generating the video data based at least in part on the emotion.

27. The computer-implemented method of claim 21, further comprising: querying a storage for a two-dimensional image corresponding to an avatar; and generating the video data by rendering the two-dimensional image into a video with the avatar appearing to speak the natural language data.

28. The computer-implemented method of claim 21, where generating the video data comprises: determining, using a three-dimensional model, a facial image for a sound represented in the natural language data; and mapping the facial image to an emotion corresponding to the sound.

29. The computer-implemented method of claim 21, further comprising: generating audio data including synthesized speech corresponding to the natural language data; and generating the output data to further comprise the audio data.

30. The computer-implemented method of claim 29, further comprising: configuring the video data to correspond to an output time duration; and configuring the audio data to correspond to the output time duration.

31. A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: receive input data comprising a natural language user input; determine natural language data responsive to the natural language user input; generate video data corresponding to the natural language data; generate output data comprising the video data; and present the output data.

32. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive an identifier of a language of the natural language data; and generate the video data based at least in part on the identifier.

33. The system of claim 32, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate the video data to comprise an avatar speaking the language.

34. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive an identifier of an emotion to be exhibited in the video data; and generate the video data based at least in part on the identifier.

35. The system of claim 34, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate the video data to comprise an avatar exhibiting the emotion.

36. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process, using a machine learning model, the natural language data to determine an emotion corresponding to the natural language data; and generate the video data based at least in part on the emotion.

37. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: query a storage for a two-dimensional image corresponding to an avatar; and generate the video data by rendering the two-dimensional image into a video with the avatar appearing to speak the natural language data.

38. The system of claim 31, wherein the instructions for generating the video data further comprise instructions that, when executed by the at least one processor, further cause the system to: determine, using a three-dimensional model, a facial image for a sound represented in the natural language data; and map the facial image to an emotion corresponding to the sound.

39. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate audio data including synthesized speech corresponding to the natural language data; and generate the output data to further comprise the audio data.

40. The system of claim 39, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: configure the video data to correspond to an output time duration; and configure the audio data to correspond to the output time duration.
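Claims 30 and 40 recite configuring the video data and the audio data to correspond to the same output time duration, without saying how. One possible approach, assumed here purely for illustration, is to round the synthesized speech length up to a whole number of video frames and pad the audio with trailing silence to match. The names `OutputPlan` and `plan_output` and the default frame rate are hypothetical and do not come from the publication.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class OutputPlan:
    """Shared duration for the video and audio tracks (assumed design)."""
    frame_count: int                # number of video frames to render
    duration_ms: Fraction           # exact duration of both tracks
    trailing_silence_ms: Fraction   # silence appended to the speech audio


def plan_output(speech_ms: int, fps: int = 30) -> OutputPlan:
    """Pick one output duration that both tracks can be configured to.

    speech_ms is the length of the synthesized speech in milliseconds.
    The video must contain a whole number of frames, so the shared
    duration is the speech length rounded up to the next frame boundary.
    """
    if speech_ms < 0 or fps <= 0:
        raise ValueError("speech_ms must be >= 0 and fps must be > 0")
    # Exact rational arithmetic avoids floating-point drift at frame edges.
    frame_count = math.ceil(Fraction(speech_ms * fps, 1000))
    duration_ms = Fraction(frame_count * 1000, fps)
    return OutputPlan(frame_count, duration_ms, duration_ms - speech_ms)


if __name__ == "__main__":
    plan = plan_output(2510, fps=25)
    print(plan.frame_count, plan.duration_ms, plan.trailing_silence_ms)  # 63 2520 10
```

With 2510 ms of speech at 25 frames per second, the sketch renders 63 frames (2520 ms) and appends 10 ms of silence so that both tracks end together. Other strategies, such as time-stretching the audio or holding the last video frame, would satisfy the same claim language.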
driven by audio data · CPC title
for processing of video signals · CPC title
Two-dimensional [2D] animation, e.g. using sprites · CPC title
of characters, e.g. humans, animals or virtual beings · CPC title
of the speaker; Human-factor methodology · CPC title