Virtual conversational companion

US12205577B1 · US · B1

Patent metadata
Field               Value
Publication number  US-12205577-B1
Application number  US-202117217031-A
Country             US
Kind code           B1
Filing date         Mar 30, 2021
Priority date       Mar 30, 2021
Publication date    Jan 21, 2025
Grant date          Jan 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for rendering visual content, in response to one or more utterances, are described. A device receives one or more utterances that define a parameter(s) for desired output content. A system (or the device) identifies natural language data corresponding to the desired content, and uses natural language generation processes to update the natural language data based on the parameter(s). The system (or the device) then generates an image based on the updated natural language data. The system (or the device) also generates video data of an avatar. The device displays the image and the avatar, and synchronizes movements of the avatar with output of synthesized speech of the updated natural language data. The device may also display subtitles of the updated natural language data, and cause a word of the subtitles to be emphasized when synthesized speech of the word is being output.
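Two steps in the abstract lend themselves to a small illustration: updating the natural language data with the spoken parameter, and emphasizing the subtitle word that is currently being spoken. The sketch below is only a toy model of those two steps, not the patented implementation. The names WordTiming, substitute_parameter, assign_timings and render_subtitles are hypothetical, plain string replacement stands in for natural language generation, and evenly spaced word timings stand in for timings a real text-to-speech engine would report.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """A subtitle word and the interval (in seconds) during which it is spoken."""
    word: str
    start: float
    end: float


def substitute_parameter(text: str, old: str, new: str) -> str:
    """Stand-in for the natural language generation step: swap one parameter.

    Assumption: a real NLG component would also repair grammar (articles,
    pronouns, agreement); plain string replacement is only an illustration.
    """
    return text.replace(old, new)


def assign_timings(text: str, seconds_per_word: float = 0.4) -> list[WordTiming]:
    """Assume evenly spaced words; a real TTS engine would supply actual timings."""
    timings = []
    for index, word in enumerate(text.split()):
        start = index * seconds_per_word
        timings.append(WordTiming(word, start, start + seconds_per_word))
    return timings


def render_subtitles(timings: list[WordTiming], playback_time: float) -> str:
    """Return the subtitle line with the currently spoken word marked by asterisks."""
    rendered = []
    for timing in timings:
        is_current = timing.start <= playback_time < timing.end
        rendered.append(f"*{timing.word}*" if is_current else timing.word)
    return " ".join(rendered)


# Hypothetical content and parameters, chosen only for the example.
story = "The princess climbed the tower"
updated = substitute_parameter(story, "princess", "astronaut")
timings = assign_timings(updated)
print(render_subtitles(timings, playback_time=0.5))
```

With the assumed 0.4 seconds per word, a playback time of 0.5 seconds falls inside the second word, so that word is the one marked for emphasis.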

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving first input audio data corresponding to a first spoken natural language input; determining the first spoken natural language input includes: a first portion requesting first content be output; and a second portion representing a first parameter to be used instead of a second parameter included in the first content; generating, by a content generator component, first natural language data corresponding to the first content updated to include the first parameter instead of the second parameter; generating, by an image generator component, first image data representing the first natural language data; generating, by an avatar generator component, first video data including a representation of an avatar corresponding to the first natural language data; and generating, by a response generator component, first output data including the first image data, first output audio data including first synthesized speech corresponding to the first natural language data, and the first video data, wherein the first output data synchronizes display of the representation of the avatar with output of the first synthesized speech.

2. The computer-implemented method of claim 1, further comprising: determining, by the content generator component, second natural language data corresponding to the first content; determining, by the content generator component, a portion of the second natural language data corresponding to the first second parameter; and generating, by the content generator component, the first natural language data by performing natural language generation processing to replace the portion with the first parameter.

3. The computer-implemented method of claim 1, further comprising: determining, by a question and answering (Q&A) component, a dialog identifier corresponding to an ongoing dialog; determining, by the Q&A component, second natural language data associated with the dialog identifier, the second natural language data being represented in previous output data; generating, by the Q&A component, third natural language data corresponding to a question related to the second natural language data; generating, by the avatar generator component, first data including the representation of the avatar corresponding to the third natural language data; receiving, by the response generator component, the third natural language data and the first data; sending, by the response generator component, the third natural language data to a text-to-speech (TTS) component; receiving, by the response generator component, second output audio data from the TTS component, the second output audio data including second synthesized speech corresponding to the third natural language data; and generating, by the response generator component, second output data including the first data and the second output audio data, the second output data synchronizing display of the representation of the avatar with output of the second synthesized speech.

4. The computer-implemented method of claim 1, further comprising: determining a user identifier corresponding to the first input audio data; querying, by the content generator component, a profile storage for a parameter associated with the user identifier; receiving, by the content generator component and from the profile storage, first data representing a third parameter; and generating, by the content generator component, the first natural language data further based at least in part on the third parameter.

5. The computer-implemented method of claim 1, further comprising: receiving second input audio data corresponding to a second spoken natural language input; determining the second spoken natural language input requests a next segment of content be output; determining a content identifier corresponding to the first content; receiving, by a content navigator component, the content identifier and a request for a next segment of the first content; determining, by the content generator component and based at least in part on the content navigator component receiving the content identifier and the request, second natural language data associated with the content identifier and corresponding to the next segment; generating, by the content generator component, third natural language data by replacing at least a first portion of the second natural language data with the first parameter; generating, by the image generator component, second image data representing the third natural language data; and generating, by the response generator component, second output data including the second image data and second output audio data including second synthesized speech corresponding to the third natural language data.

6. The computer-implemented method of claim 1, further comprising: receiving, by the response generator component, the first output audio data from a text-to-speech component, the first output audio data including a start token and an end token corresponding to a portion of the first synthesized speech; and generating, by the response generator component, the first output data to: correspond a first portion of the first video data, corresponding to a beginning of a facial expression of the avatar, with a first portion of the first synthesized speech corresponding to the start token; and correspond a second portion of the first video data, corresponding to an end of the facial expression, with a second portion of the first synthesized speech corresponding to the end token.

7. The computer-implemented method of claim 1, further comprising, by the response generator component: configuring the first output audio data to correspond to a first output time duration; and configuring the first video data to correspond to the first output time duration.

8. The computer-implemented method of claim 1, wherein the avatar corresponds to a first character and the method further comprises: determining second natural language data corresponding to the first content; determining the second natural language data corresponds to a second character; and generating, by the avatar generator component, second video data including a representation of a second avatar different from the avatar.

9. The computer-implemented method of claim 8, wherein the first synthesized speech corresponds to first audio characteristics and the method further comprises: generating, by the response generator component, second output audio data including second synthesized speech corresponding to the second natural language data, the second synthesized speech further corresponding to second audio characteristics different from the first audio characteristics.

10. A computing system comprising: a first component configured to determine a first spoken natural language input includes: a first portion requesting first content be output; and a second portion representing a first parameter to be used instead of a second parameter included in the first content; a content generator component configured to generate first natural language data corresponding to the first content updated to include the first parameter instead of the second parameter; an image generator component configured to generate first image data representing the first natural language data; an avatar generator component configured to generate first video data including a representation of an avatar corresponding to the first natural language data; and a response generator component configured to generate first output data including the first image data, first output audio data including first synthesized speech corresponding to the first n
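
Claims 6 and 7 in the preview above describe aligning the start and end of an avatar facial expression with start and end tokens in the synthesized speech, and giving the audio and video the same output duration. The following is a minimal sketch of one way such an alignment could be computed, not the method the patent discloses. SpeechMark, frame_count and expression_frames are hypothetical names, and the sketch assumes the tokens carry time offsets in seconds and the video has a fixed frame rate.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeechMark:
    """Hypothetical token delivered alongside the synthesized speech."""
    kind: str      # "start" or "end"
    time_s: float  # offset into the audio, in seconds


def frame_count(audio_duration_s: float, fps: int) -> int:
    """Number of video frames needed so the video spans the audio duration."""
    return math.ceil(audio_duration_s * fps)


def expression_frames(marks: list[SpeechMark], fps: int, total_frames: int) -> range:
    """Map the start and end tokens to the frame span of the facial expression."""
    start = next(m.time_s for m in marks if m.kind == "start")
    end = next(m.time_s for m in marks if m.kind == "end")
    if end < start:
        raise ValueError("end token precedes start token")
    # Clamp to the video length so the expression never runs past the audio.
    first = min(int(start * fps), total_frames)
    last = min(math.ceil(end * fps), total_frames)
    return range(first, last)


# Assumed values for illustration: 3 seconds of speech rendered at 30 frames per second,
# with the expression tied to the speech between 1.0 s and 2.0 s.
marks = [SpeechMark("start", 1.0), SpeechMark("end", 2.0)]
total = frame_count(3.0, fps=30)
frames = expression_frames(marks, fps=30, total_frames=total)
print(total, frames.start, frames.stop)
```

Deriving the frame count from the audio duration keeps the two streams the same length, and the expression span is clamped to that length so it cannot outlast the speech.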

Assignees

Amazon Tech Inc

Inventors

Classifications

  • for processing of video signals · CPC title

  • Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08) · CPC title

  • Two-dimensional [2D] animation, e.g. using sprites · CPC title

  • driven by audio data · CPC title

  • G06T13/40 (Primary): of characters, e.g. humans, animals or virtual beings · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12205577B1 cover?
Techniques for rendering visual content, in response to one or more utterances, are described. A device receives one or more utterances that define a parameter(s) for desired output content. A system (or the device) identifies natural language data corresponding to the desired content, and uses natural language generation processes to update the natural language data based on the parameter(s). …
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06T13/40. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 21, 2025 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).