Method and apparatus for providing interactive avatar services

US12450809B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12450809-B2
Application number  US-202318098428-A
Country             US
Kind code           B2
Filing date         Jan 18, 2023
Priority date       Jan 18, 2022
Publication date    Oct 21, 2025
Grant date          Oct 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of providing an avatar service includes obtaining a user-uttered voice and a spatial information of a user-utterance space, transmitting the user-uttered voice and the spatial information to a server, receiving, from the server, a first avatar voice answer and an avatar facial expression sequence corresponding to the first avatar voice, which are determined based on the user-uttered voice and the spatial information, determining first avatar facial expression data, based on the first avatar voice answer and the avatar facial expression sequence, identifying a certain event during reproduction of a first avatar animation created based on the first avatar voice answer and the first avatar facial expression data, determining second avatar facial expression data or a second avatar voice answer, based on the certain event, and reproducing a second avatar animation created based on the second avatar facial expression data or the second avatar voice answer.
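The exchange described in the abstract can be pictured with a small Python sketch. Claim 1 adds an intermediate "avatar response mode" derived from the spatial information, so the sketch includes one. All names here (`SpatialInfo`, `AvatarAnswer`, `choose_response_mode`, `server_answer`) and the mode labels are hypothetical; this page does not give a mapping from spatial information to a response, so the policy below is an assumption for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpatialInfo:
    is_public: bool  # public vs. private user-utterance space
    is_noisy: bool   # noisy vs. quiet user-utterance space


@dataclass
class AvatarAnswer:
    mode: str                       # hypothetical response-mode label
    voice_answer: str               # text standing in for synthesized speech
    expression_sequence: List[str]  # expression labels accompanying the answer


def choose_response_mode(space: SpatialInfo) -> str:
    """Assumed policy (not from the patent): discreet in public, louder in noise."""
    if space.is_public:
        return "discreet"
    return "loud" if space.is_noisy else "normal"


def server_answer(user_voice: str, space: SpatialInfo) -> AvatarAnswer:
    """Stub for the server side: pick a mode, then build an answer for it."""
    mode = choose_response_mode(space)
    text = f"You said: {user_voice}"
    # Illustrative only: a discreet answer uses a more restrained expression.
    expressions = ["neutral"] if mode == "discreet" else ["smile", "neutral"]
    return AvatarAnswer(mode, text, expressions)


answer = server_answer(
    "what is on my calendar", SpatialInfo(is_public=True, is_noisy=False)
)
print(answer.mode, answer.expression_sequence)
```

A real device would send audio and sensor-derived features rather than strings, and the event handling during playback (the second half of the abstract) is left out here.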

First claim

Opening claim text (preview).

What is claimed is:

1. A method, performed by an electronic device, of providing an avatar service, comprising: obtaining a user-uttered voice and a spatial information of a user-utterance space where a user utters the user-uttered voice, the spatial information comprising spatial characteristics of the user-utterance space based on at least one of images captured by a camera and a sound obtained through a microphone, wherein the spatial information comprises at least one of: whether the user-utterance space is public or private; or whether the user-utterance space is quiet or noisy; transmitting the user-uttered voice and the spatial information to a server; receiving, from the server, a first avatar voice answer and an avatar facial expression sequence corresponding to the first avatar voice answer, which are determined based on the user-uttered voice and an avatar response mode, wherein the avatar response mode is determined based on the spatial information; determining first avatar facial expression data, based on the first avatar voice answer and the avatar facial expression sequence; identifying a certain event during reproduction of a first avatar animation created based on the first avatar voice answer and the first avatar facial expression data; determining second avatar facial expression data or a second avatar voice answer, based on the certain event; and stopping reproduction of the first avatar animation, and reproducing a second avatar animation created based on the second avatar facial expression data or the second avatar voice answer.

2. The method of claim 1, wherein the spatial information comprises information about whether the user-utterance space is a public place and a level of noise in the user-utterance space.

3. The method of claim 1, wherein the first avatar facial expression data and the second avatar facial expression data each comprise a set of coefficients for each of a plurality of reference three-dimensional (3D) meshes for modeling a facial expression of the first avatar animation and the second avatar animation, respectively.

4. The method of claim 1, wherein: the second avatar facial expression data comprises lip sync data, and the lip sync data is obtained using an artificial intelligence (AI) model.

5. The method of claim 4, wherein the AI model is trained using data normalized based on an available range according to the lip sync data.

6. The method of claim 1, wherein the certain event comprises at least one of an utterance mode change event, an observation mode event, or a refresh mode event.

7. The method of claim 6, wherein, based on the certain event being the refresh mode event, the stopping reproduction of the first avatar animation, and the reproducing of the second avatar animation comprises: stopping the reproduction of the first avatar animation at a point in time; reproducing a preset refresh animation; and reproducing the first avatar animation from the point in time at which the first avatar animation is stopped.

8. The method of claim 6, wherein, based on the certain event being the utterance mode change event, the determining of the second avatar facial expression data or the second avatar voice answer comprises: determining the second avatar facial expression data by modifying the first avatar facial expression data, based on an utterance mode obtained as a result of the certain event; and modifying the first avatar voice answer, based on the utterance mode.

9. The method of claim 6, wherein, based on the certain event being the observation mode event, the determining of the second avatar facial expression data or the second avatar voice answer comprises determining the second avatar facial expression data by changing a face direction or eye direction of the first avatar animation.

10. A method, performed by a server, of providing an avatar service through an electronic device, the method comprising: receiving, from the electronic device, a user-uttered voice and spatial information of a user-utterance space where a user utters the user-uttered voice, the spatial information comprising spatial characteristics of the user-utterance space based on at least one of images captured by a camera and a sound obtained through a microphone, wherein the spatial information comprises at least one of: whether the user-utterance space is public or private; or whether the user-utterance space is quiet or noisy; determining an avatar response mode for the user-uttered voice, based on the spatial information; generating a first avatar voice answer for an avatar to respond to the user-uttered voice and an avatar facial expression sequence corresponding to the first avatar voice answer, based on the user-uttered voice and the avatar response mode; and transmitting the first avatar voice answer and the avatar facial expression sequence for generating a first avatar animation to the electronic device.

11. An electronic device for providing an avatar service, comprising: a communication interface; a storage storing at least one instruction; and at least one processor configured to execute the at least one instruction stored in the storage, wherein the at least one processor is configured to execute the at least one instruction to: obtain a user-uttered voice and a spatial information of a user-utterance space where a user utters the user-uttered voice, the spatial information comprising spatial characteristics of the user-utterance space based on at least one of images captured by a camera and a sound obtained through a microphone, wherein the spatial information comprises at least one of: whether the user-utterance space is public or private; or whether the user-utterance space is quiet or noisy; transmit the user-uttered voice and the spatial information to a server; receive, from the server through the communication interface, a first avatar voice answer and an avatar facial expression sequence corresponding to the first avatar voice answer, which are determined based on the user-uttered voice and an avatar response mode, wherein the avatar response mode is determined based on the spatial information; determine first avatar facial expression data, based on the first avatar voice answer and the avatar facial expression sequence; identify a certain event during reproduction of a first avatar animation created based on the first avatar voice answer and the first avatar facial expression data; determine second avatar facial expression data or a second avatar voice answer, based on the certain event; and reproduce a second avatar animation created based on the second avatar facial expression data or the second avatar voice answer.

12. The electronic device of claim 11, wherein the spatial information comprises information about whether the user-utterance space is a public place and a level of noise in the user-utterance space.

13. The electronic device of claim 11, wherein the first avatar facial expression data and the second avatar facial expression data each comprise a set of coefficients for each of a plurality of reference three-dimensional (3D) meshes for modeling a facial expression of the first avatar animation and the second avatar animation, respectively.

14. The electronic device of claim 11, wherein: the second avatar facial expression data comprises lip sync data, and the lip sync data is obtained using an artificial intelligence (AI) model.

15. The electronic device of claim 14, wherein the AI model is trained using data normalized based on an available range according to the lip sync data.

16. The electro
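Two of the dependent claims lend themselves to a short illustration: the per-mesh coefficients of claim 3 and the stop, refresh, resume order of claim 7. The Python sketch below is only one plausible reading. The function names, the `jaw_open` label, and the use of offsets from a neutral mesh (a common blendshape formulation) are assumptions and are not taken from the patent text.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]


def blend(
    neutral: List[Vertex],
    deltas: Dict[str, List[Vertex]],
    coeffs: Dict[str, float],
) -> List[Vertex]:
    """Neutral mesh plus a weighted sum of per-reference-mesh offsets.

    `deltas[name][i]` is the assumed offset of vertex i for reference mesh
    `name`; `coeffs[name]` is the coefficient applied to that mesh.
    """
    result = []
    for i, (x, y, z) in enumerate(neutral):
        for name, weight in coeffs.items():
            dx, dy, dz = deltas[name][i]
            x, y, z = x + weight * dx, y + weight * dy, z + weight * dz
        result.append((x, y, z))
    return result


def play_with_refresh(
    frames: List[str], refresh_frames: List[str], stop_index: int
) -> List[str]:
    """Order in which frames are shown when a refresh event stops playback
    just before `frames[stop_index]`: the frames already played, then the
    preset refresh animation, then the rest from the stopping point."""
    return frames[:stop_index] + refresh_frames + frames[stop_index:]


# One-vertex toy mesh: half-open jaw moves the vertex halfway down.
mesh = blend(
    neutral=[(0.0, 0.0, 0.0)],
    deltas={"jaw_open": [(0.0, -1.0, 0.0)]},
    coeffs={"jaw_open": 0.5},
)
print(mesh)
print(play_with_refresh(["f0", "f1", "f2"], ["blink"], stop_index=1))
```

In this reading, each animation frame would carry one coefficient set, so resuming "from the point in time" amounts to continuing from the saved frame index.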

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • driven by audio data · CPC title

  • Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title

  • G10L21/10 · Primary

    Transforming into visible information · CPC title

  • Segmentation; Word boundary detection · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12450809B2 cover?
A method of providing an avatar service includes obtaining a user-uttered voice and a spatial information of a user-utterance space, transmitting the user-uttered voice and the spatial information to a server, receiving, from the server, a first avatar voice answer and an avatar facial expression sequence corresponding to the first avatar voice, which are determined based on the user-uttered vo…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L21/10. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 21, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).