Computer generated emulation of a subject

US11144597B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11144597-B2
Application number	US-201815923566-A
Country	US
Kind code	B2
Filing date	Mar 16, 2018
Priority date	Aug 16, 2013
Publication date	Oct 12, 2021
Grant date	Oct 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice; said system comprising a processor, a user interface and a personality storage section, the user interface being configured to emulate the subject, by displaying a talking head which comprises the subject's face and output speech from the mouth of the face with the subject's voice, the user interface further comprising a receiver for receiving a query from the user, the emulated subject being configured to respond to the query received from the user, the processor comprising a dialogue section and a talking head generation section, wherein said dialogue section is configured to generate a response to a query inputted by a user from the user interface and generate a response to be outputted by the talking head, the response being generated by retrieving information from said personality storage section, said personality storage section comprising content created by or about the subject, and said talking head generation section is configured to: convert said response into a sequence of acoustic units, the talking head generation section further comprising a statistical model, said statistical model comprising a plurality of model parameters, said model parameters being derived from said personality storage section, the model parameters describing probability distributions which relate an acoustic unit to an image vector and speech vector, said image vector comprising a plurality of parameters which define the subject's face and said speech vector comprising a plurality of parameters which define the subject's voice, the talking head generation section being further configured to output a sequence of speech vectors and image vectors which are synchronised such that the head appears to talk.
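The generation step in the abstract can be pictured with a toy sketch. The Python below is an illustration only, not the patented implementation: the acoustic unit names, the two-dimensional vectors, and the `TOY_MODEL` table are all hypothetical, and each probability distribution is reduced to its mean. It shows how a single sequence of acoustic units yields a speech vector and an image vector per unit, so the two output streams stay aligned.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy model: each acoustic unit maps to the mean of a
# speech-vector distribution and the mean of an image-vector distribution.
# In the patent these parameters are derived from content about the subject;
# the numbers here are made up for illustration.
TOY_MODEL: Dict[str, Tuple[Vector, Vector]] = {
    "hh": ([0.1, 0.2], [0.0, 0.3]),
    "ah": ([0.5, 0.1], [0.8, 0.2]),
    "l":  ([0.3, 0.4], [0.4, 0.1]),
    "ow": ([0.6, 0.3], [0.9, 0.5]),
}


def generate_frames(units: List[str]) -> Tuple[List[Vector], List[Vector]]:
    """Return parallel speech and image vector sequences, one pair per unit."""
    speech_seq: List[Vector] = []
    image_seq: List[Vector] = []
    for unit in units:
        speech_mean, image_mean = TOY_MODEL[unit]
        # Both streams advance together, which keeps them synchronised.
        speech_seq.append(list(speech_mean))
        image_seq.append(list(image_mean))
    return speech_seq, image_seq


speech_seq, image_seq = generate_frames(["hh", "ah", "l", "ow"])
```

Because both sequences are built in the same loop, index `i` of `speech_seq` and index `i` of `image_seq` always belong to the same acoustic unit.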

First claim

Opening claim text (preview).

The invention claimed is:
1. A system for emulating a subject, to allow a user to interact with a computer-generated talking head with a face and a voice of the subject, said system comprising: processing circuitry, a user interface, and a personality memory, the user interface being configured to emulate the subject, by displaying a talking head, which comprises the face of the subject, and output speech from the mouth of the face with the voice of the subject, the user interface further comprising a receiver to receive a query from the user, the emulated subject being configured to respond to the query received from the user, wherein the processing circuitry is configured to generate a response to the query inputted by the user from the user interface, the response to be outputted by the talking head, the response being generated by retrieving information from said personality memory, said personality memory storing content created by or about the subject, the response being an expressive response such that the face and the voice demonstrate expression, determine the expression with which to output the generated response, convert said response into a sequence of acoustic units using a statistical model, said statistical model comprising a plurality of model parameters, said model parameters being derived from said personality memory, the model parameters describing probability distributions that relate an acoustic unit to an image vector and a speech vector for an associated expression, said image vector comprising a plurality of parameters that define the face of the subject, and said speech vector comprising a plurality of parameters that define the voice of the subject, and output a sequence of speech vectors and image vectors, which are synchronized such that the head appears to talk.
2. The system according to claim 1, wherein the content created by or about the subject comprises posts collected from social media websites, e-mails, and other content from or about the subject that has been provided to the personality memory.
3. The system according to claim 1, wherein the processing circuitry is further configured to navigate a set of rules stored in said personality memory to generate the response.
4. The system according to claim 1, wherein the processing circuitry is further configured to retrieve a response from said personality memory by searching information that has been stored in said personality memory in an unstructured form.
5. The system according to claim 4, wherein the processing circuitry is further configured to search said information stored in a non-hierarchical form using a word-vector or an n-gram search model.
6. The system according to claim 1, wherein the processing circuitry is further configured to interpret said query, and based on said interpretation, generate said response using a set of rules stored in said personality memory or by searching information stored in an unstructured form.
7. The system according to claim 1, wherein the model parameter in each probability distribution in said associated expression is expressed as a weighted sum of parameters of the same type, and wherein the weighting used is expression dependent, such that converting said sequence of acoustic units to a sequence of image vectors by the processing circuitry comprises retrieving the expression dependent weights for said selected expression.
8. The system according to claim 7, wherein the parameters are provided in clusters and each cluster comprises at least one sub-cluster, and wherein said expression dependent weights are retrieved by the processing circuitry for each cluster such that there is one weight per sub-cluster.
9. The system according to claim 7, wherein the processing circuitry is further configured to extract expressive features from said response to form an expressive linguistic feature vector constructed in a first space, and map said expressive linguistic feature vector to an expressive synthesis feature vector that is constructed in a second space, said expressive linguistic feature vector being related to the model parameters of said statistical model.
10. The system according to claim 9, wherein the processing circuitry is further configured to extract the expressive features from said response to form the expressive linguistic feature vector constructed in the first space, and map said expressive linguistic feature vector to the said expression dependent weights.
11. The system according to claim 1, wherein said image vector comprises parameters that allow the face to be constructed by the processing circuitry from a weighted sum of modes using weighting parameters, and wherein the modes represent reconstructions of the face or a part thereof.
12. The system according to claim 11, wherein the modes comprise modes to represent shape and appearance of the face.
13. The system according to claim 11, wherein a same weighting parameter is used by the processing circuitry for a shape mode and a corresponding appearance mode.
14. A system for generating a personality file, said personality file being used to store information relating to the speech, the face and dialogue intelligence of the subject such that the subject can be emulated using the system for emulating the subject of claim 1, said personality file being stored in said personality memory, the system for generating a personality file comprising: a particular interface for the system for generating the particular personality file, the particular interface inputting information identifying content created by or about the subject; an audio-visual recording system configured to record the voice and the face of the subject, when reading known text, while using a range of different emotions; and circuitry configured to: curate said information identifying content created by or about said user, said curation comprising organizing said content into documents and building an n-gram language model for said documents, and a word vector model for each document; and produce said statistical model, said statistical model comprising the plurality of model parameters describing probability distributions that relate an acoustic unit to an image vector and the speech vector, said image vector comprising the plurality of parameters that define the face of the subject and said speech vector comprising a plurality of parameters that define the voice of the subject, the circuitry being further configured to train said statistical model such that a sequence of speech vectors and image vectors, which are synchronized when outputted, cause the generated head to appear to talk.
15. A method for emulating a subject, to allow a user to interact with a computer-generated talking head with a face and a voice of the subject, the method comprising: receiving a user inputted query; generating a response to the query inputted by a user from a user interface, the response to be outputted by the talking head, the response being generated by retrieving information from a personality memory, said personality memory storing content created by or about the subject, the response being an expressive response such that the face and the voice demonstrate expression; and outputting said response by displaying a talking head that comprises the face of the subject, and output speech from the mouth of the face with the voice of the subject, wherein said talking head outputs said response by converting said response into a sequence of acoustic units using a statistical model, said statistical model comprising a plurality of model parameters,
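Claims 7 and 11 both rest on a weighted sum: claim 7 combines parameters of the same type using expression-dependent weights, and claim 11 builds the face from weighted modes. The Python below is a minimal sketch of that arithmetic under stated assumptions, not the patented method. The names `CLUSTER_MEANS`, `EXPRESSION_WEIGHTS`, and `FACE_MODES`, and every number in them, are hypothetical; the choice of distribution means as the combined parameter is also an assumption made for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def weighted_sum(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return sum_i weights[i] * vectors[i], computed element-wise."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need at least one vector and exactly one weight per vector")
    out = [0.0] * len(vectors[0])
    for vec, w in zip(vectors, weights):
        for k, value in enumerate(vec):
            out[k] += w * value
    return out


# Claim 7 style (hypothetical values): one parameter vector per cluster,
# combined with weights that depend on the selected expression.
CLUSTER_MEANS: List[Vector] = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
EXPRESSION_WEIGHTS: Dict[str, List[float]] = {
    "neutral": [1.0, 0.0, 0.0],
    "happy": [0.5, 0.5, 1.0],
}


def expressive_mean(expression: str) -> Vector:
    """Look up the weights for an expression and combine the cluster means."""
    return weighted_sum(CLUSTER_MEANS, EXPRESSION_WEIGHTS[expression])


# Claim 11 style (hypothetical values): the face is rebuilt from modes,
# each scaled by a weighting parameter carried in the image vector.
FACE_MODES: List[Vector] = [[2.0, 0.0, 0.0], [0.0, 4.0, 0.0]]


def reconstruct_face(mode_weights: Sequence[float]) -> Vector:
    """Combine the face modes using the given weighting parameters."""
    return weighted_sum(FACE_MODES, mode_weights)


happy_mean = expressive_mean("happy")
face = reconstruct_face([0.5, 0.25])
```

Changing only the weight vector changes the output while the underlying cluster parameters and modes stay fixed, which is the property the claims rely on when switching expressions.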

Assignees

Toshiba Kk

Classifications

G06F18/2113
by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title
G06N3/006 · Primary
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
G06V40/20
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
G06T13/40 · Primary
of characters, e.g. humans, animals or virtual beings · CPC title
G06N20/00
Machine learning · CPC title

Patent family

Related publications grouped by family.

View patent family 49301825

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11144597B2 cover?
A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice; said system comprising a processor, a user interface and a personality storage section, the user interface being configured to emulate the subject, by displaying a talking head which comprises the subject's face and output speech from the mouth of th…

Who is the assignee on this patent?
Toshiba Kk

What technology area does this patent fall under?
Primary CPC classification G06N3/006. Mapped technology areas include Physics.

When was this patent published?
Publication date Oct 12, 2021 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).