Computer generated emulation of a subject

US2015052084A1 · US · A1

Patent metadata
Publication number: US-2015052084-A1
Application number: US-201414458556-A
Country: US
Kind code: A1
Filing date: Aug 13, 2014
Priority date: Aug 16, 2013
Publication date: Feb 19, 2015
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice; said system comprising a processor, a user interface and a personality storage section, the user interface being configured to emulate the subject, by displaying a talking head which comprises the subject's face and output speech from the mouth of the face with the subject's voice, the user interface further comprising a receiver for receiving a query from the user, the emulated subject being configured to respond to the query received from the user, the processor comprising a dialogue section and a talking head generation section, wherein said dialogue section is configured to generate a response to a query inputted by a user from the user interface and generate a response to be outputted by the talking head, the response being generated by retrieving information from said personality storage section, said personality storage section comprising content created by or about the subject, and said talking head generation section is configured to: convert said response into a sequence of acoustic units, the talking head generation section further comprising a statistical model, said statistical model comprising a plurality of model parameters, said model parameters being derived from said personality storage section, the model parameters describing probability distributions which relate an acoustic unit to an image vector and speech vector, said image vector comprising a plurality of parameters which define the subject's face and said speech vector comprising a plurality of parameters which define the subject's voice, the talking head generation section being further configured to output a sequence of speech vectors and image vectors which are synchronised such that the head appears to talk.
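As a rough illustration of the talking head generation section described in the abstract, the Python sketch below maps a sequence of acoustic units to synchronised speech and image vectors by sampling one joint vector per frame from a per-unit Gaussian distribution. The unit names, vector dimensions, frame counts and all parameter values are hypothetical and are not taken from the patent; this excerpt does not specify the form of the statistical model, so the independent Gaussians here are only an assumption.

```python
import random

# Hypothetical sizes: 2 speech parameters and 3 image (face) parameters.
SPEECH_DIM = 2
IMAGE_DIM = 3

# Hypothetical model parameters. Each acoustic unit has a mean and standard
# deviation over the joint [speech | image] vector, plus a frame count.
MODEL = {
    "h": {"mean": [0.1, 0.2, 0.0, 0.1, 0.0], "std": [0.01] * 5, "frames": 2},
    "e": {"mean": [0.5, 0.4, 0.3, 0.2, 0.1], "std": [0.02] * 5, "frames": 3},
    "l": {"mean": [0.3, 0.1, 0.2, 0.0, 0.4], "std": [0.01] * 5, "frames": 1},
}


def synthesise(units, model, rng):
    """Return (speech_vectors, image_vectors), one pair per output frame.

    Both halves of a frame come from the same joint sample, so the two
    sequences have equal length and stay aligned frame by frame.
    """
    speech, image = [], []
    for unit in units:
        params = model[unit]
        for _ in range(params["frames"]):
            joint = [rng.gauss(m, s)
                     for m, s in zip(params["mean"], params["std"])]
            speech.append(joint[:SPEECH_DIM])
            image.append(joint[SPEECH_DIM:])
    return speech, image


if __name__ == "__main__":
    speech, image = synthesise(["h", "e", "l"], MODEL, random.Random(0))
    for s, i in zip(speech, image):
        print(s, i)
```

Drawing the speech and image parameters from one joint vector per frame is one simple way to keep the face and voice synchronised; a real system would presumably add duration modelling and smoothing between frames.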

First claim

Opening claim text (preview).

1. A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice; said system comprising a processor, a user interface and a personality storage section, the user interface being configured to emulate the subject, by displaying a talking head which comprises the subject's face and output speech from the mouth of the face with the subject's voice, the user interface further comprising a receiver for receiving a query from the user, the emulated subject being configured to respond to the query received from the user, the processor comprising a dialogue section and a talking head generation section, wherein said dialogue section is configured to generate a response to a query inputted by a user from the user interface and generate a response to be outputted by the talking head, the response being generated by retrieving information from said personality storage section, said personality storage section comprising content created by or about the subject, and said talking head generation section is configured to: convert said response into a sequence of acoustic units, the talking head generation section further comprising a statistical model, said statistical model comprising a plurality of model parameters, said model parameters being derived from said personality storage section, the model parameters describing probability distributions which relate an acoustic unit to an image vector and speech vector, said image vector comprising a plurality of parameters which define the subject's face and said speech vector comprising a plurality of parameters which define the subject's voice, the talking head generation section being further configured to output a sequence of speech vectors and image vectors which are synchronised such that the head appears to talk.

2. A system according to claim 1, wherein the content created by or about the subject comprises posts collected from social media websites, e-mails and other content from or about the subject which has been provided to the personality storage section.

3. A system according to claim 1, wherein the dialogue section is configured to navigate a set of rules stored in said personality storage section to generate the response.

4. A system according to claim 1, wherein the dialogue section is configured to retrieve a response from said personality storage section by searching information which has been stored in said personality storage section in an unstructured form.

5. A system according to claim 4, wherein said dialogue section is configured to search said information stored in a non-hierarchical form using a word-vector or n-gram search model.

6. A system according to claim 1, wherein the dialogue section is configured to interpret said query and based on said interpretation select to generate said response using a set of rules stored in said personality storage section or by searching information stored in an unstructured form.

7. A system for creating a response to an inputted user query, said system comprising: a personality file storage section, said personality file storage section comprising a plurality of documents stored in an unstructured form; a query conversion section configured to convert said query into a word vector; a first comparison section configured to compare said word vector generated from said query with word vectors generated from the documents in said personality file storage section and output identified documents; a second comparison section configured to compare said word vector selected from said query and passages from said identified documents and to rank said selected passages, said ranking being based on the number of matches between said selected passage and said query; and a concatenation section adapted to concatenate selected passages together using sentence connectors, wherein said sentence connectors are chosen from a plurality of sentence connectors, said sentence connectors being chosen on the basis of a statistical model.

8. A system according to claim 7, wherein the said ranking is based on a normalised measure of the number of matches between said selected passage and said query.

9. A system according to claim 7, wherein said sentence connectors are chosen using a language model.

10. A system according to claim 7, wherein the system is configured to set a predetermined size for the response.

11. A system according to claim 1, configured to output an expressive response such that said face and voice demonstrate expression, said processor further comprising an expression deriving section configured to determine the expression with which to output the generated response, and wherein the said model parameters describe probability distributions which relate an acoustic unit to an image vector and speech vector for an associated expression.

12. A system according to claim 11, wherein the model parameter in each probability distribution in said associated expression is expressed as a weighted sum of parameters of the same type, and wherein the weighting used is expression dependent, such that converting said sequence of acoustic units to a sequence of image vectors comprises retrieving the expression dependent weights for said selected expression.

13. A system according to claim 12, wherein the parameters are provided in clusters and each cluster comprises at least one sub-cluster, wherein said expression dependent weights are retrieved for each cluster such that there is one weight per sub-cluster.

14. A system according to claim 11, wherein said expression deriving section is configured to extract expressive features from said response to form an expressive linguistic feature vector constructed in a first space and map said expressive linguistic feature vector to an expressive synthesis feature vector that is constructed in a second space, said expressive linguistic feature vector being related to the model parameters of said acoustical model.

15. A system according to claim 14, wherein said expression deriving section is configured to extract expressive features from said response to form an expressive linguistic feature vector constructed in a first space and map said expressive linguistic feature vector to the said expression dependent weights.

16. A system according to claim 1, wherein said image vector comprises parameters which allow the face to be constructed from a weighted sum of modes using weighting parameters, and wherein the modes represent reconstructions of a face or part thereof.

17. A system according to claim 16, wherein the modes comprise modes to represent shape and appearance of the face.

18. A system according to claim 16, wherein the same weighting parameter is used for a shape mode and its corresponding appearance mode.

19. A system for generating a personality file, said personality file being used to store information relating to the speech, face and dialogue intelligence of a subject such that the subject can be emulated using a system in accordance with claim 1, said personality file being stored in said personality storage section, the system for generating a personality file comprising: an interface for inputting information identifying content created by or about the subject; an audio-visual recording system configured to record the voice and face of a subject, when reading known text, while using a range of different emotions; and a processor being configured to: curate said information id…
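The response-generation steps recited in claim 7 (query to word vector, document comparison, passage ranking by number of matches, concatenation with sentence connectors) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the bag-of-words vectors, the cosine comparison, the sentence-level passages, and the `CONNECTOR_SCORES` table standing in for the statistical model are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Bag-of-words count vector (an assumed form of 'word vector')."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (assumed comparison)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def identify_documents(query_vec, documents, top_n=2):
    """First comparison: keep the documents most similar to the query."""
    return sorted(documents,
                  key=lambda d: cosine(query_vec, word_vector(d)),
                  reverse=True)[:top_n]


def rank_passages(query_vec, documents):
    """Second comparison: rank sentences by number of query-word matches."""
    passages = [p.strip() for d in documents for p in d.split(".") if p.strip()]

    def matches(passage):
        return sum(1 for w in word_vector(passage) if w in query_vec)

    return sorted(passages, key=matches, reverse=True)


# Hypothetical stand-in for the statistical model that scores connectors.
CONNECTOR_SCORES = {"Also,": 0.6, "In addition,": 0.3, "However,": 0.1}


def answer(query, documents, size=2):
    """Concatenate the top `size` passages using the best-scoring connector."""
    query_vec = word_vector(query)
    ranked = rank_passages(query_vec, identify_documents(query_vec, documents))
    connector = max(CONNECTOR_SCORES, key=CONNECTOR_SCORES.get)
    chosen = ranked[:size]
    return f" {connector} ".join(p + "." for p in chosen)


if __name__ == "__main__":
    docs = [
        "I love sailing on the lake. My boat is called Aurora.",
        "I studied physics in Cambridge. Physics taught me patience.",
        "My favourite food is pasta.",
    ]
    print(answer("What is your boat called?", docs))
```

The `size` argument loosely mirrors the predetermined response size of claim 10. The normalised ranking of claim 8 could be approximated by dividing each match count by the passage length, though the claim text does not say which normalisation is used.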

Assignees

Toshiba Kk

Inventors

Classifications

  • by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title

  • G06N3/006 · Primary

    based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • Detection of language · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2015052084A1 cover?
A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice; said system comprising a processor, a user interface and a personality storage section, the user interface being configured to emulate the subject, by displaying a talking head which comprises the subject's face and output speech from the mouth of th…
Who is the assignee on this patent?
Toshiba Kk
What technology area does this patent fall under?
Primary CPC classification G06N3/006. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 19, 2015 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).