Synchronizing virtual actor's performances to a speaker's voice

US9524081B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9524081-B2
Application number  US-201514687877-A
Country             US
Kind code           B2
Filing date         Apr 15, 2015
Priority date       May 16, 2012
Publication date    Dec 20, 2016
Grant date          Dec 20, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for generating and displaying holographic visual aids associated with a story to an end user of a head-mounted display device while the end user is reading the story or perceiving the story being read aloud is described. The story may be embodied within a reading object (e.g., a book) in which words of the story may be displayed to the end user. The holographic visual aids may include a predefined character animation that is synchronized to a portion of the story corresponding with the character being animated. A reading pace of a portion of the story may be used to control the playback speed of the predefined character animation in real-time such that the character is perceived to be lip-syncing the story being read aloud. In some cases, an existing book without predetermined AR tags may be augmented with holographic visual aids.
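
As a rough illustration of the pacing idea in the abstract, the sketch below scales the playback speed of a predefined animation so that it spans the time the reader is expected to take. It is not taken from the patent: the `Animation` type, the `playback_rate` function, and the use of words per second as the pace measure are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Animation:
    """Hypothetical record for a predefined character animation."""
    word_count: int          # words in the story passage the animation covers
    authored_seconds: float  # animation length when played at 1.0x speed


def playback_rate(anim: Animation, reader_words_per_second: float) -> float:
    """Return the speed multiplier that matches the animation to the reader.

    Assumes the animation was authored at a constant pace of
    word_count / authored_seconds words per second.
    """
    if reader_words_per_second <= 0:
        raise ValueError("reading pace must be positive")
    if anim.word_count <= 0 or anim.authored_seconds <= 0:
        raise ValueError("animation must cover at least one word and last some time")
    authored_pace = anim.word_count / anim.authored_seconds
    return reader_words_per_second / authored_pace


# A 12-word passage animated over 4 seconds was authored at 3 words/second.
passage = Animation(word_count=12, authored_seconds=4.0)
slow_rate = playback_rate(passage, 1.5)  # slow reader: play at half speed
fast_rate = playback_rate(passage, 6.0)  # fast reader: play at double speed
```

A multiplier below 1.0 slows the animation for a slow reader, and a multiplier above 1.0 speeds it up for a fast one.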

First claim

Opening claim text (preview).

What is claimed is:

1. A method for synchronizing performance of a virtual object to performance of a person, comprising: capturing a first audio signal associated with sounds made by the person during a first time period; detecting that a first utterance has been spoken by the person using the captured first audio signal; determining a speed at which the person spoke the first utterance; capturing a second audio signal associated with sounds made by the person during a second time period subsequent to the first time period; detecting that a portion of a second utterance has been spoken by the person using the captured second audio signal; and displaying, using a display, a sequence of images of the virtual object at a rate corresponding with the determined speed of the first utterance in response to detecting that the portion of the second utterance has been spoken by the person, the sequence of images including a sequence of mouth shape images displayed such that the virtual object appears to speak the second utterance at the determined speed of the first utterance.

2. The method of claim 1, wherein: the capturing a first audio signal includes capturing the first audio signal using a microphone; and the detecting that a first utterance has been spoken by the person is performed by a mobile device, the detecting that a first utterance has been spoken by the person includes applying speech recognition techniques to the captured first audio signal; and the displaying a sequence of images of the virtual object includes displaying the sequence of images using the mobile device.

3. The method of claim 2, wherein: the mobile device comprises a head-mounted display device.

4. The method of claim 2, further comprising: identifying a reading object within a field of view of the mobile device, the first utterance corresponds with a first sentence from the reading object.

5. The method of claim 4, wherein: the reading object comprises a book.

6. The method of claim 4, wherein: the second utterance comprises a second sentence from the reading object that is spoken by the person.

7. The method of claim 4, further comprising: detecting a page of the reading object within the field of view of the mobile device; identifying the virtual object based on the page of the reading object; and the displaying the sequence of images includes displaying the sequence of images such that the virtual object appears to be attached to the reading object.

8. The method of claim 1, wherein: the displaying a sequence of images of the virtual object includes displaying the sequence of mouth shape images prior to the second utterance being completely spoken by the person.

9. The method of claim 1, further comprising: detecting a failure of the second utterance being completely spoken by the person; and displaying an idling holographic animation in response to detecting the failure of the second phrase being completely spoken by the person.

10. The method of claim 1, wherein: the detecting that a portion of a second utterance has been spoken by the person includes detecting a sequence of keywords corresponding with the portion of the second utterance.

11. The method of claim 1, wherein: the determining a speed at which the person spoke the first utterance includes determining an amount of time that the person took to speak a plurality of words corresponding with the first utterance.

12. The method of claim 2, wherein: the mobile device comprises a see-through head-mounted display device worn by a first person different from the person.

13. An electronic device for synchronizing performance of a virtual object to performance of a person, comprising: one or more processors, the one or more processors detect that a first utterance has been spoken by a person and determine a speed at which the person spoke the first utterance, the one or more processors detect that a portion of a second utterance has been spoken by the person and generate a sequence of images of the virtual object in response to detecting that the portion of the second utterance has been spoken by the person; and a see-through display in communication with the one or more processors, the see-through display displays the sequence of images of the virtual object at a rate corresponding with the speed of the first utterance, the sequence of images including a sequence of mouth shape images that are displayed using the see-through display such that the virtual object appears to speak the second utterance at the speed of the first utterance.

14. The electronic device of claim 13, wherein: the electronic device comprises a head-mounted display device.

15. The electronic device of claim 13, wherein: the one or more processors identify a reading object within a field of view of the electronic device, the second utterance comprises a second sentence from the reading object that is spoken by the person.

16. The electronic device of claim 15, wherein: the reading object comprises a book.

17. The electronic device of claim 13, wherein: the see-through display displays the sequence of mouth shape images prior to the second utterance being completely spoken by the person.

18. One or more hardware storage devices containing processor readable code for programming one or more processors to perform a method for synchronizing performance of a virtual object to performance of a person, the processor readable code comprising: processor readable code configured to detect that a first utterance has been spoken by the person; processor readable code configured to determine a speed at which the person spoke the first utterance; processor readable code configured to detect that a portion of a second utterance has been spoken by the person; and processor readable code configured to cause a sequence of images of the virtual object to be displayed at a rate corresponding with the speed of the first utterance subsequent to detecting that the portion of the second utterance has been spoken by the person, the sequence of images including a sequence of mouth shape images that are displayed such that the virtual object appears to speak the second utterance at the speed of the first utterance.

19. The one or more hardware storage devices of claim 18, wherein the processor readable code further comprises: processor readable code configured to detect that the first utterance has been spoken using a head-mounted display device; and processor readable code configured to cause the sequence of images of the virtual object to be displayed using a display of the head-mounted display device.

20. The one or more hardware storage devices of claim 19, wherein the processor readable code further comprises: processor readable code configured to cause the sequence of images of the virtual object to be displayed using the display of the head-mounted display device such that the sequence of mouth shape images are displayed prior to the second utterance being completely spoken by the person.
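
The steps recited in claims 1, 10, and 11 can be sketched in code, purely as an illustration. The sketch assumes a speech recognizer that emits words with start and end timestamps; that `TimedWord` shape, the function names, and the even spacing of mouth shapes are hypothetical choices for this example and do not come from the patent.

```python
from typing import List, Tuple

# Hypothetical recognizer output: (word, start_seconds, end_seconds).
TimedWord = Tuple[str, float, float]


def speaking_speed(utterance: List[TimedWord]) -> float:
    """Words per second for a fully spoken utterance.

    Mirrors claim 11: the time the person took to speak a plurality of words.
    """
    if not utterance:
        raise ValueError("utterance has no words")
    elapsed = utterance[-1][2] - utterance[0][1]
    if elapsed <= 0:
        raise ValueError("utterance has no duration")
    return len(utterance) / elapsed


def portion_spoken(heard: List[str], keywords: List[str]) -> bool:
    """True if the keywords occur in order within the words heard so far.

    Mirrors claim 10: detecting a sequence of keywords for part of an utterance.
    """
    remaining = iter(word.lower() for word in heard)
    # `in` on an iterator consumes it, so each keyword must follow the last.
    return all(keyword.lower() in remaining for keyword in keywords)


def mouth_shape_schedule(
    shapes: List[str], word_count: int, words_per_second: float
) -> List[Tuple[float, str]]:
    """Display times for mouth shapes, spread evenly over the expected duration.

    The expected duration of the second utterance is derived from the speed
    measured on the first utterance (claim 1).
    """
    if not shapes or words_per_second <= 0:
        raise ValueError("need at least one shape and a positive speed")
    interval = (word_count / words_per_second) / len(shapes)
    return [(i * interval, shape) for i, shape in enumerate(shapes)]


first = [("once", 0.0, 0.4), ("upon", 0.5, 0.9), ("a", 1.0, 1.1), ("time", 1.2, 2.0)]
speed = speaking_speed(first)  # 4 words over 2.0 seconds

schedule = []
if portion_spoken(["the", "big", "dog"], ["the", "dog"]):
    # Start the lip-sync before the second utterance has been fully spoken.
    schedule = mouth_shape_schedule(["A", "B", "C", "D"], word_count=2, words_per_second=speed)
```

Starting playback as soon as a portion of the second utterance is detected is what allows the mouth shapes to appear before that utterance is completely spoken, as in claim 8.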

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Combinations of audio and printed presentations, e.g. magnetically striped cards, talking books, magnetic tapes with printed texts thereon

  • Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (blind teaching G09B21/00)

  • Speech to text systems (G10L15/08 takes precedence)

  • Handheld portable device, e.g. holographic camera, mobile holographic display

  • Superimposing the holobject with other visual information

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9524081B2 cover?
A system for generating and displaying holographic visual aids associated with a story to an end user of a head-mounted display device while the end user is reading the story or perceiving the story being read aloud is described. The story may be embodied within a reading object (e.g., a book) in which words of the story may be displayed to the end user. The holographic visual aids may include …
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/0483. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 20, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).