Generating a visually consistent alternative audio for redubbing visual speech

US9922665B2 · US · B2

Patent metadata
Publication number: US-9922665-B2
Application number: US-201514820410-A
Country: US
Kind code: B2
Filing date: Aug 6, 2015
Priority date: Aug 6, 2015
Publication date: Mar 20, 2018
Grant date: Mar 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There are provided systems and methods for generating a visually consistent alternative audio for redubbing visual speech using a processor configured to sample a dynamic viseme sequence corresponding to a given utterance by a speaker in a video, identify a plurality of phonemes corresponding to the dynamic viseme sequence, construct a graph of the plurality of phonemes that synchronize with a sequence of lip movements of a mouth of the speaker in the dynamic viseme sequence, and use the graph to generate an alternative phrase that substantially matches the sequence of lip movements of the mouth of the speaker in the video.
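
The pipeline in the abstract (viseme sequence, candidate phonemes, phoneme graph, matching words) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the viseme labels, the viseme-to-phoneme table, the toy lexicon, and the function names are hypothetical and do not come from the patent, which would rely on mappings learned from audio-visual speech data.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each dynamic viseme label maps to the phoneme
# strings that could produce that lip gesture (invented for illustration).
VISEME_TO_PHONEMES: Dict[str, List[Tuple[str, ...]]] = {
    "V_LIPS_CLOSED": [("p",), ("b",), ("m",)],
    "V_OPEN": [("ae",), ("aa",)],
    "V_TONGUE": [("t",), ("d",), ("n",)],
}

# Hypothetical pronunciation dictionary (word -> phoneme tuple).
LEXICON: Dict[str, Tuple[str, ...]] = {
    "bat": ("b", "ae", "t"),
    "mat": ("m", "ae", "t"),
    "pad": ("p", "ae", "d"),
    "man": ("m", "ae", "n"),
    "dog": ("d", "ao", "g"),
}

Edge = Tuple[int, int, Tuple[str, ...]]  # (from_node, to_node, phonemes)


def build_phoneme_graph(visemes: List[str]) -> List[Edge]:
    """Node i is the boundary before viseme i; each edge carries one
    phoneme string consistent with the viseme spanning i -> i + 1."""
    edges: List[Edge] = []
    for i, viseme in enumerate(visemes):
        for phones in VISEME_TO_PHONEMES[viseme]:
            edges.append((i, i + 1, phones))
    return edges


def phoneme_paths(edges: List[Edge], start: int, end: int) -> List[Tuple[str, ...]]:
    """Enumerate every phoneme sequence along a path from start to end."""
    if start == end:
        return [()]
    paths: List[Tuple[str, ...]] = []
    for src, dst, phones in edges:
        if src == start:
            for tail in phoneme_paths(edges, dst, end):
                paths.append(phones + tail)
    return paths


def matching_words(visemes: List[str]) -> List[str]:
    """Return lexicon words whose pronunciation traces a full graph path."""
    edges = build_phoneme_graph(visemes)
    allowed = set(phoneme_paths(edges, 0, len(visemes)))
    return sorted(word for word, phones in LEXICON.items() if phones in allowed)


print(matching_words(["V_LIPS_CLOSED", "V_OPEN", "V_TONGUE"]))
```

With this toy data, "bat", "man", "mat", and "pad" all fit the same three lip gestures while "dog" does not, which is the kind of visual ambiguity the abstract exploits to find alternative phrases.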

First claim

Opening claim text (preview).

What is claimed is:

1. A system for redubbing of a video, the system comprising: a display; an audio speaker; a memory for storing a redubbing application; and a processor configured to execute the redubbing application to: sample a dynamic viseme sequence corresponding to an original phrase uttered by a speaking character having a sequence of original lip movements of a mouth in the video; identify, using the sampled dynamic viseme sequence, a plurality of phonemes corresponding to the sampled dynamic viseme sequence; construct a graph of the plurality of phonemes corresponding to the sampled dynamic viseme sequence; generate, using the graph of the plurality of phonemes, a first set of words including at least one word that substantially matches the sequence of the original lip movements of the mouth of the speaking character in the video; construct a second set of phrases, using the first set of words, each of the second set of phrases being an alternative phrase to the original phrase; score each of the second set of phrases based on how closely each of the second set of phrases matches the sequence of lip movements of the mouth of the speaking character in the video; select, based on the score, one of the second set of phrases as the alternative phrase to the original phrase, the alternative phrase formed by the at least one word of the first set of words substantially matching the sequence of the original lip movements of the mouth of the speaking character in the video; and display the sequence of the original lip movements of the mouth in the video on the display in synchronization with playing the at least one alternative phrase via the audio speaker.

2. The system of claim 1, wherein the first set includes valid words in a target language.

3. The system of claim 1, wherein the second set includes valid sentences in a target language.

4. The system of claim 3, wherein the target language is a different language than an original language of the video.

5. The system of claim 1, wherein the processor is further configured to: select a candidate alternative phrase from the second set; and insert the candidate alternative phrase as a substitute audio for the sampled dynamic viseme sequence.

6. The system of claim 1, wherein the first set is a complete set including every phoneme that corresponds to the sequence of dynamic visemes.

7. A system for redubbing of a video, the system comprising: a display; an audio speaker; a memory for storing a redubbing application; and a processor configured to execute the redubbing application to: sample a dynamic viseme sequence corresponding to a given utterance by a speaking character in the video; identify a plurality of phonemes corresponding to the dynamic viseme sequence; construct a graph of the plurality of phonemes corresponding to the dynamic viseme sequence; generate, using the graph of the plurality of phonemes, a plurality of words that substantially match a sequence of lip movements of a mouth of the speaking character in the video; construct a plurality of alternative phrases, each of the plurality of alternative phrases is formed by one or more of the plurality of words substantially matching the sequence of lip movements of the mouth of the speaking character in the video; score each alternative phrase of the plurality of alternative phrases based on how closely each alternative phrase matches the sequence of lip movements of the mouth of the speaking character in the video; rank the plurality of alternative phrases based on the score; and display the sequence of lip movements of the mouth in the video on the display in synchronization with playing one of the plurality of alternative phrases via the audio speaker based on ranking.

8. A system for redubbing of a video, the system comprising: a user interface; a display; an audio speaker; a memory for storing a redubbing application; and a processor configured to execute the redubbing application to: sample a dynamic viseme sequence corresponding to a given utterance by a speaking character in the video; identify a plurality of phonemes corresponding to the dynamic viseme sequence; construct a graph of the plurality of phonemes corresponding to the dynamic viseme sequence; receive, from a user via the user interface, a suggested alternative phrase; transcribe the suggested alternative phrase into an ordered phoneme list; compare, using the graph, the ordered phoneme list to the dynamic viseme sequence; score how well the suggested alternative phrase matches the lip movements of the mouth of the speaking character in the video corresponding to the dynamic viseme sequence; and display the sequence of lip movements of the mouth in the video on the display in synchronization with playing the suggested alternative phrase via the audio speaker based on scoring.

9. The system of claim 8, wherein the processor is further configured to: suggest a synonym of a word in the alternative phrase, wherein replacing the word in the alternative phrase with the synonym will increase the score.

10. A method for use by a system having a display, an audio speaker, a memory and a processor for redubbing of a video, the method comprising: sampling, using the processor, a dynamic viseme sequence corresponding to an original phrase uttered by a speaking character having a sequence of original lip movements of a mouth in the video; identifying, using the processor and the sampled dynamic viseme sequence, a plurality of phonemes corresponding to the sampled dynamic viseme sequence; constructing, using the processor, a graph of the plurality of phonemes corresponding to the sampled dynamic viseme sequence; generating, using the processor and the graph of the plurality of phonemes, a first set of words including at least one word that substantially matches the sequence of the original lip movements of the mouth of the speaking character in the video; constructing, using the processor, a second set of phrases, using the first set of words, each of the second set of phrases being an alternative phrase to the original phrase; scoring, using the processor, each of the second set of phrases based on how closely each of the second set of phrases matches the sequence of lip movements of the mouth of the speaking character in the video; selecting, using the processor and based on the score, one of the second set of phrases as the alternative phrase to the original phrase, the alternative phrase formed by the at least one word of the first set of words substantially matching the sequence of the original lip movements of the mouth of the speaking character in the video; and displaying, using the processor, the sequence of the original lip movements of the mouth in the video on the display in synchronization with playing the at least one alternative phrase via the audio speaker.

11. The method of claim 10, wherein the first set includes valid words in a target language.

12. The method of claim 10, wherein the second set includes valid sentences in a target language.

13. The method of claim 12, wherein the target language is a different language than an original language of the video.

14. The method of claim 10, wherein the second set includes a plurality of alternative phrases, the method further comprising: selecting, using the processor, a candidate alternative phrase from the second set; and inserting, using the processor, the candidate alternative phrase as a substitute audio for the sampled dynamic viseme sequence.

15. The method of claim 10, wherein the first set i
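
The scoring and ranking steps in claims 7 and 8 (transcribe a suggested phrase into phonemes, compare it with the viseme sequence, score it, rank candidates) might be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: the phoneme-to-viseme table, the lexicon, and all function names are hypothetical, and the comparison uses Python's standard `difflib.SequenceMatcher` similarity ratio on viseme classes rather than the phoneme graph the claims describe.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical phoneme -> viseme-class table (invented for illustration).
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "LIPS", "b": "LIPS", "m": "LIPS",
    "ae": "OPEN", "aa": "OPEN",
    "t": "TONGUE", "d": "TONGUE", "n": "TONGUE",
    "ao": "ROUND", "g": "BACK",
}

# Hypothetical pronunciation dictionary used to transcribe a phrase.
LEXICON: Dict[str, List[str]] = {
    "bat": ["b", "ae", "t"],
    "mad": ["m", "ae", "d"],
    "dog": ["d", "ao", "g"],
}


def transcribe(phrase: str) -> List[str]:
    """Turn a phrase into an ordered phoneme list via the toy lexicon."""
    phones: List[str] = []
    for word in phrase.lower().split():
        phones.extend(LEXICON[word])
    return phones


def score_phrase(phrase: str, target_visemes: List[str]) -> float:
    """Score in [0, 1]: similarity between the phrase's viseme classes
    and the viseme sequence observed in the video."""
    candidate = [PHONEME_TO_VISEME[p] for p in transcribe(phrase)]
    matcher = SequenceMatcher(None, target_visemes, candidate, autojunk=False)
    return matcher.ratio()


def rank_phrases(phrases: List[str], target_visemes: List[str]) -> List[Tuple[float, str]]:
    """Return (score, phrase) pairs, best match first."""
    scored = [(score_phrase(p, target_visemes), p) for p in phrases]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


target = ["LIPS", "OPEN", "TONGUE"]  # visemes sampled from the video (toy)
for score, phrase in rank_phrases(["dog", "mad", "bat"], target):
    print(f"{phrase}: {score:.2f}")
```

A synonym suggestion in the spirit of claim 9 could reuse `score_phrase` to test whether swapping one word raises the score, given a synonym source, which this sketch does not include.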

Assignees

Disney Entpr Inc

Inventors

Classifications

  • Transforming into visible information · CPC title

  • G10L25/57 · Primary

    for processing of video signals · CPC title

  • Synthesis of the lips movements from speech, e.g. for talking heads · CPC title

  • for synchronising with other signals, e.g. video signals · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9922665B2 cover?
There are provided systems and methods for generating a visually consistent alternative audio for redubbing visual speech using a processor configured to sample a dynamic viseme sequence corresponding to a given utterance by a speaker in a video, identify a plurality of phonemes corresponding to the dynamic viseme sequence, construct a graph of the plurality of phonemes that synchronize with a …
Who is the assignee on this patent?
Disney Entpr Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/57. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 20, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).