Generating a Visually Consistent Alternative Audio for Redubbing Visual Speech

US2017040017A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2017040017-A1
Application number: US-201514820410-A
Country: US
Kind code: A1
Filing date: Aug 6, 2015
Priority date: Aug 6, 2015
Publication date: Feb 9, 2017
Grant date:

Abstract

Official abstract text for this publication.

There are provided systems and methods for generating a visually consistent alternative audio for redubbing visual speech using a processor configured to sample a dynamic viseme sequence corresponding to a given utterance by a speaker in a video, identify a plurality of phonemes corresponding to the dynamic viseme sequence, construct a graph of the plurality of phonemes that synchronize with a sequence of lip movements of a mouth of the speaker in the dynamic viseme sequence, and use the graph to generate an alternative phrase that substantially matches the sequence of lip movements of the mouth of the speaker in the video.
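The steps in the abstract (viseme sequence, candidate phonemes, phoneme graph, alternative phrase) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the viseme labels, the `VISEME_TO_PHONEMES` table, the tiny `LEXICON`, and the simplification that each viseme covers exactly one phoneme. The publication does not specify these details, and a dynamic viseme may in practice span several phonemes.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy table: viseme label -> phonemes that look alike on the lips.
VISEME_TO_PHONEMES: Dict[str, List[str]] = {
    "V_bilabial": ["p", "b", "m"],
    "V_open": ["ae", "aa"],
    "V_alveolar": ["t", "d", "n"],
}

# Hypothetical pronunciation dictionary for the target language.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "bat": ("b", "ae", "t"),
    "mat": ("m", "ae", "t"),
    "pad": ("p", "ae", "d"),
    "man": ("m", "ae", "n"),
    "dog": ("d", "ao", "g"),
    "at": ("ae", "t"),
    "ma": ("m", "aa"),
}

Graph = Dict[int, List[Tuple[str, int]]]


def build_phoneme_graph(visemes: List[str]) -> Graph:
    """Node i sits before viseme i; each edge (phoneme, i + 1) is one option."""
    return {
        i: [(phoneme, i + 1) for phoneme in VISEME_TO_PHONEMES[viseme]]
        for i, viseme in enumerate(visemes)
    }


def word_end(graph: Graph, start: int, phonemes: Tuple[str, ...]) -> Optional[int]:
    """Follow the word's phonemes from `start`; return the end node or None."""
    node = start
    for phoneme in phonemes:
        targets = [dst for label, dst in graph.get(node, []) if label == phoneme]
        if not targets:
            return None
        node = targets[0]
    return node


def alternative_phrases(visemes: List[str]) -> List[str]:
    """List word sequences whose phonemes cover the whole viseme sequence."""
    graph = build_phoneme_graph(visemes)
    final = len(visemes)
    phrases: List[str] = []

    def extend(node: int, words: List[str]) -> None:
        if node == final:
            phrases.append(" ".join(words))
            return
        for word, pronunciation in sorted(LEXICON.items()):
            end = word_end(graph, node, pronunciation)
            if end is not None:
                extend(end, words + [word])

    extend(0, [])
    return phrases


if __name__ == "__main__":
    print(alternative_phrases(["V_bilabial", "V_open", "V_alveolar"]))
```

The words found along the graph play the role of the "first set" in claim 1, and the complete word sequences returned by `alternative_phrases` play the role of the "second set". A realistic system would need a full pronunciation lexicon and a language model to keep only valid sentences.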

First claim

Opening claim text (preview).

What is claimed is:

1. A system for redubbing of a video, the system comprising: a memory for storing a redubbing application; a processor configured to execute the redubbing application to: sample a dynamic viseme sequence corresponding to a given utterance by a speaker in the video; identify a plurality of phonemes corresponding to the dynamic viseme sequence; construct a graph of the plurality of phonemes corresponding to the dynamic viseme sequence; generate, using the graph of the plurality of phonemes, a first set including at least one word that substantially matches a sequence of lip movements of a mouth of the speaker in the video; and construct a second set including at least one alternative phrase, the at least one alternative phrase formed by the at least one word of the first set that substantially matches the sequence of lip movements of the mouth of the speaker in the video.

2. The system of claim 1, further comprising a display, wherein the processor is further configured to display the video synchronized with a candidate alternative phrase from the second set to replace an original audio of the video.

3. The system of claim 1, wherein the first set includes valid words in a target language.

4. The system of claim 1, wherein the second set includes valid sentences in a target language.

5. The system of claim 4, wherein the target language is a different language than an original language of the video.

6. The system of claim 1, wherein the processor is further configured to: select a candidate alternative phrase from the second set; and insert the candidate alternative phrase as a substitute audio for the dynamic viseme sequence.

7. The system of claim 1, wherein the processor is further configured to: score each alternative phrase of the plurality of alternative phrases in the second set based on how closely each alternative phrase matches the sequence of lip movements of the mouth of the speaker in the video; and rank the alternative phrases based on the score.

8. The system of claim 1, further comprising a user interface, wherein the processor is further configured to: receive, from a user via the user interface, a suggested alternative phrase; transcribe the suggested alternative phrase into an ordered phoneme list; compare the ordered phoneme list to the dynamic viseme sequence; and score how well the suggested alternative phrase matches the lip movements of the mouth of the speaker in the video corresponding to the dynamic viseme sequence.

9. The system of claim 8, wherein the processor is further configured to: suggest a synonym of a word in the alternative phrase, wherein replacing the word in the alternative phrase with the synonym will increase the score.

10. The system of claim 1, wherein the first set is a complete set including every phoneme that corresponds to the sequence of dynamic visemes.

11. A method for use by a system having a memory and a processor for redubbing of a video, the method comprising: sampling, using the processor, a dynamic viseme sequence corresponding to a given utterance by a speaker in the video; identifying, using the processor, a plurality of phonemes corresponding to the dynamic viseme sequence; constructing, using the processor, a graph of the plurality of phonemes corresponding to the dynamic viseme sequence; generating, using the processor, a first set including at least one word that substantially matches a sequence of lip movements of a mouth of the speaker in the video using the graph of the plurality of phonemes; and constructing, using the processor, a second set including at least one alternative phrase, the at least one alternative phrase formed by the at least one word of the first set that substantially matches the sequence of lip movements of the mouth of the speaker in the video.

12. The method of claim 11, wherein the system further comprises a display, the method further comprising: displaying the video synchronized with an alternative phrase from the second set to replace an original audio of the video on the display.

13. The method of claim 11, wherein the first set includes valid words in a target language.

14. The method of claim 11, wherein the second set includes valid sentences in a target language.

15. The method of claim 14, wherein the target language is a different language than an original language of the video.

16. The method of claim 11, wherein the second set includes a plurality of alternative phrases, the method further comprising: selecting, using the processor, a candidate alternative phrase from the second set; and inserting, using the processor, the candidate alternative phrase as a substitute audio for the dynamic viseme sequence.

17. The method of claim 11, wherein the second set includes a plurality of alternative phrases, the method further comprising: scoring, using the processor, each alternative phrase of the plurality of alternative phrases in the second set; and ranking, using the processor, each alternative phrase of the plurality of alternative phrases in the second set according to how well the pronounced phonemes of each alternative phrase of the plurality of alternative phrases match the dynamic viseme sequence.

18. The method of claim 11, wherein the system includes a user interface, the method further comprising: receiving, from a user via the user interface, a suggested alternative phrase; transcribing, using the processor, the suggested alternative phrase into an ordered phoneme list; comparing, using the processor, the ordered phoneme list to the dynamic viseme sequence; and scoring, using the processor, how well the suggested alternative phrase matches the lip movements of the mouth of the speaker in the video corresponding to the dynamic viseme sequence.

19. The method of claim 18, further comprising: suggesting, using the processor, a synonym of a word in the suggested alternative phrase, wherein replacing the word of the suggested alternative phrase with the synonym will increase the score.

20. The method of claim 11, wherein the first set is a complete set including every phoneme that corresponds to the sequence of dynamic visemes.
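The scoring and ranking steps in claims 7, 8, 17 and 18 (transcribe a phrase into an ordered phoneme list, compare it to the dynamic viseme sequence, score, rank) can be sketched as follows. The claims do not say which similarity measure is used, so this sketch substitutes Python's standard `difflib.SequenceMatcher` ratio as a stand-in. The `PHONEME_TO_VISEME` table, the `LEXICON`, and the viseme labels are hypothetical toy data.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical toy table: phoneme -> viseme class it produces on the lips.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "ae": "V_open", "aa": "V_open",
    "t": "V_alveolar", "d": "V_alveolar", "n": "V_alveolar",
    "g": "V_velar", "ao": "V_round",
}

# Hypothetical pronunciation dictionary used for transcription.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "mat": ("m", "ae", "t"),
    "ma": ("m", "aa"),
    "dog": ("d", "ao", "g"),
}


def transcribe(phrase: str) -> List[str]:
    """Turn a phrase into an ordered phoneme list via the toy lexicon."""
    phonemes: List[str] = []
    for word in phrase.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def score_phrase(phrase: str, visemes: List[str]) -> float:
    """Score in [0, 1]: similarity of the phrase's visemes to the target.

    SequenceMatcher.ratio() is an assumed stand-in for the unspecified
    lip-match measure in the claims.
    """
    phrase_visemes = [PHONEME_TO_VISEME[p] for p in transcribe(phrase)]
    return SequenceMatcher(None, phrase_visemes, visemes).ratio()


def rank_phrases(phrases: List[str], visemes: List[str]) -> List[str]:
    """Order candidate phrases from best to worst lip match."""
    return sorted(phrases, key=lambda p: score_phrase(p, visemes), reverse=True)


if __name__ == "__main__":
    target = ["V_bilabial", "V_open", "V_alveolar"]
    for phrase in rank_phrases(["dog", "mat", "ma"], target):
        print(phrase, round(score_phrase(phrase, target), 2))
```

With a scorer of this kind, the synonym suggestion in claims 9 and 19 would amount to re-scoring the phrase with each candidate synonym substituted and keeping those that raise the score. The synonym source is not described in the claims.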

Assignees

Disney Entpr Inc

Inventors

Classifications

  • G10L25/57 (Primary): for processing of video signals

  • Transforming into visible information

  • Synthesis of the lips movements from speech, e.g. for talking heads

  • for synchronising with other signals, e.g. video signals

  • using position of the lips, movement of the lips or face analysis

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Disney Entpr Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/57. Mapped technology areas include Physics.
When was this patent published?
Published Feb 9, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).