What technology area does this patent fall under?

Primary CPC classification G10L25/57. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 20, 2018 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (publications cited in our corpus, or others sharing the same primary CPC).

Generating a visually consistent alternative audio for redubbing visual speech

US9922665B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9922665-B2
Application number	US-201514820410-A
Country	US
Kind code	B2
Filing date	Aug 6, 2015
Priority date	Aug 6, 2015
Publication date	Mar 20, 2018
Grant date	Mar 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There are provided systems and methods for generating a visually consistent alternative audio for redubbing visual speech using a processor configured to sample a dynamic viseme sequence corresponding to a given utterance by a speaker in a video, identify a plurality of phonemes corresponding to the dynamic viseme sequence, construct a graph of the plurality of phonemes that synchronize with a sequence of lip movements of a mouth of the speaker in the dynamic viseme sequence, use the graph to generate an alternative phrase that substantially matches the sequence of lip movements of the mouth of the speaker in the video.
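The pipeline summarized in the abstract (sample visemes, expand them into candidate phonemes, build a graph, and read off words that fit the lip movements) can be illustrated with a toy sketch. It is not the patented implementation. The viseme labels `V1`/`V2`, the `VISEME_TO_PHONEMES` table, the `LEXICON`, and every function name are hypothetical. The "graph" is also simplified to a linear lattice with one slot per viseme, whereas real dynamic visemes map variable-length phoneme runs in a many-to-many fashion.

```python
from itertools import product

# Hypothetical viseme -> candidate phoneme sequences (illustrative only;
# one viseme can be produced by several different phoneme strings).
VISEME_TO_PHONEMES = {
    "V1": [("p",), ("b",), ("m",)],       # lips pressed together
    "V2": [("ae", "t"), ("ae", "d")],     # open vowel, then tongue tip
}

# Toy pronunciation lexicon: phoneme tuple -> word.
LEXICON = {
    ("p", "ae", "t"): "pat",
    ("b", "ae", "d"): "bad",
    ("m", "ae", "t"): "mat",
    ("b", "ae", "t"): "bat",
}

def build_phoneme_graph(visemes, mapping):
    """Linear lattice: graph[i] lists the phoneme tuples allowed on the edge i -> i+1."""
    return [mapping[v] for v in visemes]

def phoneme_paths(graph):
    """Yield the flattened phoneme string spelled by every path through the lattice."""
    for choice in product(*graph):
        yield tuple(ph for seg in choice for ph in seg)

def segment(phones, lexicon, start=0):
    """Yield every way to split phones[start:] into lexicon words."""
    if start == len(phones):
        yield []
        return
    for end in range(start + 1, len(phones) + 1):
        word = lexicon.get(phones[start:end])
        if word is not None:
            for rest in segment(phones, lexicon, end):
                yield [word] + rest

def candidate_phrases(visemes, mapping, lexicon):
    """Collect all word sequences whose pronunciation fits the viseme sequence."""
    graph = build_phoneme_graph(visemes, mapping)
    phrases = set()
    for phones in phoneme_paths(graph):
        for words in segment(phones, lexicon):
            phrases.add(" ".join(words))
    return sorted(phrases)
```

For the input `["V1", "V2"]`, `candidate_phrases` returns `["bad", "bat", "mat", "pat"]`: every lexicon word whose pronunciation can be traced through the lattice. Longer viseme sequences yield multi-word phrases (for example "bat mat"), because `segment` splits each phoneme path at any word boundary the lexicon allows.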

First claim

Opening claim text (preview).

What is claimed is:
1. A system for redubbing of a video, the system comprising: a display; an audio speaker; a memory for storing a redubbing application; and a processor configured to execute the redubbing application to: sample a dynamic viseme sequence corresponding to an original phrase uttered by a speaking character having a sequence of original lip movements of a mouth in the video; identify, using the sampled dynamic viseme sequence, a plurality of phonemes corresponding to the sampled dynamic viseme sequence; construct a graph of the plurality of phonemes corresponding to the sampled dynamic viseme sequence; generate, using the graph of the plurality of phonemes, a first set of words including at least one word that substantially matches the sequence of the original lip movements of the mouth of the speaking character in the video; construct a second set of phrases, using the first set of words, each of the second set of phrases being an alternative phrase to the original phrase; score each of the second set of phrases based on how closely each of the second set of phrases matches the sequence of lip movements of the mouth of the speaking character in the video; select, based on the score, one of the second set of phrases as the alternative phrase to the original phrase, the alternative phrase formed by the at least one word of the first set of words substantially matching the sequence of the original lip movements of the mouth of the speaking character in the video; and display the sequence of the original lip movements of the mouth in the video on the display in synchronization with playing the at least one alternative phrase via the audio speaker.
2. The system of claim 1, wherein the first set includes valid words in a target language.
3. The system of claim 1, wherein the second set includes valid sentences in a target language.
4. The system of claim 3, wherein the target language is a different language than an original language of the video.
5. The system of claim 1, wherein the processor is further configured to: select a candidate alternative phrase from the second set; and insert the candidate alternative phrase as a substitute audio for the sampled dynamic viseme sequence.
6. The system of claim 1, wherein the first set is a complete set including every phoneme that corresponds to the sequence of dynamic visemes.
7. A system for redubbing of a video, the system comprising: a display; an audio speaker; a memory for storing a redubbing application; and a processor configured to execute the redubbing application to: sample a dynamic viseme sequence corresponding to a given utterance by a speaking character in the video; identify a plurality of phonemes corresponding to the dynamic viseme sequence; construct a graph of the plurality of phonemes corresponding to the dynamic viseme sequence; generate, using the graph of the plurality of phonemes, a plurality of words that substantially match a sequence of lip movements of a mouth of the speaking character in the video; construct a plurality of alternative phrases, each of the plurality of alternative phrases is formed by one or more of the plurality of words substantially matching the sequence of lip movements of the mouth of the speaking character in the video; score each alternative phrase of the plurality of alternative phrases based on how closely each alternative phrase matches the sequence of lip movements of the mouth of the speaking character in the video; rank the plurality of alternative phrases based on the score; and display the sequence of lip movements of the mouth in the video on the display in synchronization with playing one of the plurality of alternative phrases via the audio speaker based on ranking.
8. A system for redubbing of a video, the system comprising: a user interface; a display; an audio speaker; a memory for storing a redubbing application; and a processor configured to execute the redubbing application to: sample a dynamic viseme sequence corresponding to a given utterance by a speaking character in the video; identify a plurality of phonemes corresponding to the dynamic viseme sequence; construct a graph of the plurality of phonemes corresponding to the dynamic viseme sequence; receive, from a user via the user interface, a suggested alternative phrase; transcribe the suggested alternative phrase into an ordered phoneme list; compare, using the graph, the ordered phoneme list to the dynamic viseme sequence; score how well the suggested alternative phrase matches the lip movements of the mouth of the speaking character in the video corresponding to the dynamic viseme sequence; and display the sequence of lip movements of the mouth in the video on the display in synchronization with playing the suggested alternative phrase via the audio speaker based on scoring.
9. The system of claim 8, wherein the processor is further configured to: suggest a synonym of a word in the alternative phrase, wherein replacing the word in the alternative phrase with the synonym will increase the score.
10. A method for use by a system having a display, an audio speaker, a memory and a processor for redubbing of a video, the method comprising: sampling, using the processor, a dynamic viseme sequence corresponding to an original phrase uttered by a speaking character having a sequence of original lip movements of a mouth in the video; identifying, using the processor and the sampled dynamic viseme sequence, a plurality of phonemes corresponding to the sampled dynamic viseme sequence; constructing, using the processor, a graph of the plurality of phonemes corresponding to the sampled dynamic viseme sequence; generating, using the processor and the graph of the plurality of phonemes, a first set of words including at least one word that substantially matches the sequence of the original lip movements of the mouth of the speaking character in the video; constructing, using the processor, a second set of phrases, using the first set of words, each of the second set of phrases being an alternative phrase to the original phrase; scoring, using the processor, each of the second set of phrases based on how closely each of the second set of phrases matches the sequence of lip movements of the mouth of the speaking character in the video; selecting, using the processor and based on the score, one of the second set of phrases as the alternative phrase to the original phrase, the alternative phrase formed by the at least one word of the first set of words substantially matching the sequence of the original lip movements of the mouth of the speaking character in the video; and displaying, using the processor, the sequence of the original lip movements of the mouth in the video on the display in synchronization with playing the at least one alternative phrase via the audio speaker.
11. The method of claim 10, wherein the first set includes valid words in a target language.
12. The method of claim 10, wherein the second set includes valid sentences in a target language.
13. The method of claim 12, wherein the target language is a different language than an original language of the video.
14. The method of claim 10, wherein the second set includes a plurality of alternative phrases, the method further comprising: selecting, using the processor, a candidate alternative phrase from the second set; and inserting, using the processor, the candidate alternative phrase as a substitute audio for the sampled dynamic viseme sequence.
15. The method of claim 10, wherein the first set i
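Claims 1, 7, and 8 add a scoring step: candidate phrases (or a user-suggested phrase transcribed into an ordered phoneme list) are scored by how closely they match the original lip movements and then ranked, and claim 9 suggests synonyms that raise the score. The claims quoted above do not say how the score is computed, so the sketch below makes its own assumptions: a hypothetical static phoneme-to-mouth-shape table (`PHONEME_TO_CLASS`), a toy pronunciation dictionary (`PRONUNCIATIONS`), and normalized Levenshtein distance as the similarity measure. All names are illustrative, not the patentee's.

```python
# Hypothetical phoneme -> static mouth-shape class. This is a simplification:
# dynamic visemes map variable-length phoneme runs, not single phonemes.
PHONEME_TO_CLASS = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar", "s": "alveolar",
    "ae": "open", "ah": "open", "r": "round", "g": "velar",
}

# Toy pronunciation dictionary used to "transcribe" a phrase.
PRONUNCIATIONS = {
    "pat": ["p", "ae", "t"], "bad": ["b", "ae", "d"], "mat": ["m", "ae", "t"],
    "fat": ["f", "ae", "t"], "sat": ["s", "ae", "t"], "rug": ["r", "ah", "g"],
}

def transcribe(phrase):
    """Ordered phoneme list for a phrase (raises KeyError on unknown words)."""
    return [ph for word in phrase.split() for ph in PRONUNCIATIONS[word]]

def edit_distance(a, b):
    """Levenshtein distance between two sequences (single-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def score(phrase, target_classes):
    """1.0 means an identical mouth-shape sequence; lower means a worse visual match."""
    classes = [PHONEME_TO_CLASS[ph] for ph in transcribe(phrase)]
    longest = max(len(classes), len(target_classes)) or 1
    return 1.0 - edit_distance(classes, target_classes) / longest

def rank(phrases, target_classes):
    """Best visual match first; ties keep their input order (sorted is stable)."""
    return sorted(phrases, key=lambda p: score(p, target_classes), reverse=True)

def better_synonyms(phrase, synonyms, target_classes):
    """Return (word, synonym) swaps that strictly raise the phrase's score."""
    base = score(phrase, target_classes)
    words = phrase.split()
    swaps = []
    for i, word in enumerate(words):
        for alt in synonyms.get(word, []):
            trial = " ".join(words[:i] + [alt] + words[i + 1:])
            if score(trial, target_classes) > base:
                swaps.append((word, alt))
    return swaps
```

With a target mouth-shape sequence of bilabial, open, alveolar (as for an original word such as "bad"), "mat" scores 1.0 and ranks ahead of "fat" and "sat" (about 0.67 each). For a user suggestion of "rug" (about 0.33), `better_synonyms("rug", {"rug": ["mat"]}, target)` reports the swap to "mat" as an improvement. Edit distance is used here only because it tolerates insertions and deletions when the alternative phrase has a different length than the original.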

Assignees

Disney Entpr Inc

Inventors

Classifications

G10L21/10
Transforming into visible information · CPC title
G10L25/57 · Primary
for processing of video signals · CPC title
G10L2021/105
Synthesis of the lips movements from speech, e.g. for talking heads · CPC title
G10L21/055
for synchronising with other signals, e.g. video signals · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 58052611


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9922665B2 cover?: There are provided systems and methods for generating a visually consistent alternative audio for redubbing visual speech using a processor configured to sample a dynamic viseme sequence corresponding to a given utterance by a speaker in a video, identify a plurality of phonemes corresponding to the dynamic viseme sequence, construct a graph of the plurality of phonemes that synchronize with a …
Who is the assignee on this patent?: Disney Entpr Inc