Automatic dubbing method and apparatus

US11514885B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11514885-B2
Application number: US-201616342416-A
Country: US
Kind code: B2
Filing date: Nov 21, 2016
Priority date: Nov 21, 2016
Publication date: Nov 29, 2022
Grant date: Nov 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An automatic dubbing method is disclosed. The method comprises: extracting speeches of a voice from an audio portion of a media content (504); obtaining a voice print model for the extracted speeches of the voice (506); processing the extracted speeches by utilizing the voice print model to generate replacement speeches (508); and replacing the extracted speeches of the voice with the generated replacement speeches in the audio portion of the media content (510).
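
The four steps named in the abstract (504 to 510) can be pictured as a small pipeline. The sketch below is illustrative only and is not the patented implementation: the audio portion is modeled as a list of labeled segments rather than real samples, the voice print model only remembers a speaker label, and every name (`Segment`, `VoicePrint`, `extract_speeches`, `obtain_voice_print`, `generate_replacements`, `replace_speeches`, `dub`) and the `transform` callback are assumptions introduced for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    start: float   # seconds from the start of the audio portion
    end: float     # seconds from the start of the audio portion
    voice: str     # speaker label; "" marks non-speech audio (hypothetical convention)
    content: str   # stand-in for the audio payload


@dataclass(frozen=True)
class VoicePrint:
    voice: str     # toy model: only records whose voice it describes


def extract_speeches(audio: List[Segment], voice: str) -> List[Segment]:
    """Step 504: keep only the segments spoken by `voice`."""
    return [seg for seg in audio if seg.voice == voice]


def obtain_voice_print(speeches: List[Segment]) -> VoicePrint:
    """Step 506: build a (toy) voice print from the extracted speeches."""
    return VoicePrint(voice=speeches[0].voice)


def generate_replacements(
    speeches: List[Segment],
    voice_print: VoicePrint,
    transform: Callable[[str], str],
) -> List[Segment]:
    """Step 508: produce replacement speeches in the voice of the voice print.

    `transform` is a placeholder for the real processing (for example translation).
    """
    return [
        replace(seg, voice=voice_print.voice, content=transform(seg.content))
        for seg in speeches
    ]


def replace_speeches(audio: List[Segment], replacements: List[Segment]) -> List[Segment]:
    """Step 510: swap each extracted speech for its replacement, leaving the rest as is."""
    by_key = {(seg.start, seg.end, seg.voice): seg for seg in replacements}
    return [by_key.get((seg.start, seg.end, seg.voice), seg) for seg in audio]


def dub(audio: List[Segment], voice: str, transform: Callable[[str], str]) -> List[Segment]:
    speeches = extract_speeches(audio, voice)
    if not speeches:
        return list(audio)  # nothing to dub for this voice
    voice_print = obtain_voice_print(speeches)
    replacements = generate_replacements(speeches, voice_print, transform)
    return replace_speeches(audio, replacements)


# Example data (made up for illustration).
audio = [
    Segment(0.0, 1.0, "", "music"),
    Segment(1.0, 2.5, "alice", "hello"),
    Segment(2.5, 3.0, "bob", "hi"),
]
dubbed = dub(audio, "alice", str.upper)
```

In this toy run only the segment labeled `alice` is rewritten; the music and the other speaker pass through unchanged, which mirrors the abstract's point that replacement happens inside the existing audio portion.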

First claim

Opening claim text (preview).

The invention claimed is:

1. An automatic dubbing method, comprising: extracting speeches of a first voice from an audio portion of a media content; receiving an audio input of a second voice of a user of a user device; extracting speeches of the second voice from the audio input including a set of phonemes of the second voice; generating a voice print model for the extracted speeches of the second voice; receiving a selection of the media content for playback on the user device by the user of the user device; responsive to receiving the selection of the media content for playback and during playback of the media content on the user device; processing the extracted speeches of the first voice by utilizing the voice print model to generate replacement speeches, the replacement speeches generated using the set of phonemes of the second voice; replacing the extracted speeches of the first voice with the generated replacement speeches in the audio portion of the media content; and outputting the audio portion with the generated replacement speeches for playing on the user device.

2. The method of claim 1, wherein the generating the voice print model further comprises: sampling speeches of the user by using a speech capturing device and creating the voice print model based on the sampled speeches of the user.

3. The method of claim 1, wherein the generating the voice print model further comprises: creating the voice print model for the voice based further on at least one of a closed caption, a subtitle, a script, a transcript, and a lyric of the media content.

4. The method of claim 1, further comprises: determining positional data of speeches in the audio portion based on predefined speaker locations for the audio portion and a virtual microphone array.

5. The method of claim 1, wherein the extracting speeches of the first voice comprises: detecting the speeches from the audio portion of the media content based on a plurality of audio versions in different languages; or detecting the speeches from the audio portion of the media content based on a plurality of audio channels and positional data obtained from the audio portion; or detecting the speeches from the audio portion of the media content based on predefined speaker locations and a virtual microphone array.

6. The method of claim 1, wherein the processing the extracted speeches further comprises: translating the extracted speeches of the first voice in a first language to the replacement speeches in a second language by utilizing the voice print model.

7. The method of claim 6, wherein the translating further comprises: generating the translated replacement speeches by further utilizing characteristics of the extracted speeches of the first voice, wherein the characteristics includes at least one of a stress, a tonality, a speed, a volume and an inflection of the speeches.

8. The method of claim 7, wherein the translating further comprises: performing speech-to-text conversion for the extracted speeches of the first voice based on at least one of a closed caption, a subtitle, a script, a transcript and a lyric of the media content; and/or performing text-to-text translation for text converted from the first language to the second language based on at least one of the characteristics of the speeches, a genre information of the media content, a scene knowledge; and generating the translated replacement speeches for the first voice by performing text-to-speech conversion for the translated text based on the voice print model and the characteristics of the extracted speeches.

9. The method of claim 1, wherein the extracting speeches of the first voice comprises: grouping the speeches to be associated with the first voice based on at least one of: voice characteristic of the speeches, audio positional data, detection of visual scene transition, visual recognition of speaker, subtitles, and closed captions.

10. The method of claim 1, wherein the replacing comprises: muting the speeches of the first voice from the audio portion; and adding the replacement speeches in place of the muted speeches in the audio portion.

11. The method of claim 10, wherein the muting comprises: muting the speeches of the first voice by utilizing the extracted speeches from the audio portion; or muting the speeches of the voice by utilizing a plurality of audio channels obtained from the audio portion based on positional data; or regenerating speeches for the voice based on the voice print model of the voice and positional data, and muting the speeches based on the regenerated speeches.

12. An automatic dubbing apparatus, comprising: a speech extracting module configured to extract speeches of a first voice from an audio portion of a media content, receive an audio input of a second voice of a user of a user device and configured to extract speeches of the second voice from the audio input including a set of phonemes of the second voice; a voice print model obtaining module configured to generate a voice print model for the extracted speeches of the second voice; and a speech processing module configured to, responsive to receiving a selection of the media content for playback on the user device and during playback of the media content on the user device: process the extracted speeches of the first voice by utilizing the voice print model to generate replacement speeches, the replacement speeches generated using the set of phonemes of the second voice; replace the extracted speeches of the first voice with the generated replacement speeches of the second voice in the audio portion of the media content; and output the audio portion with the generated replacement speeches for playing on the user device.

13. The apparatus of claim 12, wherein the speech extracting module is further configured to: detect the speeches from the audio portion of the media content based on a plurality of audio versions in different languages; or detect the speeches from the audio portion of the media content based on a plurality of audio channels and positional data obtained from the audio portion; or detect the speeches from the audio portion of the media content based on predefined speaker locations and a virtual microphone array.

14. The apparatus of claim 12, wherein the speech extracting module is further configured to: group the speeches to be associated with the first voice based on at least one of: voice characteristic of the speeches, audio positional data, detection of visual scene transition, visual recognition of speaker, subtitles, and closed captions.

15. The apparatus of claim 12, wherein the voice print model obtaining module is further configured to: create the voice print model based on speeches of the user, which are sampled by using a speech capturing device.

16. The apparatus of claim 15, wherein the voice print model obtaining module is further configured to: create the voice print model for the first voice based on the extracted speeches of the first voice and at least one of a closed caption, a subtitle, a script, a transcript, and a lyric of the media content.

17. The apparatus of one of claim 12, wherein the speech processing module is further configured to: translate the extracted speeches of the first voice in a first language to the replacement speeches in a second language by utilizing the voice print model.

18. The apparatus of claim 17, wherein the speech processing module is further configured to: generate the translated replacement speeches
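
As a reading aid for claims 10 and 11, the replacing step can be sketched on raw samples: the extracted speech is subtracted from the mix (one way to read "muting ... by utilizing the extracted speeches"), and the replacement speech is then added at the same position. This is a hedged illustration, not the patent's implementation. It assumes integer samples, a mix that is the plain sum of background and speech, and perfectly extracted speech; the function names `mute_speech` and `add_replacement` are hypothetical.

```python
from typing import List


def mute_speech(mix: List[int], speech: List[int], start: int) -> List[int]:
    """Mute a speech by subtracting its extracted samples from the mix at `start`."""
    out = list(mix)
    for offset, sample in enumerate(speech):
        if start + offset < len(out):
            out[start + offset] -= sample
    return out


def add_replacement(muted: List[int], replacement: List[int], start: int) -> List[int]:
    """Add the replacement speech in place of the muted speech, starting at `start`."""
    out = list(muted)
    for offset, sample in enumerate(replacement):
        if start + offset < len(out):
            out[start + offset] += sample
    return out


# Made-up example: constant background with a three-sample speech at index 2.
background = [1, 1, 1, 1, 1, 1]
speech = [5, 6, 7]
start = 2
mix = [1, 1, 6, 7, 8, 1]            # background + speech

muted = mute_speech(mix, speech, start)
replacement = [2, 3, 4]             # stand-in for a generated replacement speech
dubbed = add_replacement(muted, replacement, start)
```

Subtracting rather than zeroing the span keeps the background audio intact, which is why the muted signal in this toy case equals the background. Real extraction is rarely this clean, so residual speech would normally remain.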

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Detection of discrete points within a voice signal · CPC title

  • G10L13/00 (Primary)

    Speech synthesis; Text to speech systems · CPC title

  • Speaker identification or verification techniques · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11514885B2 cover?
An automatic dubbing method is disclosed. The method comprises: extracting speeches of a voice from an audio portion of a media content (504); obtaining a voice print model for the extracted speeches of the voice (506); processing the extracted speeches by utilizing the voice print model to generate replacement speeches (508); and replacing the extracted speeches of the voice with the gen…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G10L13/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 29, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).